All content in this area was uploaded by Aishwarya Pawar on Jul 05, 2022

Seasonal and Nonseasonal GARCH Time Series Analysis: Case Study of Bitcoin Historical and S&P 500 stock datasets

Aishwarya Pawar

Department of Mathematical Sciences, Stevens Institute of Technology, Hoboken, NJ

Project Supervisor: Dr. Hadi Safari Katesari

Abstract

Bitcoin is the longest-running and most well-known cryptocurrency, and the Standard and Poor's 500 is a stock market index tracking the stock performance of 500 large companies listed on exchanges in the United States. The goal of this project is to provide a realistic forecast, based on historically available data, of the future values of these financial assets. To achieve this goal, the trends, seasonality, and volatility of the two datasets were analyzed; a SARIMA model was fitted to the Bitcoin dataset and a univariate GARCH model was fitted to the S&P 500 data. The residual analysis suggested that the models are adequate for forecasting future values.

PART A: SEASONAL DATASET: Bitcoin Historical Data

Introduction and Motivation

Bitcoin is the longest-running and most well-known cryptocurrency, first released as open-source in 2009 by

the anonymous Satoshi Nakamoto. Bitcoin serves as a decentralized medium of digital exchange, with

transactions verified and recorded in a public distributed ledger (the blockchain) without the need for a trusted

record-keeping authority or central intermediary. Transaction blocks contain an SHA-256 cryptographic hash

of previous transaction blocks and are thus "chained" together, serving as an immutable record of all

transactions that have ever occurred. As with any currency/commodity on the market, bitcoin trading and

financial instruments soon followed the public adoption of bitcoin and continue to grow.

The goal of this project is to provide a realistic forecast based on historically available data to predict the future

values of this cryptocurrency. To achieve this goal, we will develop a model to learn from the trends and

seasonality present in the data to adapt accordingly and provide accurate future predictions.

Data Description

Date Range: From Jan 2012 to March 2021

Datasource Description: The data is from Kaggle website and can be accessed using the link:

https://www.kaggle.com/datasets/mczielinski/bitcoin-historical-data

Dataset Description: The dataset contains 4,857,377 entries, 8 total columns. The entries are of minute-to-

minute updates of OHLC (Open, High, Low, Close), Volume in BTC and indicated currency, and weighted bitcoin

price. Timestamps are in Unix time. Timestamps without any trades or activity have their data fields filled with

NaNs.

Dataset Entries:

Timestamp: Start time of time window (60s window), in Unix time

Open: Open price at start time window

High: High price within time window

Low: Low price within time window

Close: Close price at end of time window

Volume_(BTC): Volume of BTC transacted in this window

Volume_(Currency): Volume of corresponding currency transacted in this window

Weighted_Price: VWAP- Volume Weighted Average Price

Initial Look at the Data

The dataset is quite a large one and there are a lot of null values in the dataset. Also, since the observations

are collected on a per-minute basis we will need to do some pre-processing to get data on a monthly basis.

Also, let's have a look at the plots as we will be fitting a model for Weighted Price prediction.

head(Data)

## Timestamp Open High Low Close Volume_.BTC. Volume_.Currency. Weighted_Price

## 1 1325317920 4.39 4.39 4.39 4.39 0.4555809 2 4.39

## 2 1325317980 NaN NaN NaN NaN NaN NaN NaN

## 3 1325318040 NaN NaN NaN NaN NaN NaN NaN

## 4 1325318100 NaN NaN NaN NaN NaN NaN NaN

## 5 1325318160 NaN NaN NaN NaN NaN NaN NaN

## 6 1325318220 NaN NaN NaN NaN NaN NaN NaN

Since there is a large amount of data, the patterns are quite clustered. They're either quite low or quite high,

so by taking a log transform of the data we can have a better look at the data.

Data Pre-processing

Before any further preprocessing we want to remove the null values.

# REMOVING THE NULL VALUES

df<- na.omit(Data)

Since the data is collected on a minute-to-minute basis for over 9 years, we have several million rows (4,857,377 before removing the null values). We want to convert them to a monthly basis. To do so we use the resample function from pandas (Python), called from R, first averaging to daily values and then taking the monthly mean of the daily samples.

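library(reticulate)  # R interface to Python: import(), r_to_py(), py_to_r()
pd <- import("pandas", convert = FALSE)
#CONVERT TO PANDAS DF
df <- r_to_py(df)
# resample() needs a datetime index; Timestamp is in Unix seconds
df$index <- pd$to_datetime(df$Timestamp, unit = "s")
# Daily means first, then monthly, yearly and quarterly means of the daily values
# (newer pandas releases prefer the aliases 'ME', 'YE-DEC' and 'QE-DEC')
df <- df$resample('D')$mean()
df_month <- df$resample('M')$mean()
df_year <- df$resample('A-DEC')$mean()
df_Q <- df$resample('Q-DEC')$mean()
# CONVERT BACK TO R
df <- py_to_r(df)
df_month <- py_to_r(df_month)
df_year <- py_to_r(df_year)
df_Q <- py_to_r(df_Q)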
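For readers without a Python installation, the same day-then-month averaging can be sketched in base R. This is only an illustration under stated assumptions: the data frame `toy` and its prices are synthetic stand-ins for the Kaggle file, and only the column names `Timestamp` and `Weighted_Price` are taken from the dataset.

```r
# Synthetic stand-in for the Kaggle file: 90 days of one-minute rows starting
# 2012-01-01 00:00 UTC (prices are made up for illustration).
set.seed(1)
n_min <- 60 * 24 * 90
toy <- data.frame(
  Timestamp      = 1325376000 + 60 * (seq_len(n_min) - 1),
  Weighted_Price = 5 + cumsum(rnorm(n_min, sd = 0.001))
)

# Unix seconds -> calendar day (UTC)
toy$day <- as.Date(as.POSIXct(toy$Timestamp, origin = "1970-01-01", tz = "UTC"))

# Step 1: mean per day; Step 2: mean of the daily means per month
daily <- aggregate(Weighted_Price ~ day, data = toy, FUN = mean)
daily$month <- format(as.Date(daily$day), "%Y-%m")
monthly <- aggregate(Weighted_Price ~ month, data = daily, FUN = mean)

print(monthly)
```

Averaging the daily means (rather than all minutes of a month at once) mirrors the two-step resampling used above, so days with few trades weigh as much as busy days.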

Plotting the resampled weighted price on a monthly, quarterly and yearly basis.

For the purpose of this project, we are going to use monthly Bitcoin prices to fit the model and forecast it.

DECOMPOSING THE TIME SERIES:

Decomposing the time series to have a look at the seasonal components, trend components, and residuals in

it.

ts_bitcoin = ts(log(df_month$Weighted_Price), frequency = 12)

decompose_bitcoin = decompose(ts_bitcoin, "multiplicative")

plot(decompose_bitcoin, type='l',lwd=2, col = 'red')

The series is described by a multiplicative decomposition, as the amplitude of both the seasonal and irregular variations increases as the level of the trend rises. In the multiplicative model, the original time series is expressed as the product of trend, seasonal and irregular components.

FITTING THE TIME SERIES MODEL

Now that we have performed exploratory data analysis for the project, we can now fit a time series model for

the data. First, we need to perform some checks to make sure that the data is in the correct format.

Stationarity check.

Let's start by checking whether the time series is stationary. To do so we use the augmented Dickey-Fuller (ADF) test.

adf.test(df_month$Weighted_Price)

## Augmented Dickey-Fuller Test

## data: df_month$Weighted_Price

## Dickey-Fuller = 0.98776, Lag order = 4, p-value = 0.99

## alternative hypothesis: stationary

As the augmented Dickey-Fuller test shows, the p-value (0.99) is not less than 0.05, so we fail to reject the null hypothesis of a unit root and treat the series as non-stationary. That is, the time series has some time-dependent structure, and its variance is not constant over time.

It is important to make the series stationary before fitting a model to it because we observe a single run of a stochastic process rather than repeated runs. Hence, we need stationarity and ergodicity so that observing one long run of the process is similar to observing many independent runs of it.

Plotting the ACF and PACF of the series.

The autocorrelation plot of the data decays very slowly. This is in line with the Dickey-Fuller test result that the time series is not stationary. The high autocorrelation reflects the trend present in the data.

MAKING THE TIME SERIES STATIONARY

Using the Box Cox transformation:

The Box-Cox transformation is a family of power transformations indexed by a parameter lambda. It is mainly used when the variance of a time series changes with its level (one source of non-stationarity). After applying Box-Cox with a suitable value of lambda, the variance may become roughly constant, although a trend in the mean usually still has to be removed by differencing.

b <- boxcox(lm(df_month$Weighted_Price ~ 1))

lambda <- b$x[which.max(b$y)]

lambda

## [1] 0.1010101

new_df_month_exact <- (df_month$Weighted_Price ^ lambda - 1) / lambda
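Because the model is fitted on the Box-Cox scale, forecasts have to be mapped back to prices before they can be read as Bitcoin values. The sketch below shows the transformation and its inverse in base R; the helper names `box_cox` and `inv_box_cox` and the example prices are hypothetical, and only the value of lambda is taken from the output above.

```r
# Box-Cox transform and its inverse (helper names are illustrative)
box_cox <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}
inv_box_cox <- function(z, lambda) {
  if (abs(lambda) < 1e-8) exp(z) else (lambda * z + 1)^(1 / lambda)
}

lambda <- 0.1010101                         # value estimated above
prices <- c(4.39, 120, 950, 7000, 45000)    # made-up price levels

z    <- box_cox(prices, lambda)             # scale used for model fitting
back <- inv_box_cox(z, lambda)              # back on the original price scale

print(cbind(prices, z, back))
```

The same `inv_box_cox` call could be applied to predictions produced on the transformed scale, under the assumption that the same lambda is used in both directions.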

ACF and PACF plots of the Box Cox transformed series:

adf.test(new_df_month_exact)

## Augmented Dickey-Fuller Test

## data: new_df_month_exact

## Dickey-Fuller = -2.4263, Lag order = 4, p-value = 0.3997

## alternative hypothesis: stationary

We see that even after the Box-Cox transformation, the series is still not stationary. The p-value (0.3997) is greater than 0.05, indicating that the series is not stationary. Also, we can see from the ACF and PACF plots that there is still autocorrelation present in the time series. To get rid of this correlation and make the series stationary, let's take a seasonal difference of the time series.

SEASONAL DIFFERENCING

From the ACF and PACF plots, we can see that the correlations have decreased significantly. The sample ACF

looks like a damped sine wave, but it does not die out like the theoretical ACF.

From the augmented Dickey Fuller test, we don't get a statistically significant P value. Hence the series is still

not stationary.

adf.test(diff(new_df_month_exact,lag = 12))

## Augmented Dickey-Fuller Test

## data: diff(new_df_month_exact, lag = 12)

## Dickey-Fuller = -2.1158, Lag order = 4, p-value = 0.5287

## alternative hypothesis: stationary

We can also see correlations in the ACF plot. To get rid of this correlation, we will do a regular differencing of

the series.

ADDITIONAL REGULAR DIFFERENCING

ACF and PACF of the series after another differencing.

## Augmented Dickey-Fuller Test

## data: diff(diff(new_df_month_exact,lag=12))

## Dickey-Fuller = -4.3998, Lag order = 4, p-value = 0.01

## alternative hypothesis: stationary

After the regular differencing of the series, the p-value obtained from the augmented Dickey-Fuller test is statistically significant, i.e., less than 0.05. Hence, we can now conclude that the series is stationary. Also, from the ACF and PACF plots we can see that the correlations have significantly decreased.
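To make the two differencing steps concrete, the sketch below checks on a synthetic series that `diff(diff(x, lag = 12))` is the same as computing x[t] - x[t-1] - x[t-12] + x[t-13] directly, and that 13 observations are lost in the process. The series `x` and its length are assumptions chosen only for illustration.

```r
set.seed(42)
n <- 111   # hypothetical length, roughly nine years of monthly values
x <- cumsum(rnorm(n)) + 2 * sin(2 * pi * seq_len(n) / 12)  # trend plus yearly cycle

# Seasonal difference (lag 12) followed by a regular difference (lag 1)
w <- diff(diff(x, lag = 12))

# The same quantity written out term by term for t = 14, ..., n
t_idx    <- 14:n
w_manual <- x[t_idx] - x[t_idx - 1] - x[t_idx - 12] + x[t_idx - 13]

print(length(w))                 # n - 13 observations remain
print(all.equal(w, w_manual))
```

This is why d = 1 and D = 1 are held fixed in the order search that follows: the differencing is part of the model rather than a separate preprocessing step.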

Now that we have a stationary series, we'll have a look at the EACF plots to finalize the order of our AR, MA

models and then fit the model.

eacf(diff(diff(new_df_month_exact,lag = 12)))

## AR/MA

## 0 1 2 3 4 5 6 7 8 9 10 11 12 13

## 0 x x o o o o o o o o x x x o

## 1 o o o o o o x o o o o x o o

## 2 x o o o o o x o o o o x o o

## 3 x x o o o o o o o o o x o o

## 4 x o o o o o o o o o o x x o

## 5 x x o x o o o o o o o x o o

## 6 x x o x o o o o o o o x o o

## 7 x o x x o o o o o o o x o o

DETERMINING THE ORDER OF THE MODEL

From the ACF, PACF and EACF we see that plausible orders lie in the range of 0 to 2 for p, q and P, and 0 to 1 for Q. We will select the best model based on the lowest value of AIC over these values of P, Q, p and q.

Below I have written a nested for loop, which loops over every combination of p, q and P in the range of 0 to 2 and Q in the range of 0 to 1 and fits a SARIMA model for each of them. The AICs of all these models are stored together with their corresponding P, Q, p and q values. The results are then sorted in ascending order of AIC, and the first row is taken. This row represents the lowest-AIC model along with the corresponding P, Q, p and q values.

NESTED FOR LOOP FOR DETERMINING THE VALUES OF P, Q, p and q

# Orders searched: p, q, P in 0..2 and Q in 0..1, with d = D = 1
# (one regular and one seasonal difference, as chosen above)
D <- 1
d <- 1

# Box-Cox transformed monthly series used for model fitting
df_month$predSeries <- new_df_month_exact

results <- vector(mode = 'list')
p_val <- vector(mode = 'list')
q_val <- vector(mode = 'list')
P_val <- vector(mode = 'list')
Q_val <- vector(mode = 'list')

for (p in 0:2) {
  for (q in 0:2) {
    for (P in 0:2) {
      for (Q in 0:1) {
        if (p + d + q + P + D + Q <= 10) {
          model <- arima(x = df_month$predSeries, order = c(p, d, q),
                         seasonal = list(order = c(P, D, Q), period = 12))
          # Portmanteau test on the residuals (Box.test defaults to Box-Pierce)
          test <- Box.test(model$residuals, lag = log(length(model$residuals)))
          sse <- sum(model$residuals^2)
          results <- append(results, model$aic)
          p_val <- append(p_val, p)
          q_val <- append(q_val, q)
          Q_val <- append(Q_val, Q)
          P_val <- append(P_val, P)
          cat(p, d, q, P, D, Q, "AIC:", model$aic, "SSE:", sse,
              "p-value:", test$p.value, "\n")
        }
      }
    }
  }
}

# Collect the AICs with their orders and sort ascending by AIC
final_result <- data.frame(matrix(unlist(results), nrow = length(results), byrow = TRUE))
final_result$p <- p_val
final_result$q <- q_val
final_result$P <- P_val
final_result$Q <- Q_val
colnames(final_result)[1] <- "AIC"
final_result <- final_result[order(final_result$AIC), ]
final_result[1, ]

## AIC p q P Q

## 52 -250.1339 1 0 2 0
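A more compact variant of the same search can be sketched with expand.grid and tryCatch, so that a combination that fails to converge is recorded as NA instead of stopping the loop. This is an illustration rather than the code used for the result above: the built-in AirPassengers series stands in for the Bitcoin data, the grid is smaller, and the helper name `fit_aic` is hypothetical.

```r
y <- log(AirPassengers)   # built-in monthly series, used only as a stand-in

# All candidate orders; d = D = 1 are held fixed as in the search above
grid <- expand.grid(p = 0:1, q = 0:1, P = 0:1, Q = 0:1)

# Fit one SARIMA model and return its AIC, or NA if the fit fails
fit_aic <- function(p, q, P, Q) {
  tryCatch(
    AIC(arima(y, order = c(p, 1, q),
              seasonal = list(order = c(P, 1, Q), period = 12))),
    error = function(e) NA_real_
  )
}

grid$AIC <- mapply(fit_aic, grid$p, grid$q, grid$P, grid$Q)
grid <- grid[order(grid$AIC), ]   # lowest AIC first, failed fits last

print(head(grid, 3))
```

Keeping the orders and the AIC in one data frame avoids the parallel lists and makes it easy to inspect the runner-up models as well as the winner.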

FITTING A SARIMA MODEL AND FORECASTING IT

df_month$predSeries <- new_df_month_exact

pdqParam <- c(1,1,0)

PDQParam <- c(2,1,0)

manualFit <- arima(df_month$predSeries, pdqParam,
                   seasonal = list(order = PDQParam, period = 12))

manualPred <- predict(manualFit, n.ahead = 25)

ts.plot(as.ts(df_month$predSeries), (manualPred$pred), lty = c(1,3),col="red",lwd=3)

Plotting the time series of the prediction with respect to the original data.

RESIDUAL ANALYSIS

Plotting the autocorrelations of the residuals of the fitted model. Apart from one significant spike at lag 7, the residuals don't exhibit any correlation amongst themselves.

acf(residuals(manualFit))

ts.plot(residuals(manualFit),lwd=3,col="red",main='Residual Analysis')

ggqqplot(residuals(manualFit))

## Box-Ljung test

## data: residuals(manualFit)

## X-squared = 0.0035786, df = 1, p-value = 0.9523

As the p-value is greater than 0.05, we fail to reject the null hypothesis that the residuals are uncorrelated, which is what we want from an adequate model.

LBQPlot(residuals(manualFit), lag.max = 30, SquaredQ = FALSE)

CONCLUSION:

As we can see from the residual analysis, the SARIMA(1,1,0)x(2,1,0) model with period 12 is a good fit to the data. However, Bitcoin's price is highly volatile, and there are many more factors that could be considered while fitting the model to get more precise and accurate forecasts.
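Written out with the backshift operator B, the selected model takes the form below. The symbols are introduced here for illustration only: z_t denotes the Box-Cox transformed monthly series, e_t a white noise error, \phi_1 the non-seasonal AR coefficient, and \Phi_1, \Phi_2 the seasonal AR coefficients.

```latex
(1 - \phi_1 B)\,\bigl(1 - \Phi_1 B^{12} - \Phi_2 B^{24}\bigr)\,(1 - B)\,(1 - B^{12})\, z_t = e_t
```

The factors (1 - B) and (1 - B^{12}) are the regular and seasonal differences (d = 1, D = 1), and there are no moving-average factors because q = 0 and Q = 0.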

PART B: NON-SEASONAL DATASET: S&P 500 stock data

Introduction and Motivation

Stock market data can be interesting to analyze and as a further incentive, strong predictive models can have large

financial payoff. The amount of financial data on the web is seemingly endless. A large and well-structured dataset

on a wide array of companies can be hard to come by. Here I am using a dataset with historical stock prices (last 5

years) for all companies currently found on the S&P 500 index.

The goal of this project is to predict and forecast daily return values for a particular stock. Given the highly

volatile nature of stock data, we will fit a univariate GARCH model to achieve our goal of predicting the daily

returns value to allow you to make statistically informed trades!

Data Description

Date Range: From Feb 2013 to Feb 2018

Dataset Description: The dataset contains 5 years of stock data for all companies currently found on the S&P 500

index.

Datasource Description: The data is from Kaggle website and can be accessed using the link:

https://www.kaggle.com/datasets/camnugent/sandp500

Date - in format: yy-mm-dd

Open - price of the stock at market open (this is NYSE data so all in USD)

High - Highest price reached in the day

Low - Lowest price reached in the day

Close - Price of the stock at market close

Volume - Number of shares traded

Name - the stock's ticker name

Initial look at the data:

Since the data set contains the stock prices of over 500 companies, we will select just one company's stock price data for the scope of this project. We will be using Apple's stock price data. Plotting the initial data.

df$date <- as_datetime(df$date)

df_ts <- xts(as.numeric(df$close),df$date)

chartSeries(df_ts)

As mentioned before, we will be predicting the daily returns of the stock. The daily return measures the day-to-day performance of a stock: it is the relative change between today's closing price and yesterday's closing price.

The code below calculates the daily returns and also plots it for us.

return <- CalculateReturns(df_ts)

return <-return[-1]

chartSeries(return)
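As a cross-check of what the return calculation does, the sketch below computes simple (discrete) returns by hand in base R, which is the default method of CalculateReturns, and compares them with log returns. The six closing prices are the last values of the series printed later in this report; the variable names are illustrative.

```r
# Last six closing prices of the series (taken from the tail() output in this report)
close <- c(167.43, 167.78, 160.50, 156.49, 163.03, 159.54)

# Simple return: (P_t - P_{t-1}) / P_{t-1}
simple_ret <- diff(close) / head(close, -1)

# Log return: log(P_t) - log(P_{t-1}); close to the simple return for small moves
log_ret <- diff(log(close))

print(round(cbind(simple_ret, log_ret), 5))
```

One observation is lost because the first price has no predecessor, which is why the first element of the returns series is dropped above.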

Histogram of the returns data

As we can see from the returns plot above, the series looks stationary, and the histogram is roughly bell-shaped. Stationarity is also confirmed by the augmented Dickey-Fuller test, where the p-value is statistically significant, so we can conclude that the data is indeed stationary. Also, we don't see any trend in the data. The important point here is that the returns plot shows a lot of volatility. For example, we can see a period of high volatility around Feb 03, 2014, and a period of low volatility around Feb 01, 2017. So, to capture this volatility, we will be fitting a univariate GARCH model.

Plotting the stock's rolling monthly and annual volatility.

DETERMINING THE ORDER OF THE MODEL

The ARMA part of the model describes the conditional mean of the series, and the GARCH part describes its time-varying conditional variance. We can see from the returns plot that the mean is roughly constant. We will verify this by using the nested for loop used in PART A of this project. Then we will determine the order of the GARCH model by using the ACF and PACF plots.

We are using the ugarchfit method from the "rugarch" library instead of the garch method from the "tseries" library. We therefore pass the ARMA and GARCH orders and the return data directly to ugarchfit, instead of passing the residuals as we would have for the garch method. This implementation can be verified from references numbered [3,4,5].

Having a look at the ACF, PACF and EACF of the series to determine the order of the GARCH model. As we can see from the ACF and PACF plots, there are no significant lags apart from one at lag 7, so for the scope of this project we will stick to a GARCH(1,1) model.
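To show what a GARCH(1,1) model assumes, the sketch below simulates the recursion sigma2[t] = omega + alpha * eps[t-1]^2 + beta * sigma2[t-1] in base R with normal innovations. The function name `simulate_garch11` and the parameter values are assumptions chosen for illustration; they are not estimates from the data.

```r
# Simulate a GARCH(1,1) process with standard normal innovations
simulate_garch11 <- function(n, omega, alpha, beta, seed = 1) {
  stopifnot(omega > 0, alpha >= 0, beta >= 0, alpha + beta < 1)
  set.seed(seed)
  z      <- rnorm(n)
  sigma2 <- numeric(n)
  eps    <- numeric(n)
  sigma2[1] <- omega / (1 - alpha - beta)   # start at the unconditional variance
  eps[1]    <- sqrt(sigma2[1]) * z[1]
  for (t in 2:n) {
    # Today's variance depends on yesterday's squared shock and yesterday's variance
    sigma2[t] <- omega + alpha * eps[t - 1]^2 + beta * sigma2[t - 1]
    eps[t]    <- sqrt(sigma2[t]) * z[t]
  }
  list(returns = eps, sigma = sqrt(sigma2))
}

# Illustrative parameters (made up): persistence alpha + beta = 0.95
sim_demo <- simulate_garch11(1000, omega = 2e-6, alpha = 0.05, beta = 0.90)
print(summary(sim_demo$sigma))
```

Plotting `sim_demo$returns` would show the calm and turbulent stretches (volatility clustering) that motivate the GARCH model, even though the innovations themselves are independent.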

Also, from our nested loop it is now confirmed that ARMA of order (0,0) gives the lowest AIC for this series.

final_result[1,]

## AIC p q

## 1 -7064.618 0 0

FINDING THE BEST GARCH MODEL

MODEL 1: Normal distribution GARCH MODEL (sGARCH is the standard model)

s <- ugarchspec(mean.model = list(armaOrder = c(0,0)),
                variance.model = list(model = "sGARCH"),
                distribution.model = 'norm')

m <- ugarchfit(data = return, spec = s, solver ='hybrid')

Let's have a look at some of the coefficients of this model.

## Conditional Variance Dynamics

## -----------------------------------

## GARCH Model : sGARCH(1,1)

## Mean Model : ARFIMA(0,0,0)

## Distribution : norm

##

## Optimal Parameters

## ------------------------------------

## Estimate Std. Error t value Pr(>|t|)

## mu 0.000957 0.000404 2.37090 0.017745

## omega 0.000000 0.000000 0.96945 0.332320

## alpha1 0.008655 0.000733 11.81086 0.000000

## beta1 0.989002 0.000592 1670.61474 0.000000

##

## Robust Standard Errors:

## Estimate Std. Error t value Pr(>|t|)

## mu 0.000957 0.000418 2.288850 0.022088

## omega 0.000000 0.000005 0.090515 0.927878

## alpha1 0.008655 0.001191 7.267157 0.000000

## beta1 0.989002 0.001602 617.314884 0.000000

##

## LogLikelihood : 3541.387

##

## Information Criteria

## ------------------------------------

##

## Akaike -5.6238

## Bayes -5.6075

## Shibata -5.6238

## Hannan-Quinn -5.6177

##

## Weighted Ljung-Box Test on Standardized Residuals

## ------------------------------------

## statistic p-value

## Lag[1] 1.062 0.3028

## Lag[2*(p+q)+(p+q)-1][2] 1.100 0.4669

## Lag[4*(p+q)+(p+q)-1][5] 1.502 0.7395

## d.o.f=0

## H0 : No serial correlation

##

## Weighted Ljung-Box Test on Standardized Squared Residuals

## ------------------------------------

## statistic p-value

## Lag[1] 3.501 0.06135

## Lag[2*(p+q)+(p+q)-1][5] 4.994 0.15359

## Lag[4*(p+q)+(p+q)-1][9] 6.495 0.24524

## d.o.f=2

##

## Weighted ARCH LM Tests

## ------------------------------------

## Statistic Shape Scale P-Value

## ARCH Lag[3] 1.246 0.500 2.000 0.2643

## ARCH Lag[5] 1.526 1.440 1.667 0.5855

## ARCH Lag[7] 2.124 2.315 1.543 0.6910

##

## Nyblom stability test

## ------------------------------------

## Joint Statistic: 305.5824

## Individual Statistics:

## mu 0.05258

## omega 27.21255

## alpha1 0.11354

## beta1 0.10269

##

## Asymptotic Critical Values (10% 5% 1%)

## Joint Statistic: 1.07 1.24 1.6

## Individual Statistic: 0.35 0.47 0.75

##

## Sign Bias Test

## ------------------------------------

## t-value prob sig

## Sign Bias 0.2304 0.817814

## Negative Sign Bias 2.7854 0.005427 ***

## Positive Sign Bias 0.7584 0.448333

## Joint Effect 11.2500 0.010448 **

##

## Adjusted Pearson Goodness-of-Fit Test:

## ------------------------------------

## group statistic p-value(g-1)

## 1 20 86.90 1.171e-10

## 2 30 96.44 3.608e-09

## 3 40 107.91 2.204e-08

## 4 50 108.93 1.906e-06

## Elapsed time : 0.152513

In general, the lower the information criteria, the better the model.

In the Weighted Ljung-Box Test on Standardized Squared Residuals, all p-values are greater than 0.05, so we cannot reject H0, i.e., the squared standardized residuals more or less behave like a white noise process.

In the Adjusted Pearson Goodness-of-Fit Test, all the p-values are less than 0.05, so we reject the null hypothesis and conclude that the distribution we assumed for the residuals, i.e., the normal distribution, is not a good choice. A good model needs p-values above 0.05 for this test.

Also, from the QQ plot we can see that it's not a good fit as the tail ends do not fit well.

MODEL 2: To fit a better model, we use the skewed student t distribution GARCH MODEL

s <- ugarchspec(mean.model = list(armaOrder = c(0,0)),
                variance.model = list(model = "sGARCH"),
                distribution.model = 'sstd')

m <- ugarchfit(data = return, spec = s)

## Conditional Variance Dynamics

## -----------------------------------

## GARCH Model : sGARCH(1,1)

## Mean Model : ARFIMA(0,0,0)

## Distribution : sstd

##

## Optimal Parameters

## ------------------------------------

## Estimate Std. Error t value Pr(>|t|)

## mu 0.001001 0.000380 2.63211 0.008486

## omega 0.000002 0.000003 0.56134 0.574567

## alpha1 0.034789 0.014457 2.40638 0.016111

## beta1 0.958942 0.013409 71.51432 0.000000

## skew 1.019679 0.038499 26.48559 0.000000

## shape 3.891047 0.467266 8.32727 0.000000

##

## Robust Standard Errors:

## Estimate Std. Error t value Pr(>|t|)

## mu 0.001001 0.000402 2.490013 0.012774

## omega 0.000002 0.000025 0.077488 0.938235

## alpha1 0.034789 0.114872 0.302847 0.762006

## beta1 0.958942 0.105073 9.126445 0.000000

## skew 1.019679 0.041230 24.731646 0.000000

## shape 3.891047 4.561011 0.853111 0.393598

##

## LogLikelihood : 3639.673

##

## Information Criteria

## ------------------------------------

##

## Akaike -5.7769

## Bayes -5.7524

## Shibata -5.7770

## Hannan-Quinn -5.7677

##

## Weighted Ljung-Box Test on Standardized Residuals

## ------------------------------------

## statistic p-value

## Lag[1] 1.304 0.2534

## Lag[2*(p+q)+(p+q)-1][2] 1.312 0.4073

## Lag[4*(p+q)+(p+q)-1][5] 1.810 0.6641

## d.o.f=0

## H0 : No serial correlation

##

## Weighted Ljung-Box Test on Standardized Squared Residuals

## ------------------------------------

## statistic p-value

## Lag[1] 0.3401 0.5598

## Lag[2*(p+q)+(p+q)-1][5] 0.5667 0.9469

## Lag[4*(p+q)+(p+q)-1][9] 0.9208 0.9893

## d.o.f=2

##

## Weighted ARCH LM Tests

## ------------------------------------

## Statistic Shape Scale P-Value

## ARCH Lag[3] 0.0005719 0.500 2.000 0.9809

## ARCH Lag[5] 0.4242667 1.440 1.667 0.9057

## ARCH Lag[7] 0.7116805 2.315 1.543 0.9554

##

## Nyblom stability test

## ------------------------------------

## Joint Statistic: 80.219

## Individual Statistics:

## mu 0.04585

## omega 3.86477

## alpha1 0.48638

## beta1 0.56370

## skew 0.02211

## shape 1.13626

##

## Asymptotic Critical Values (10% 5% 1%)

## Joint Statistic: 1.49 1.68 2.12

## Individual Statistic: 0.35 0.47 0.75

##

## Sign Bias Test

## ------------------------------------

## t-value prob sig

## Sign Bias 0.7161 0.4741

## Negative Sign Bias 1.4449 0.1487

## Positive Sign Bias 0.1492 0.8814

## Joint Effect 5.7662 0.1236

##

##

## Adjusted Pearson Goodness-of-Fit Test:

## ------------------------------------

## group statistic p-value(g-1)

## 1 20 23.59 0.2124

## 2 30 25.90 0.6311

## 3 40 39.23 0.4594

## 4 50 54.24 0.2815

##

In the Adjusted Pearson Goodness-of-Fit Test, all p-values are above 0.05, so this is a better model. The skew estimate is 1.019679, i.e., approximately 1, which for the sstd distribution in rugarch corresponds to an almost symmetric distribution; the improvement comes mainly from the heavier tails (shape of about 3.9). The QQ plot shows a much better fit to the data. The ACF of the residuals has no significant lags and does not exhibit any correlation.

It is a good fit to the data, except that the different impact of positive and negative news on the volatility of the stock is not considered (as shown in the plot below).

So, let's fit a GJR-GARCH model, where the effect of news is taken into consideration.

MODEL 3: To include the news impact on forecasting, let's use the GJR GARCH MODEL

GJR GARCH MODEL ARMA(0,0)

s <- ugarchspec(mean.model = list(armaOrder = c(0,0)),

variance.model = list(model = "gjrGARCH"),

distribution.model = 'sstd')

m <- ugarchfit(data = return, spec = s)

## Conditional Variance Dynamics

## -----------------------------------

## GARCH Model : gjrGARCH(1,1)

## Mean Model : ARFIMA(0,0,0)

## Distribution : sstd

##

## Optimal Parameters

## ------------------------------------

## Estimate Std. Error t value Pr(>|t|)

## mu 0.000944 0.000368 2.56456 0.010331

## omega 0.000012 0.000000 55.77643 0.000000

## alpha1 0.002302 0.005202 0.44254 0.658098

## beta1 0.866296 0.015132 57.24930 0.000000

## gamma1 0.166615 0.035869 4.64509 0.000003

## skew 1.015644 0.039148 25.94355 0.000000

## shape 4.253347 0.499061 8.52270 0.000000

##

## Robust Standard Errors:

## Estimate Std. Error t value Pr(>|t|)

## mu 0.000944 0.000394 2.39332 0.016697

## omega 0.000012 0.000000 64.86986 0.000000

## alpha1 0.002302 0.005930 0.38823 0.697843

## beta1 0.866296 0.016674 51.95337 0.000000

## gamma1 0.166615 0.037996 4.38512 0.000012

## skew 1.015644 0.038725 26.22696 0.000000

## shape 4.253347 0.526336 8.08104 0.000000

##

## LogLikelihood : 3656.336

##

## Information Criteria

## ------------------------------------

##

## Akaike -5.8018

## Bayes -5.7732

## Shibata -5.8019

## Hannan-Quinn -5.7911

##

## Weighted Ljung-Box Test on Standardized Residuals

## ------------------------------------

## statistic p-value

## Lag[1] 1.220 0.2693

## Lag[2*(p+q)+(p+q)-1][2] 1.237 0.4272

## Lag[4*(p+q)+(p+q)-1][5] 1.802 0.6660

## d.o.f=0

## H0 : No serial correlation

##

## Weighted Ljung-Box Test on Standardized Squared Residuals

## ------------------------------------

## statistic p-value

## Lag[1] 0.05015 0.8228

## Lag[2*(p+q)+(p+q)-1][5] 0.70509 0.9219

## Lag[4*(p+q)+(p+q)-1][9] 1.67689 0.9399

## d.o.f=2

##

## Weighted ARCH LM Tests

## ------------------------------------

## Statistic Shape Scale P-Value

## ARCH Lag[3] 0.03601 0.500 2.000 0.8495

## ARCH Lag[5] 1.16336 1.440 1.667 0.6851

## ARCH Lag[7] 1.88759 2.315 1.543 0.7411

##

## Nyblom stability test

## ------------------------------------

## Joint Statistic: 38.6327

## Individual Statistics:

## mu 0.10511

## omega 12.83641

## alpha1 0.49054

## beta1 0.98448

## gamma1 0.36038

## skew 0.02083

## shape 1.34598

##

## Asymptotic Critical Values (10% 5% 1%)

## Joint Statistic: 1.69 1.9 2.35

## Individual Statistic: 0.35 0.47 0.75

##

## Sign Bias Test

## ------------------------------------

## t-value prob sig

## Sign Bias 0.9288 0.3532

## Negative Sign Bias 0.1286 0.8977

## Positive Sign Bias 0.1831 0.8547

## Joint Effect 1.2585 0.7390

##

##

## Adjusted Pearson Goodness-of-Fit Test:

## ------------------------------------

## group statistic p-value(g-1)

## 1 20 20.28 0.3777

## 2 30 24.08 0.7248

## 3 40 36.24 0.5962

## 4 50 43.67 0.6884

The model fit is quite similar, except that the sign of the news now affects the forecast volatility. We can see from the news impact curve that after positive news the increase in the conditional variance is gradual, whereas after negative news of the same size the impact is significantly higher. This matches the estimates above, where gamma1 (0.166615) is significant and alpha1 (0.002302) is not.
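The asymmetry can be reproduced by hand from the estimated coefficients. The sketch below evaluates a news impact curve of the GJR form, sigma2 = omega + (alpha + gamma * I[eps < 0]) * eps^2 + beta * sigma2_prev, in base R. Two assumptions are made for illustration: the previous variance is fixed at the long-run level, and the probability of a negative innovation (`kappa`) is taken as 0.5 because the estimated skew is close to 1. The function name `news_impact` is hypothetical.

```r
# Coefficients from the GJR-GARCH output above
omega <- 0.000012
alpha <- 0.002302
beta  <- 0.866296
gamma <- 0.166615

kappa <- 0.5   # assumed probability of a negative shock (near-symmetric innovations)

# Long-run variance implied by the coefficients, used as the previous-day variance
sigma2_bar <- omega / (1 - alpha - beta - gamma * kappa)

# Next-day conditional variance as a function of today's shock
news_impact <- function(eps) {
  omega + beta * sigma2_bar + (alpha + gamma * (eps < 0)) * eps^2
}

shocks <- c(-0.03, 0.03)   # a 3% drop versus a 3% rise
print(sqrt(news_impact(shocks)))
```

Under these assumptions the negative shock yields the larger next-day standard deviation, since the gamma term only switches on for negative shocks.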

SIMULATION AND FORECASTING

We will be using the GJR GARCH model for the final forecasting

m <- ugarchfit(data = return, spec = s)

sfinal <- s

setfixed(sfinal) <- as.list(coef(m))

coef(m)

Often we want to use an estimated model to subsequently forecast the conditional variance. The function used for this purpose is the ugarchforecast function. When a fitted object is passed to ugarchforecast(), the data argument is ignored; here we pass a specification with fixed coefficients, so the data must be supplied, and the forecast is based on those coefficients and the last few data points of the series.

f2018 <- ugarchforecast(data = return["/2018-12"], fitORspec = sfinal, n.ahead = 252)

plot(sigma(f2018)) will give us the forecast conditional standard deviation (as seen in the plot below)

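Since the volatility at the end of the sample was high, it is expected to fall over the following year. Note that the data described above ends in February 2018, so the "/2018-12" subset keeps the whole series.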

sim <- ugarchpath(spec = sfinal, m.sim = 2, n.sim = 1*252, rseed = 123)

plot.zoo(fitted(sim))

Plotting the variability

Looking at the closing values at the end of the sample and using the last one as the starting point of the forecast. We will use the apply function to turn the simulated returns into cumulative values and then into price paths.

tail(df_stocks$close)

## [1] 167.43 167.78 160.50 156.49 163.03 159.54

p <- 159.54*apply(fitted(sim), 2, 'cumsum') + 159.54

matplot(p, type = "l", lwd = 3)
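The line above builds the price path from the cumulative sum of returns, which is an additive approximation. A compounded alternative multiplies the growth factors (1 + r) instead. The sketch below compares the two in base R; the returns matrix `r` is made-up noise standing in for the simulated returns, and only the starting price 159.54 is taken from the output above.

```r
set.seed(123)
# Made-up daily returns for two paths of 252 trading days (stand-in for the simulation)
r  <- matrix(rnorm(252 * 2, mean = 0.0009, sd = 0.015), ncol = 2)
p0 <- 159.54   # last closing price in the data

# Additive approximation, as in the line above: P_t = P_0 * (1 + sum of returns)
p_additive <- p0 * apply(r, 2, cumsum) + p0

# Compounded path: P_t = P_0 * product of (1 + r)
p_compound <- p0 * apply(1 + r, 2, cumprod)

print(round(rbind(additive = p_additive[252, ], compound = p_compound[252, ]), 2))
```

Both versions agree on the first day and drift apart as the horizon grows; the compounded path can never become negative as long as every return is above -100%.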

CONCLUSION:

Thus, we have fitted a GJR-GARCH(1,1) model to the daily returns of the stock. The final plot takes the last closing price of the stock in the data (159.54) and projects it over the next 252 trading days, i.e., roughly one year.

REFERENCES:

1. Cryer, J. D., & Chan, K. S. (2008). Time series analysis: with applications in R (Vol. 2). New York: Springer.

2. Katesari, H. S., & Vajargah, B. F. (2015). Testing adverse selection using frank copula approach in Iran

insurance markets. Mathematics and Computer Science, 15(2), 154-158.

3. Katesari, H. S., & Zarodi, S. (2016). Effects of coverage choice by predictive modeling on frequency of

accidents. Caspian Journal of Applied Sciences Research, 5(3), 28-33.

4. Safari-Katesari, H., Samadi, S. Y., & Zaroudi, S. (2020). Modelling count data via copulas. Statistics,

54(6), 1329-1355.

5. Shumway, R. H., & Stoffer, D. S. (2000). Time series analysis and its applications (Vol. 3). New York: Springer.

6. Safari-Katesari, H., & Zaroudi, S. (2020). Count copula regression model using generalized beta

distribution of the second kind. Statistics, 21, 1-12.

7. Safari-Katesari, H., & Zaroudi, S. (2021). Analysing the impact of dependency on conditional survival

functions using copulas. Statistics in Transition New Series, 22(1).

8. Safari Katesari, H., (2021) Bayesian dynamic factor analysis and copula-based models for mixed data,

PhD dissertation, Southern Illinois University Carbondale

9. Tsay, R. S. (2013). Multivariate time series analysis: with R and financial applications. John Wiley &

Sons.

10. Zaroudi, S., Faridrohani, M. R., Behzadi, M. H., & Safari-Katesari, H. (2022). Copula-based Modeling

for IBNR Claim Loss Reserving. arXiv preprint arXiv:2203.12750.

LINKS:

1. https://www.kaggle.com/datasets/mczielinski/bitcoin-historical-data

2. https://www.kaggle.com/datasets/camnugent/sandp500

3. https://www.rdocumentation.org/packages/forecast/versions/8.16/topics/Arima

4. https://people.duke.edu/~rnau/arimrule.htm

5. https://www.rdocumentation.org/packages/rugarch/versions/1.4-8a

6. https://faculty.washington.edu/ezivot/econ589/univariateGarch2012powerpoint.pdf

7. https://www.financialriskforecasting.com/seminars/seminars/Seminar4.html#4