Seasonal and Nonseasonal GARCH Time Series Analysis:
Case Study of Bitcoin Historical and S&P 500 stock
datasets
Aishwarya Pawar
Department of Mathematical Sciences, Stevens Institute of Technology, Hoboken, NJ
Project Supervisor: Dr. Hadi Safari Katesari
Abstract
Bitcoin is the longest-running and most well-known cryptocurrency, and the Standard and Poor's 500 (S&P 500) is a
stock market index tracking the stock performance of 500 large companies listed on exchanges in the United
States. The goal of this project is to provide a realistic forecast of the future values of these financial assets
based on historically available data. To achieve this goal, the trend, seasonality, and volatility of the two datasets
were analyzed; a SARIMA model was fitted to the Bitcoin dataset and a univariate GARCH model was fitted
to the S&P 500 data. The residual analysis suggested the models were apt at forecasting future values.
PART A: SEASONAL DATASET: Bitcoin Historical Data
Introduction and Motivation
Bitcoin is the longest-running and most well-known cryptocurrency, first released as open-source in 2009 by
the anonymous Satoshi Nakamoto. Bitcoin serves as a decentralized medium of digital exchange, with
transactions verified and recorded in a public distributed ledger (the blockchain) without the need for a trusted
record-keeping authority or central intermediary. Transaction blocks contain an SHA-256 cryptographic hash
of previous transaction blocks and are thus "chained" together, serving as an immutable record of all
transactions that have ever occurred. As with any currency/commodity on the market, bitcoin trading and
financial instruments soon followed the public adoption of bitcoin and continue to grow.
The goal of this project is to provide a realistic forecast based on historically available data to predict the future
values of this cryptocurrency. To achieve this goal, we will develop a model to learn from the trends and
seasonality present in the data to adapt accordingly and provide accurate future predictions.
Data Description
Date Range: From Jan 2012 to March 2021
Datasource Description: The data is from the Kaggle website and can be accessed using the link:
https://www.kaggle.com/datasets/mczielinski/bitcoin-historical-data
Dataset Description: The dataset contains 4,857,377 entries in 8 columns. The entries are minute-by-minute
updates of the OHLC (Open, High, Low, Close) prices, the volume in BTC and in the indicated currency, and the
weighted bitcoin price. Timestamps are in Unix time; timestamps without any trades or activity have their data
fields filled with NaNs.
Dataset Entries:
Timestamp: Start time of time window (60s window), in Unix time
Open: Open price at start of time window
High: High price within time window
Low: Low price within time window
Close: Close price at end of time window
Volume_(BTC): Volume of BTC transacted in this window
Volume_(Currency): Volume of corresponding currency transacted in this window
Weighted_Price: VWAP- Volume Weighted Average Price
Initial Look at the Data
The dataset is quite large and contains many null values. Since the observations are collected on a per-minute
basis, some pre-processing is needed to obtain monthly data.
Let's also have a look at the plots, as we will be fitting a model to predict the Weighted Price.
head(Data)
## Timestamp Open High Low Close Volume_.BTC. Volume_.Currency. Weighted_Price
## 1 1325317920 4.39 4.39 4.39 4.39 0.4555809 2 4.39
## 2 1325317980 NaN NaN NaN NaN NaN NaN NaN
## 3 1325318040 NaN NaN NaN NaN NaN NaN NaN
## 4 1325318100 NaN NaN NaN NaN NaN NaN NaN
## 5 1325318160 NaN NaN NaN NaN NaN NaN NaN
## 6 1325318220 NaN NaN NaN NaN NaN NaN NaN
Since there is a large amount of data, the patterns are quite clustered: the values are either quite low or quite
high, so a log transform of the data gives a clearer picture.
Data Pre-processing
Before any further preprocessing we want to remove the null values.
# REMOVING THE NULL VALUES
df<- na.omit(Data)
Since the data is collected on a minute-to-minute basis over roughly nine years, we have millions of data
entries. We want to convert them to a monthly series. To do so we use the pandas resample function (called
from R via reticulate), first aggregating to daily means and then to monthly means.
pd <- import("pandas")
#CONVERT TO PANDAS DF
df <- r_to_py(df)
df$index = df$Timestamp
df = df$resample('D')$mean()
df_month = df$resample('M')$mean()
df_year = df$resample('A-DEC')$mean()
df_Q = df$resample('Q-DEC')$mean()
# CONVERT BACK TO R
df <- py_to_r(df)
df_month <- py_to_r(df_month)
df_year <- py_to_r(df_year)
df_Q <- py_to_r(df_Q)
Plotting the resampled weighted price on a monthly, quarterly and yearly basis.
For the purpose of this project, we are going to use monthly Bitcoin prices to fit the model and forecast it.
DECOMPOSING THE TIME SERIES:
Decomposing the time series to have a look at the seasonal components, trend components, and residuals in
it.
ts_bitcoin = ts(log(df_month$Weighted_Price), frequency = 12)
decompose_bitcoin = decompose(ts_bitcoin, "multiplicative")
plot(decompose_bitcoin, type='l',lwd=2, col = 'red')
The series is best described by a multiplicative decomposition, since the amplitude of both the seasonal and
irregular variations increases as the level of the trend rises. In the multiplicative model, the original time series
is expressed as the product of the trend, seasonal, and irregular components.
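In symbols, the multiplicative model writes the observed series as
Y_t = T_t × S_t × I_t,
where T_t is the trend, S_t the seasonal component, and I_t the irregular (residual) component; taking logarithms turns this into the additive form log Y_t = log T_t + log S_t + log I_t.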
FITTING THE TIME SERIES MODEL
Now that we have performed exploratory data analysis, we can fit a time series model to the data. First, we
need to perform some checks to make sure that the data is in the correct format.
Stationarity check.
Let's start by checking whether the time series is stationary. To do so we use the augmented Dickey-Fuller
test.
library(tseries)   # adf.test()
adf.test(df_month$Weighted_Price)
## Augmented Dickey-Fuller Test
## data: df_month$Weighted_Price
## Dickey-Fuller = 0.98776, Lag order = 4, p-value = 0.99
## alternative hypothesis: stationary
As shown by the augmented Dickey-Fuller test, the p-value is not less than 0.05, so we fail to reject the null
hypothesis of non-stationarity: the series is not stationary. That is, the time series has some time-dependence
structure and its variance is not constant over time.
It is important to make the series stationary before fitting a model, because we observe a single run of the
stochastic process rather than repeated runs. Hence, we need stationarity and ergodicity so that observing one
long run of the process is informative in the same way as observing many independent runs of the process.
Plotting the ACF and PACF of the series.
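The plots below were presumably produced with the base R correlation functions, along these lines:
acf(df_month$Weighted_Price, main = "ACF of monthly weighted price")
pacf(df_month$Weighted_Price, main = "PACF of monthly weighted price")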
The autocorrelation plot of the data decays very slowly. This is in line with the Dickey-Fuller test result that the
time series is not stationary; the high autocorrelation is evident from the trend present in the data.
MAKING THE TIME SERIES STATIONARY
Using the Box-Cox transformation:
The Box-Cox transformation is a family of power transformations indexed by a parameter lambda. It can be
used whenever a time series has a non-constant variance; after applying the Box-Cox transformation with a
suitable value of lambda, the variance is stabilized and the process may become stationary.
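For a positive series y and transformation parameter λ, the Box-Cox transform is
y(λ) = (y^λ − 1) / λ for λ ≠ 0, and y(0) = log(y).
The code below applies exactly this formula with the λ that maximizes the profile likelihood (≈ 0.101 here).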
library(MASS)   # boxcox()
b <- boxcox(lm(df_month$Weighted_Price ~ 1))
lambda <- b$x[which.max(b$y)]   # lambda that maximizes the profile log-likelihood
lambda
## [1] 0.1010101
new_df_month_exact <- (df_month$Weighted_Price ^ lambda - 1) / lambda
ACF and PACF plots of the Box Cox transformed series:
adf.test(new_df_month_exact)
## Augmented Dickey-Fuller Test
## data: new_df_month_exact
## Dickey-Fuller = -2.4263, Lag order = 4, p-value = 0.3997
## alternative hypothesis: stationary
We see that even after the Box-Cox transformation, the series is still not stationary: the p-value (0.3997) is
greater than 0.05. The ACF and PACF plots also show that autocorrelation is still present in the time series. To
remove this correlation and make the series stationary, let's take a seasonal difference of the time series.
SEASONAL DIFFERENCING
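A minimal sketch of the seasonal (lag-12) differencing and the correlation plots discussed below:
acf(diff(new_df_month_exact, lag = 12))   # ACF of the seasonally differenced series
pacf(diff(new_df_month_exact, lag = 12))  # PACF of the seasonally differenced series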
From the ACF and PACF plots, we can see that the correlations have decreased significantly. The sample ACF
looks like a damped sine wave, but it does not die out like the theoretical ACF.
From the augmented Dickey-Fuller test, we still do not obtain a statistically significant p-value; hence the series
is still not stationary.
adf.test(diff(new_df_month_exact,lag = 12))
## Augmented Dickey-Fuller Test
## data: diff(new_df_month_exact, lag = 12)
## Dickey-Fuller = -2.1158, Lag order = 4, p-value = 0.5287
## alternative hypothesis: stationary
We can also still see correlations in the ACF plot. To remove this correlation, we apply an additional regular
differencing of the series.
ADDITIONAL REGULAR DIFFERENCING
ACF and PACF of the series after another differencing.
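These were presumably generated along the following lines (a regular first difference on top of the seasonal one), followed by the Dickey-Fuller test whose output appears below:
acf(diff(diff(new_df_month_exact, lag = 12)))
pacf(diff(diff(new_df_month_exact, lag = 12)))
adf.test(diff(diff(new_df_month_exact, lag = 12)))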
## Augmented Dickey-Fuller Test
## data: diff(diff(new_df_month_exact,lag=12))
## Dickey-Fuller = -4.3998, Lag order = 4, p-value = 0.01
## alternative hypothesis: stationary
After the additional regular differencing, the p-value obtained from the augmented Dickey-Fuller test is
statistically significant, i.e., less than 0.05. Hence, we can now conclude that the series is stationary. Also, the
ACF and PACF plots show that the correlations have decreased significantly.
Now that we have a stationary series, we'll have a look at the EACF plot to finalize the orders of the AR and MA
components and then fit the model.
library(TSA)   # eacf()
eacf(diff(diff(new_df_month_exact, lag = 12)))
## AR/MA
## 0 1 2 3 4 5 6 7 8 9 10 11 12 13
## 0 x x o o o o o o o o x x x o
## 1 o o o o o o x o o o o x o o
## 2 x o o o o o x o o o o x o o
## 3 x x o o o o o o o o o x o o
## 4 x o o o o o o o o o o x x o
## 5 x x o x o o o o o o o x o o
## 6 x x o x o o o o o o o x o o
## 7 x o x x o o o o o o o x o o
DETERMINING THE ORDER OF THE MODEL
From the ACF, PACF, and EACF we see that suitable values of p, q, P, and Q lie in the range 0 to 2 (0 to 1 for Q).
We will select the best model as the one with the lowest AIC over these values of p, q, P, and Q.
Below is a nested for loop that iterates over every combination of p, q, P, and Q in this range and fits a SARIMA
model for each. The AIC of each model, together with the corresponding p, q, P, and Q values, is stored in a list.
The list is then sorted in ascending order of AIC, and the top entry is taken: it represents the lowest-AIC model
along with the corresponding p, q, P, and Q values.
NESTED FOR LOOP FOR DETERMINING THE VALUES OF P, Q, p and q
Qs <- seq(from = 0, to = 1, by = 1)
qs <- seq(from = 0, to = 2, by = 1)
Ps <- seq(from = 0, to = 2, by = 1)
ps <- seq(from = 0, to = 2, by = 1)
D <- 1
d <- 1
parameters <- expand.grid(ps, qs, Ps, Qs)
parameters_list <- as.list(parameters)
df_month$predSeries <- new_df_month_exact   # series to model (Box-Cox transformed monthly price)
results <- vector(mode = 'list')
p_val <- vector(mode = 'list')
q_val <- vector(mode = 'list')
Q_val <- vector(mode = 'list')
P_val <- vector(mode = 'list')
for (p in 0:2) {
  for (q in 0:2) {
    for (P in 0:2) {
      for (Q in 0:1) {
        if (p + d + q + P + D + Q <= 10) {
          model <- arima(x = df_month$predSeries, order = c(p, d, q),
                         seasonal = list(order = c(P, D, Q), period = 12))
          test <- Box.test(model$residuals, lag = log(length(model$residuals)))
          sse <- sum(model$residuals^2)
          results <- append(results, model$aic)
          p_val <- append(p_val, p)
          q_val <- append(q_val, q)
          Q_val <- append(Q_val, Q)
          P_val <- append(P_val, P)
          cat(p, d, q, P, D, Q, "AIC:", model$aic, "SSE:", sse, "p-value:", test$p.value, "\n")
        }
      }
    }
  }
}
final_result <- data.frame(matrix(unlist(results), nrow=length(results), byrow=TRUE))
final_result$p <- p_val
final_result$q <- q_val
final_result$P <- P_val
final_result$Q <- Q_val
colnames(final_result)[1] <- "AIC"
final_result <- final_result[order(final_result$AIC),]
final_result[1,]
## AIC p q P Q
## 52 -250.1339 1 0 2 0
FITTING A SARIMA MODEL AND FORECASTING IT
df_month$predSeries <- new_df_month_exact
pdqParam <- c(1,1,0)
PDQParam <- c(2,1,0)
manualFit <- arima(df_month$predSeries, order = pdqParam,
                   seasonal = list(order = PDQParam, period = 12))
manualPred <- predict(manualFit, n.ahead = 25)
ts.plot(as.ts(df_month$predSeries), (manualPred$pred), lty = c(1,3),col="red",lwd=3)
Plotting the time series of the prediction with respect to the original data.
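Because the model was fitted on the Box-Cox scale, the forecasts can be mapped back to the original price scale by inverting the transform with the λ estimated earlier. The name price_pred below is introduced only for illustration; this back-transformation step is not part of the original code:
price_pred <- (lambda * manualPred$pred + 1)^(1 / lambda)   # invert the Box-Cox transform
ts.plot(as.ts(df_month$Weighted_Price), price_pred, lty = c(1, 3), col = "blue", lwd = 2)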
RESIDUAL ANALYSIS
Plotting the autocorrelations of the residuals of the fitted model. We can see that they do not exhibit any
correlation among themselves, with just one significant lag at 7.
library(ggpubr)   # ggqqplot()
acf(residuals(manualFit))
ts.plot(residuals(manualFit), lwd = 3, col = "red", main = 'Residual Analysis')
ggqqplot(residuals(manualFit))
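The Box-Ljung output below was presumably produced by a call of this form (the lag value of 1, matching df = 1 in the output, is an assumption):
Box.test(residuals(manualFit), lag = 1, type = "Ljung-Box")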
## Box-Ljung test
## data: residuals(manualFit)
## X-squared = 0.0035786, df = 1, p-value = 0.9523
Since the p-value is greater than 0.05, we fail to reject the null hypothesis that the residuals are independently
distributed: they behave like white noise, which is what we want for an adequate model.
LBQPlot(residuals(manualFit), lag.max = 30, SquaredQ = FALSE)
CONCLUSION:
As we can see from the residual analysis, the SARIMA (1, 1, 0) x (2, 1, 0, 12) model is a good fit to the data.
However, Bitcoin prices are highly volatile and there are many more factors that could be considered while
fitting the model to obtain more precise and accurate forecasts.
PART B: NON-SEASONAL DATASET: S&P 500 stock data
Introduction and Motivation
Stock market data is interesting to analyze and, as a further incentive, strong predictive models can have a large
financial payoff. The amount of financial data on the web is seemingly endless, but a large and well-structured
dataset covering a wide array of companies can be hard to come by. Here I use a dataset with historical stock
prices (the last 5 years) for all companies currently found on the S&P 500 index.
The goal of this part of the project is to predict and forecast daily return values for a particular stock. Given the
highly volatile nature of stock data, we will fit a univariate GARCH model to predict the daily returns and allow
statistically informed trades.
Data Description
Date Range: From Feb 2013 to Feb 2018
Dataset Description: The dataset contains 5 years of stock data for all companies currently found on the S&P 500
index.
Datasource Description: The data is from the Kaggle website and can be accessed using the link:
https://www.kaggle.com/datasets/camnugent/sandp500
Date - in format: yy-mm-dd
Open - price of the stock at market open (this is NYSE data, so all prices are in USD)
High - highest price reached in the day
Low - lowest price reached in the day
Close - price of the stock at market close
Volume - number of shares traded
Name - the stock's ticker name
Initial look at the data:
Since the dataset contains stock prices for over 500 companies, we select a single company's stock price data
for the scope of this project: Apple's. Plotting the initial data.
library(lubridate)   # as_datetime()
library(xts)         # xts time-series objects
library(quantmod)    # chartSeries()
df$date <- as_datetime(df$date)
df_ts <- xts(as.numeric(df$close), df$date)
chartSeries(df_ts)
As mentioned before, we will be predicting the daily returns of the stock. The daily return of a stock measures
its day-to-day performance: it compares the price at today's close with the price of the same stock at
yesterday's close.
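In formula form, the simple (discrete) daily return used here is
r_t = (P_t − P_{t−1}) / P_{t−1} = P_t / P_{t−1} − 1,
where P_t is the closing price on day t; this is what CalculateReturns computes with its default "discrete" method.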
The code below calculates the daily returns and also plots them for us.
library(PerformanceAnalytics)   # CalculateReturns()
return <- CalculateReturns(df_ts)
return <- return[-1]   # drop the first (NA) return
chartSeries(return)
Histogram of the returns data
As we can see from the returns plot above, the series is stationary and appears approximately normal
(supported by the histogram above). This is also confirmed by the augmented Dickey-Fuller test, where the
p-value is statistically significant, so we can conclude that the data is indeed stationary. We also do not see any
trend in the data. The important point is that the returns plot shows a lot of volatility: for example, we can see
periods of high volatility around Feb 03, 2014, and periods of low volatility around Feb 01, 2017. To capture
this volatility, we will fit a univariate GARCH model.
Plotting the stock's rolling monthly and annual volatility.
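A hedged sketch of how these rolling-volatility charts could be produced with PerformanceAnalytics; the window widths (22 and 252 trading days for roughly one month and one year) are assumptions, not taken from the original code:
chart.RollingPerformance(R = return, width = 22, FUN = "sd.annualized", scale = 252, main = "Rolling one-month volatility")
chart.RollingPerformance(R = return, width = 252, FUN = "sd.annualized", scale = 252, main = "Rolling one-year volatility")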
DETERMINING THE ORDER OF THE MODEL
ARMA models capture the mean structure of the series, while the GARCH model captures the time-varying
(conditional) variance. We can see from the return plots that the mean is roughly constant; we verify this using
the nested for loop from Part A of this project. Then we determine the order of the GARCH model using the
ACF and PACF plots.
We use the ugarchfit method from the "rugarch" library instead of the garch method from the "tseries" library,
so we pass the ARMA and GARCH orders together with the return data directly to ugarchfit, rather than passing
residuals as we would for the garch method. This implementation can be verified from references [3, 4, 5].
Having a look at the ACF, PACF, and EACF of the series to determine the order of the GARCH model: the ACF
and PACF plots show no significant lags apart from one at lag 7, so for the scope of this project we will stick
with a GARCH(1,1) model.
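For reference, the GARCH(1,1) conditional variance recursion that will be estimated is
σ²_t = ω + α₁ ε²_{t−1} + β₁ σ²_{t−1},
where ε_t = r_t − μ is the mean-corrected return, so today's variance depends on yesterday's squared shock and yesterday's variance.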
Also, the nested loop confirms that an ARMA order of (0,0) gives the lowest AIC for this series.
final_result[1,]
## AIC p q
## 1 -7064.618 0 0
FINDING THE BEST GARCH MODEL
MODEL 1: Normal distribution GARCH MODEL (sGARCH is the standard model)
library(rugarch)   # ugarchspec(), ugarchfit()
s <- ugarchspec(mean.model = list(armaOrder = c(0, 0)),
                variance.model = list(model = "sGARCH"),
                distribution.model = 'norm')
m <- ugarchfit(data = return, spec = s, solver = 'hybrid')
Let's have a look at some of the coefficients of this model.
## Conditional Variance Dynamics
## -----------------------------------
## GARCH Model : sGARCH(1,1)
## Mean Model : ARFIMA(0,0,0)
## Distribution : norm
##
## Optimal Parameters
## ------------------------------------
## Estimate Std. Error t value Pr(>|t|)
## mu 0.000957 0.000404 2.37090 0.017745
## omega 0.000000 0.000000 0.96945 0.332320
## alpha1 0.008655 0.000733 11.81086 0.000000
## beta1 0.989002 0.000592 1670.61474 0.000000
##
## Robust Standard Errors:
## Estimate Std. Error t value Pr(>|t|)
## mu 0.000957 0.000418 2.288850 0.022088
## omega 0.000000 0.000005 0.090515 0.927878
## alpha1 0.008655 0.001191 7.267157 0.000000
## beta1 0.989002 0.001602 617.314884 0.000000
##
## LogLikelihood : 3541.387
##
## Information Criteria
## ------------------------------------
##
## Akaike -5.6238
## Bayes -5.6075
## Shibata -5.6238
## Hannan-Quinn -5.6177
##
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
## statistic p-value
## Lag[1] 1.062 0.3028
## Lag[2*(p+q)+(p+q)-1][2] 1.100 0.4669
## Lag[4*(p+q)+(p+q)-1][5] 1.502 0.7395
## d.o.f=0
## H0 : No serial correlation
##
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
## statistic p-value
## Lag[1] 3.501 0.06135
## Lag[2*(p+q)+(p+q)-1][5] 4.994 0.15359
## Lag[4*(p+q)+(p+q)-1][9] 6.495 0.24524
## d.o.f=2
##
## Weighted ARCH LM Tests
## ------------------------------------
## Statistic Shape Scale P-Value
## ARCH Lag[3] 1.246 0.500 2.000 0.2643
## ARCH Lag[5] 1.526 1.440 1.667 0.5855
## ARCH Lag[7] 2.124 2.315 1.543 0.6910
##
## Nyblom stability test
## ------------------------------------
## Joint Statistic: 305.5824
## Individual Statistics:
## mu 0.05258
## omega 27.21255
## alpha1 0.11354
## beta1 0.10269
##
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic: 1.07 1.24 1.6
## Individual Statistic: 0.35 0.47 0.75
##
## Sign Bias Test
## ------------------------------------
## t-value prob sig
## Sign Bias 0.2304 0.817814
## Negative Sign Bias 2.7854 0.005427 ***
## Positive Sign Bias 0.7584 0.448333
## Joint Effect 11.2500 0.010448 **
##
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
## group statistic p-value(g-1)
## 1 20 86.90 1.171e-10
## 2 30 96.44 3.608e-09
## 3 40 107.91 2.204e-08
## 4 50 108.93 1.906e-06
## Elapsed time : 0.152513
In general, the lower the information criteria, the better the model.
From the weighted Ljung-Box test on the standardized squared residuals, all p-values are greater than 0.05, so
we cannot reject H0: the residuals behave more or less like a white noise process.
From the Adjusted Pearson Goodness-of-Fit Test, all p-values are less than 0.05, so we reject the null hypothesis
and conclude that the distribution assumed for the residuals, i.e., the normal distribution, is not a good choice;
for a good model we need p-values above 0.05 in this test.
Also, from the QQ plot we can see that the fit is poor, as the tails are not captured well.
MODEL 2: To fit a better model, we use the skewed Student-t distribution GARCH MODEL
s <- ugarchspec(mean.model = list(armaOrder = c(0, 0)),
                variance.model = list(model = "sGARCH"),
                distribution.model = 'sstd')
m <- ugarchfit(data = return, spec = s)
## Conditional Variance Dynamics
## -----------------------------------
## GARCH Model : sGARCH(1,1)
## Mean Model : ARFIMA(0,0,0)
## Distribution : sstd
##
## Optimal Parameters
## ------------------------------------
## Estimate Std. Error t value Pr(>|t|)
## mu 0.001001 0.000380 2.63211 0.008486
## omega 0.000002 0.000003 0.56134 0.574567
## alpha1 0.034789 0.014457 2.40638 0.016111
## beta1 0.958942 0.013409 71.51432 0.000000
## skew 1.019679 0.038499 26.48559 0.000000
## shape 3.891047 0.467266 8.32727 0.000000
##
## Robust Standard Errors:
## Estimate Std. Error t value Pr(>|t|)
## mu 0.001001 0.000402 2.490013 0.012774
## omega 0.000002 0.000025 0.077488 0.938235
## alpha1 0.034789 0.114872 0.302847 0.762006
## beta1 0.958942 0.105073 9.126445 0.000000
## skew 1.019679 0.041230 24.731646 0.000000
## shape 3.891047 4.561011 0.853111 0.393598
##
## LogLikelihood : 3639.673
##
## Information Criteria
## ------------------------------------
##
## Akaike -5.7769
## Bayes -5.7524
## Shibata -5.7770
## Hannan-Quinn -5.7677
##
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
## statistic p-value
## Lag[1] 1.304 0.2534
## Lag[2*(p+q)+(p+q)-1][2] 1.312 0.4073
## Lag[4*(p+q)+(p+q)-1][5] 1.810 0.6641
## d.o.f=0
## H0 : No serial correlation
##
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
## statistic p-value
## Lag[1] 0.3401 0.5598
## Lag[2*(p+q)+(p+q)-1][5] 0.5667 0.9469
## Lag[4*(p+q)+(p+q)-1][9] 0.9208 0.9893
## d.o.f=2
##
## Weighted ARCH LM Tests
## ------------------------------------
## Statistic Shape Scale P-Value
## ARCH Lag[3] 0.0005719 0.500 2.000 0.9809
## ARCH Lag[5] 0.4242667 1.440 1.667 0.9057
## ARCH Lag[7] 0.7116805 2.315 1.543 0.9554
##
## Nyblom stability test
## ------------------------------------
## Joint Statistic: 80.219
## Individual Statistics:
## mu 0.04585
## omega 3.86477
## alpha1 0.48638
## beta1 0.56370
## skew 0.02211
## shape 1.13626
##
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic: 1.49 1.68 2.12
## Individual Statistic: 0.35 0.47 0.75
##
## Sign Bias Test
## ------------------------------------
## t-value prob sig
## Sign Bias 0.7161 0.4741
## Negative Sign Bias 1.4449 0.1487
## Positive Sign Bias 0.1492 0.8814
## Joint Effect 5.7662 0.1236
##
##
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
## group statistic p-value(g-1)
## 1 20 23.59 0.2124
## 2 30 25.90 0.6311
## 3 40 39.23 0.4594
## 4 50 54.24 0.2815
##
The Adjusted Pearson Goodness-of-Fit Test p-values are now above 0.05, so this is clearly a better model. The
estimated skew parameter is 1.019679, i.e., approximately 1, so the distribution is approximately symmetric.
The QQ plot shows a much better fit to the data, and the ACF of the residuals shows no significant lags and no
correlation.
It is a good fit to the data, except that the impact of news on the stock price is not taken into account (as
shown in the plot below).
So, let's fit a GJR-GARCH model, in which the effect of news is taken into consideration.
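The GJR-GARCH(1,1) variance equation adds an indicator term for negative shocks,
σ²_t = ω + (α₁ + γ₁ I[ε_{t−1} < 0]) ε²_{t−1} + β₁ σ²_{t−1},
so bad news (ε_{t−1} < 0) raises next-period variance by an extra γ₁ ε²_{t−1} compared with good news of the same size.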
MODEL 3: To include the news impact on forecasting, let's use the GJR GARCH MODEL
GJR GARCH MODEL ARMA(0,0)
s <- ugarchspec(mean.model = list(armaOrder = c(0, 0)),
                variance.model = list(model = "gjrGARCH"),
                distribution.model = 'sstd')
m <- ugarchfit(data = return, spec = s)
## Conditional Variance Dynamics
## -----------------------------------
## GARCH Model : gjrGARCH(1,1)
## Mean Model : ARFIMA(0,0,0)
## Distribution : sstd
##
## Optimal Parameters
## ------------------------------------
## Estimate Std. Error t value Pr(>|t|)
## mu 0.000944 0.000368 2.56456 0.010331
## omega 0.000012 0.000000 55.77643 0.000000
## alpha1 0.002302 0.005202 0.44254 0.658098
## beta1 0.866296 0.015132 57.24930 0.000000
## gamma1 0.166615 0.035869 4.64509 0.000003
## skew 1.015644 0.039148 25.94355 0.000000
## shape 4.253347 0.499061 8.52270 0.000000
##
## Robust Standard Errors:
## Estimate Std. Error t value Pr(>|t|)
## mu 0.000944 0.000394 2.39332 0.016697
## omega 0.000012 0.000000 64.86986 0.000000
## alpha1 0.002302 0.005930 0.38823 0.697843
## beta1 0.866296 0.016674 51.95337 0.000000
## gamma1 0.166615 0.037996 4.38512 0.000012
## skew 1.015644 0.038725 26.22696 0.000000
## shape 4.253347 0.526336 8.08104 0.000000
##
## LogLikelihood : 3656.336
##
## Information Criteria
## ------------------------------------
##
## Akaike -5.8018
## Bayes -5.7732
## Shibata -5.8019
## Hannan-Quinn -5.7911
##
## Weighted Ljung-Box Test on Standardized Residuals
## ------------------------------------
## statistic p-value
## Lag[1] 1.220 0.2693
## Lag[2*(p+q)+(p+q)-1][2] 1.237 0.4272
## Lag[4*(p+q)+(p+q)-1][5] 1.802 0.6660
## d.o.f=0
## H0 : No serial correlation
##
## Weighted Ljung-Box Test on Standardized Squared Residuals
## ------------------------------------
## statistic p-value
## Lag[1] 0.05015 0.8228
## Lag[2*(p+q)+(p+q)-1][5] 0.70509 0.9219
## Lag[4*(p+q)+(p+q)-1][9] 1.67689 0.9399
## d.o.f=2
##
## Weighted ARCH LM Tests
## ------------------------------------
## Statistic Shape Scale P-Value
## ARCH Lag[3] 0.03601 0.500 2.000 0.8495
## ARCH Lag[5] 1.16336 1.440 1.667 0.6851
## ARCH Lag[7] 1.88759 2.315 1.543 0.7411
##
## Nyblom stability test
## ------------------------------------
## Joint Statistic: 38.6327
## Individual Statistics:
## mu 0.10511
## omega 12.83641
## alpha1 0.49054
## beta1 0.98448
## gamma1 0.36038
## skew 0.02083
## shape 1.34598
##
## Asymptotic Critical Values (10% 5% 1%)
## Joint Statistic: 1.69 1.9 2.35
## Individual Statistic: 0.35 0.47 0.75
##
## Sign Bias Test
## ------------------------------------
## t-value prob sig
## Sign Bias 0.9288 0.3532
## Negative Sign Bias 0.1286 0.8977
## Positive Sign Bias 0.1831 0.8547
## Joint Effect 1.2585 0.7390
##
##
## Adjusted Pearson Goodness-of-Fit Test:
## ------------------------------------
## group statistic p-value(g-1)
## 1 20 20.28 0.3777
## 2 30 24.08 0.7248
## 3 40 36.24 0.5962
## 4 50 43.67 0.6884
The model fit is quite similar, except that news now impacts our forecasts. From the news impact curve we can
see that positive news increases the conditional volatility only gradually, whereas negative news raises it
significantly more.
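A minimal sketch of how a news impact curve like the one described above can be drawn with rugarch's newsimpact helper (the plotting details are assumptions):
ni <- newsimpact(z = NULL, m)   # news impact curve of the fitted GJR-GARCH model
plot(ni$zx, ni$zy, type = "l", lwd = 2, xlab = ni$xexpr, ylab = ni$yexpr, main = "News impact curve")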
SIMULATION AND FORECASTING
We will be using the GJR-GARCH model for the final forecasting.
m <- ugarchfit(data = return, spec = s)
sfinal <- s
setfixed(sfinal) <- as.list(coef(m))
coef(m)
Often we want to use an estimated model to subsequently forecast the conditional variance. The function used
for this purpose is ugarchforecast. Here ugarchforecast() is given the specification with fixed (estimated)
parameters, and the forecast is based on the fitted model and the last few data points of the supplied return
series.
f2018 <- ugarchforecast(data = return["/2018-12"], fitORspec = sfinal, n.ahead = 252)
plot(sigma(f2018)) gives the forecasted conditional standard deviation (as seen in the plot below).
Since the volatility at the end of 2018 was high, it is expected to fall in 2019.
sim <- ugarchpath(spec = sfinal, m.sim = 2, n.sim = 1*252, rseed = 123)
plot.zoo(fitted(sim))
Plotting the variability
We look at the last observed closing prices and use the final closing price (159.54) as the starting point for the
forecast. The apply function is used to cumulate the simulated returns and convert them into price paths.
tail(df_stocks$close)
## [1] 167.43 167.78 160.50 156.49 163.03 159.54
p <- 159.54*apply(fitted(sim), 2, 'cumsum') + 159.54
matplot(p, type = "l", lwd = 3)
CONCLUSION:
Thus, we have fitted a GJR-GARCH(1,1) model to the daily returns of the stock. The final plot takes the closing
price of the stock at the end of the sample and simulates its path for the entire following year.
REFERENCES:
1. Cryer, J. D., & Chan, K. S. (2008). Time series analysis: with applications in R (Vol. 2). New York: Springer.
2. Katesari, H. S., & Vajargah, B. F. (2015). Testing adverse selection using frank copula approach in Iran
insurance markets. Mathematics and Computer Science, 15(2), 154-158.
3. Katesari, H. S., & Zarodi, S. (2016). Effects of coverage choice by predictive modeling on frequency of
accidents. Caspian Journal of Applied Sciences Research, 5(3), 28-33.
4. Safari-Katesari, H., Samadi, S. Y., & Zaroudi, S. (2020). Modelling count data via copulas. Statistics,
54(6), 1329-1355.
5. Shumway, R. H., & Stoffer, D. S. (2000). Time series analysis and its applications (Vol. 3). New York:
Springer.
6. Safari-Katesari, H., & Zaroudi, S. (2020). Count copula regression model using generalized beta
distribution of the second kind. Statistics, 21, 1-12.
7. Safari-Katesari, H., & Zaroudi, S. (2021). Analysing the impact of dependency on conditional survival
functions using copulas. Statistics in Transition New Series, 22(1).
8. Safari Katesari, H., (2021) Bayesian dynamic factor analysis and copula-based models for mixed data,
PhD dissertation, Southern Illinois University Carbondale
9. Tsay, R. S. (2013). Multivariate time series analysis: with R and financial applications. John Wiley &
Sons.
10. Zaroudi, S., Faridrohani, M. R., Behzadi, M. H., & Safari-Katesari, H. (2022). Copula-based Modeling
for IBNR Claim Loss Reserving. arXiv preprint arXiv:2203.12750.
LINKS:
1. https://www.kaggle.com/datasets/mczielinski/bitcoin-historical-data
2. https://www.kaggle.com/datasets/camnugent/sandp500
3. https://www.rdocumentation.org/packages/forecast/versions/8.16/topics/Arima
4. https://people.duke.edu/~rnau/arimrule.htm
5. https://www.rdocumentation.org/packages/rugarch/versions/1.4-8a
6. https://faculty.washington.edu/ezivot/econ589/univariateGarch2012powerpoint.pdf
7. https://www.financialriskforecasting.com/seminars/seminars/Seminar4.html#4