
Value-at-risk portfolio estimation with copula on selected stocks using variable importance from random forest

Abstract

In the investment field, stockholders need to manage risk when diversifying their stocks in order to avoid losses during a crisis and maximize profits when the economy grows, using either a low risk-low return or a high risk-high return strategy. This study aims to calculate the risk value of selected stock diversifications. PT Telekomunikasi Indonesia Tbk (TLKM.JK) stock is used as the reference. Stocks other than the reference that were included in LQ45 for 5 years are used as comparison stocks. The random forest machine learning method is applied to the time series data and shows the degree of similarity between the reference and comparison stocks through variable importance. The return distributions are combined using the copula method to produce a composite index and to determine Value-at-Risk. For the reference stock, a similar stock (PT Indofood CBP Sukses Makmur Tbk/ICBP.JK) is used for the high risk-high return strategy, and a dissimilar stock (PT Bumi Serpong Damai Tbk/BSDE.JK) is used for the low risk-low return strategy. The fitted copula functions show that TLKM.JK-ICBP.JK follows Copula Gauss, while TLKM.JK-BSDE.JK follows Copula-t. For TLKM.JK-ICBP.JK, the lowest risk is obtained with a composition of 10% TLKM.JK and 90% ICBP.JK. For TLKM.JK-BSDE.JK, the lowest risk is obtained with a composition of 90% TLKM.JK and 10% BSDE.JK.
AIP Conference Proceedings 2662, 020023 (2022); https://doi.org/10.1063/5.0109435
© 2022 Author(s).
Published Online: 22 December 2022
Ardhito Utomo, Anik Djuraidah and Aji Hamim Wigena
Ardhito Utomo a), Anik Djuraidah b), and Aji Hamim Wigena c)
Department of Statistics, IPB University, Bogor, 16680, Indonesia.
a) Corresponding author: ardhitoutomo@apps.ipb.ac.id
b) Electronic mail: anikdjuraidah@apps.ipb.ac.id
c) Electronic mail: aji_hw@apps.ipb.ac.id
INTRODUCTION
Stocks are one form of investment. Stocks carry risks that need to be measured, and a statistical approach such as Value-at-Risk (VaR) is commonly used to estimate risk in the stock market, so that investors can make decisions that maximize profits and minimize losses [1]. VaR makes it possible to calculate the loss of a portfolio at a specified confidence level [2].
In stock investment, diversification is also important for reducing the risk of loss [3]. A high risk-high return outcome occurs when a portfolio is diversified across two similar stocks, resulting in both high risk and high returns. Conversely, a low risk-low return outcome occurs when the stocks are dissimilar, which reduces risk but also yields lower returns.
The choice between similar and dissimilar stocks is assisted by a machine learning algorithm called Random Forest (RF). The RF model requires the user to set a reference stock, which is then used to determine how similar the other stocks are to it. The stocks that are best able to explain the reference stock are therefore a crucial factor in the analysis.
Time series data on the value of a stock, or on the return of a stock, do not necessarily follow a normal distribution. In fact, each stock may have a different distribution. To combine stocks with different distributions, a technique that can capture various forms of dependence is used, namely the copula [4]. The copula function is used to generate a portfolio of the selected stocks, which is then used to obtain the VaR of each portfolio.
A similar study was conducted by [5] using the Copula-GARCH method. That study dealt only with finding the best copula and did not calculate VaR. Research using Copula-GARCH has also been carried out by [6] and [7]. In those studies the copula is built on an ARIMA-GARCH model, for which only the single best model is chosen and whose quality is limited by parameter tuning that has to be done manually. The ARIMA-GARCH approach also makes the resulting model hard to reproduce, because every time the stock or the time period changes, almost every step of the process has to be redone.
This study aims to determine the important factors of the reference stock and the Value-at-Risk of each portfolio using copulas. Both are important because they provide a more realistic estimate of the maximum loss at a given confidence level. Applying a machine learning model also makes it easier to determine which stocks to invest in.
International Conference on Statistics and Data Science 2021
AIP Conf. Proc. 2662, 020023-1–020023-9; https://doi.org/10.1063/5.0109435
Published by AIP Publishing. 978-0-7354-4249-8/$30.00
ANALYSIS STAGE
The data used are the daily stock prices of companies whose stocks were continuously included in the LQ45 list during the study period, January 1, 2015 to January 1, 2020. The data were taken from the Yahoo Finance website, and the stocks are listed in Table I. The reference stock is PT Telkom Indonesia Tbk. (TLKM.JK).
TABLE I. List of stocks used in the study.
Stocks
1. PT Adaro Energy Tbk. (ADRO.JK)
2. PT AKR Corporindo Tbk. (AKRA.JK)
3. PT Astra International Tbk. (ASII.JK)
4. PT Bank Central Asia Tbk. (BBCA.JK)
5. PT Bank Negara Indonesia Tbk. (BBNI.JK)
6. PT Bank Rakyat Indonesia Tbk. (BBRI.JK)
7. PT Bank Tabungan Negara Tbk. (BBTN.JK)
8. PT Bank Mandiri Tbk. (BMRI.JK)
9. PT Bumi Serpong Damai Tbk. (BSDE.JK)
10. PT Gudang Garam Tbk. (GGRM.JK)
11. PT Indofood CBP Sukses Makmur Tbk. (ICBP.JK)
12. PT Vale Indonesia Tbk. (INCO.JK)
13. PT Indofood Sukses Makmur Tbk. (INDF.JK)
14. PT Indocement Tunggal Prakarsa Tbk. (INTP.JK)
15. PT Jasa Marga Tbk. (JSMR.JK)
16. PT Kalbe Farma Tbk. (KLBF.JK)
17. PT Matahari Department Store Tbk. (LPPF.JK)
18. PT Media Nusantara Citra Tbk. (MNCN.JK)
19. PT Perusahaan Gas Negara Tbk. (PGAS.JK)
20. PT Bukit Asam Tbk. (PTBA.JK)
21. PT Surya Citra Media Tbk. (SCMA.JK)
22. PT Semen Indonesia Tbk. (SMGR.JK)
23. PT Telkom Indonesia Tbk. (TLKM.JK)
24. PT United Tractors Tbk. (UNTR.JK)
25. PT Unilever Indonesia Tbk. (UNVR.JK)
26. PT Wijaya Karya Tbk. (WIKA.JK)
Stocks, Daily Closing Value of Stocks, and Stock Returns
Stocks can be defined as a sign of participation or ownership of a person in a company or limited liability company [8]. The stock price fluctuates continuously, but each day has a closing value ($P_t$), which is the last value of one unit of the stock on that day. The daily stock return ($R_t$) is the difference between the closing price on a given day ($t$) and on the previous day ($t-1$), expressed in percentage terms. Returns on the stock market very rarely follow a normal distribution, and the returns of one stock often affect those of other stocks. The return on a stock at time $t$ can be formulated as

$$R_t = \frac{P_t - P_{t-1}}{P_t}. \tag{1}$$
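As an illustration, Equation (1) can be applied to a list of closing prices as in the following minimal sketch. The function name `daily_returns` and the price values are made up for this example. The denominator $P_t$ follows Equation (1) as printed; many texts divide by $P_{t-1}$ instead.

```python
def daily_returns(closes):
    """Return R_t = (P_t - P_{t-1}) / P_t for every day after the first, as in Eq. (1)."""
    if any(p <= 0 for p in closes):
        raise ValueError("closing prices must be positive")
    return [(closes[t] - closes[t - 1]) / closes[t] for t in range(1, len(closes))]


# Made-up closing prices, for illustration only.
prices = [100.0, 102.0, 99.0]
returns = daily_returns(prices)
```

The result has one element fewer than the price series, because the first day has no previous close.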
Random Forest
The Random Forest (RF) method is a development of the Classification and Regression Tree (CART) method that applies bootstrap aggregating (bagging) and random feature selection [9]. For each tree with a continuous target variable, variance reduction is used as the criterion for branching [10]. Random forest builds hundreds of trees to form a forest, which is then analyzed.
In the random forest framework, the most widely used score for the importance of a particular variable (variable importance) is the increase in the average tree error in the forest when the observed values of that variable are randomly permuted in the out-of-bag (OOB) sample [11, 12]. For regression the error is the mean squared error (MSE), which can be extended to the mean absolute percentage error (MAPE); for classification it is the misclassification rate.
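The permutation idea behind variable importance can be sketched in a few lines. This is a simplified illustration, not the procedure used in the study: `toy_model` is a hypothetical stand-in for a fitted forest, the permutation is done on the full sample rather than on the OOB sample of each tree, and the min-max rescaling to 0-100 is an assumption about how a scaled importance column could be produced.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, seed=0):
    """Increase in MSE when each column of X is shuffled, one column at a time."""
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    increases = []
    for j in range(len(X[0])):
        shuffled = [row[j] for row in X]
        rng.shuffle(shuffled)
        X_perm = [row[:j] + [value] + row[j + 1:] for row, value in zip(X, shuffled)]
        increases.append(mse(y, [predict(row) for row in X_perm]) - baseline)
    return increases


def rescale_0_100(values):
    """Min-max rescaling so the largest value is 100 and the smallest is 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [100.0 * (v - lo) / (hi - lo) for v in values]


def toy_model(row):
    """Hypothetical fitted model: depends strongly on x0, weakly on x1, not on x2."""
    return 3.0 * row[0] + 0.5 * row[1]


data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(500)]
y = [toy_model(row) + data_rng.gauss(0, 0.1) for row in X]

raw = permutation_importance(toy_model, X, y)
scaled = rescale_0_100(raw)
```

In this toy setting the predictor that the model ignores receives an importance of exactly zero, because shuffling it leaves every prediction unchanged.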
The suitability of a model can be measured by its level of error. One measure of model error is MAPE [13]. If $Y_t$ is the observed value at time $t$, $F_t$ is the modeled value at the same time, and $n$ is the number of observations, then the error is defined as $e_t = Y_t - F_t$, and MAPE is defined as

$$\mathrm{MAPE} = \frac{1}{n} \sum_{t=1}^{n} \left| \frac{e_t}{Y_t} \right|. \tag{2}$$
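A direct transcription of Equation (2) is shown below as a minimal sketch; the function name `mape` is chosen here for illustration. The value returned is a fraction, and multiplying it by 100 expresses it as a percentage.

```python
def mape(observed, fitted):
    """Mean absolute percentage error, Eq. (2): mean of |(Y_t - F_t) / Y_t|."""
    if len(observed) != len(fitted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(y == 0 for y in observed):
        raise ValueError("observed values must be non-zero")
    return sum(abs((y - f) / y) for y, f in zip(observed, fitted)) / len(observed)


# Two made-up observations: errors of 10% and 5% average to 7.5%.
example = mape([100.0, 200.0], [110.0, 190.0])
```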
Time Delay Embedding
Time delay embedding represents time series in Euclidean space with dimension K, which makes it possible to use
linear or non-linear regression methods on time series data. The general form of time delay embedding is,
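$$Y_K = \begin{bmatrix}
y_K & y_{K-1} & \cdots & y_2 & y_1 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
y_i & y_{i-1} & \cdots & y_{i-K+2} & y_{i-K+1} \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
y_N & y_{N-1} & \cdots & y_{N-K+2} & y_{N-K+1}
\end{bmatrix}. \tag{3}$$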
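The embedding matrix in Equation (3) can be built from a plain list as in the sketch below. The function name `time_delay_embedding` is an assumption made for this example; each row holds the current value followed by its $K-1$ predecessors.

```python
def time_delay_embedding(series, K):
    """Rows (y_i, y_{i-1}, ..., y_{i-K+1}) for i = K, ..., N, as in Eq. (3)."""
    N = len(series)
    if not 1 <= K <= N:
        raise ValueError("K must be between 1 and the series length")
    # i is the 0-based position of the newest value in each row.
    return [[series[i - j] for j in range(K)] for i in range(K - 1, N)]


embedded = time_delay_embedding([1, 2, 3, 4, 5], K=3)
# embedded == [[3, 2, 1], [4, 3, 2], [5, 4, 3]]
```

For a regression on lagged values, the first column can serve as the target and the remaining columns as predictors, so predicting from the previous 5 days would use $K = 6$ under this layout.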
Normal, Logistic, and t-Student Distribution
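The normal distribution, also known as the Gaussian distribution, is the most commonly used distribution in statistical analysis. Let $X$ be a continuous random variable. According to [14], $X$ is said to have a normal distribution with expected value $\mu$ and variance $\sigma^2$ if its probability density function is

$$f(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}. \tag{4}$$

According to [15], the logistic distribution $L(\mu, \sigma^2)$ is the probability distribution on $(-\infty, \infty)$ with probability density function

$$f(x; \mu, \sigma) = \frac{\pi}{\sigma\sqrt{3}}\, \frac{e^{-\pi(x-\mu)/(\sigma\sqrt{3})}}{\left(1 + e^{-\pi(x-\mu)/(\sigma\sqrt{3})}\right)^2}, \quad -\infty < x < \infty, \tag{5}$$

where $\mu$ is the expected value and $\sigma$ is the standard deviation. Meanwhile, according to [14], the t-Student distribution has probability density function

$$f(t; n) = \frac{\Gamma\left(\frac{n+1}{2}\right)}{\sqrt{n\pi}\,\Gamma\left(\frac{n}{2}\right)} \left(1 + \frac{t^2}{n}\right)^{-\frac{n+1}{2}} = \frac{\left(1 + \frac{t^2}{n}\right)^{-\frac{n+1}{2}}}{\sqrt{n}\, B\left(\frac{1}{2}, \frac{n}{2}\right)}, \tag{6}$$

with parameters $n \in \mathbb{N}$ and $t \in \mathbb{R}$, where $\Gamma$ is the Gamma function and $B$ is the Beta function [14].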
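The three densities in Equations (4) to (6) can be evaluated with the standard library alone, as in the following sketch. The function names are chosen for this example, and the logistic density uses the mean and standard deviation parameterization of Equation (5).

```python
import math


def normal_pdf(x, mu, sigma):
    """Normal density, Eq. (4)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi * sigma ** 2)


def logistic_pdf(x, mu, sigma):
    """Logistic density with mean mu and standard deviation sigma, Eq. (5)."""
    z = math.pi * (x - mu) / (sigma * math.sqrt(3.0))
    # The density is symmetric in z, so using -|z| avoids overflow for large |z|.
    e = math.exp(-abs(z))
    return math.pi / (sigma * math.sqrt(3.0)) * e / (1.0 + e) ** 2


def student_t_pdf(t, n):
    """t-Student density with n degrees of freedom, Eq. (6), computed on the log scale."""
    log_const = math.lgamma((n + 1) / 2) - math.lgamma(n / 2) - 0.5 * math.log(n * math.pi)
    return math.exp(log_const - (n + 1) / 2 * math.log1p(t * t / n))
```

At the center of each distribution these give $1/\sqrt{2\pi}$ for the standard normal, $\pi/(4\sqrt{3})$ for the logistic with unit standard deviation, and $1/\pi$ for the t-Student with $n = 1$.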
Copula
A copula is a function that combines several univariate marginal distribution functions, each of which is uniform on [0, 1], into one multivariate distribution function [16]. According to [17, 18], copulas can better describe the dependence between random variables. Some popular types of copula are:
1. Copula Gauss
2. Copula-t
3. Copula Frank
The maximum likelihood (ML) method can be used to estimate the copula parameters. If the observed values of $X_1, X_2, \ldots, X_n$ are $x_1, x_2, \ldots, x_n$, then based on [19] the likelihood function for a continuous distribution is

$$L(x_1, x_2, \ldots, x_n) = f(x_1, x_2, \ldots, x_n \mid \theta_1, \theta_2, \ldots, \theta_m). \tag{7}$$

The values $\hat{\theta}_1 = T_1, \hat{\theta}_2 = T_2, \ldots, \hat{\theta}_m = T_m$ are called the maximum likelihood estimators, where each $\hat{\theta}_j$ is a random variable. They are obtained by solving the following equation:

$$\frac{\partial L(x_1, x_2, \ldots, x_n \mid \theta_1, \theta_2, \ldots, \theta_m)}{\partial \theta_i} = 0, \quad i = 1, 2, \ldots, m. \tag{8}$$
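For a bivariate Gaussian copula, the likelihood depends on a single parameter, the correlation $\rho$, so the maximization can be illustrated with a simple grid search. The sketch below is an assumption-laden illustration, not the estimation routine used in the study: it replaces the derivative condition of Equation (8) with a grid over $\rho$, uses simulated data with a made-up true value $\rho = 0.6$, and relies on the standard Gaussian copula log-density $-\tfrac{1}{2}\log(1-\rho^2) - \frac{\rho^2(x^2+y^2) - 2\rho xy}{2(1-\rho^2)}$ with $x = \Phi^{-1}(u)$ and $y = \Phi^{-1}(v)$.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def gaussian_copula_loglik(rho, sum_sq, sum_xy, n):
    """Log-likelihood of a bivariate Gaussian copula from its sufficient statistics."""
    one_minus = 1.0 - rho * rho
    return -0.5 * n * math.log(one_minus) - (rho * rho * sum_sq - 2.0 * rho * sum_xy) / (2.0 * one_minus)


def fit_gaussian_copula(u, v, grid_size=199):
    """Grid-search ML estimate of rho from uniform pseudo-observations u and v."""
    x = [STD_NORMAL.inv_cdf(a) for a in u]
    y = [STD_NORMAL.inv_cdf(b) for b in v]
    sum_sq = sum(a * a + b * b for a, b in zip(x, y))
    sum_xy = sum(a * b for a, b in zip(x, y))
    candidates = [-0.99 + 1.98 * i / (grid_size - 1) for i in range(grid_size)]
    best = max(candidates, key=lambda r: gaussian_copula_loglik(r, sum_sq, sum_xy, len(x)))
    return best, gaussian_copula_loglik(best, sum_sq, sum_xy, len(x))


# Simulated data with a hypothetical true correlation of 0.6.
rng = random.Random(1)
true_rho = 0.6
u, v = [], []
for _ in range(2000):
    z1 = rng.gauss(0.0, 1.0)
    z2 = true_rho * z1 + math.sqrt(1.0 - true_rho ** 2) * rng.gauss(0.0, 1.0)
    u.append(STD_NORMAL.cdf(z1))
    v.append(STD_NORMAL.cdf(z2))

rho_hat, loglik = fit_gaussian_copula(u, v)
```

A numerical optimizer would normally replace the grid, and copulas with more parameters (such as Copula-t with its degrees of freedom) need a multi-dimensional search.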
Value-at-Risk
Value-at-Risk (VaR) is one of the most frequently used methods for estimating risk in the stock market; it provides information about the loss of a portfolio at a certain level of confidence [2]. The use of VaR as a tool for measuring risk increased following the collapse of Barings Bank in 1995 and of several other financial institutions.
Given a confidence level $\alpha \in (0, 1)$, the VaR of a portfolio at confidence level $\alpha$ is the smallest value $l$ such that the probability of the loss $L$ exceeding $l$ is not greater than $(1 - \alpha)$ [17]. Mathematically,

$$\mathrm{VaR}_\alpha = \inf\{l \in \mathbb{R} : P(L > l) \le 1 - \alpha\} = \inf\{l \in \mathbb{R} : F_L(l) \ge \alpha\}. \tag{9}$$
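When $F_L$ is replaced by the empirical distribution of a sample of losses, the infimum in Equation (9) becomes the smallest observed loss whose empirical CDF value reaches $\alpha$. The sketch below illustrates this; the function name `empirical_var` and the loss values are made up for the example.

```python
def empirical_var(losses, alpha):
    """Smallest observed loss l whose empirical CDF value F_L(l) is at least alpha."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    ordered = sorted(losses)
    n = len(ordered)
    for i, loss in enumerate(ordered, start=1):
        # i / n is the share of observations less than or equal to this loss.
        if i / n >= alpha:
            return loss
    raise ValueError("losses must be non-empty")


# Ten made-up losses: F_L(8) = 0.8 < 0.85 and F_L(9) = 0.9 >= 0.85.
var_85 = empirical_var(range(1, 11), alpha=0.85)
```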
RESULTS AND DISCUSSION
Stock Selection
The random forest method is used to select stocks relative to the reference stock. The variant used is random forest on numerical data, with a target variable $y$ and predictors $x$. The reference stock (TLKM.JK) is used as $y$, while the other stocks (listed in Table I) are used as $x$ in the modeling.
Random forest produces a model with an $R^2$ of 0.9892886 on the TLKM.JK reference data, which indicates a good fit. The variable importance values generated from the model are shown in Table II. Based on these values, the stocks of PT Indofood CBP Sukses Makmur Tbk. (ICBP.JK), which has the highest importance, and PT Bumi Serpong Damai Tbk. (BSDE.JK), which has the lowest, were selected for the subsequent steps together with the reference stock, PT Telkom Indonesia Tbk. (TLKM.JK).
TABLE II. The value of the variable importance of each stock in the reference stock.
x          Overall
ICBP.JK    100.0000000
BBTN.JK    90.5906082
ADRO.JK    52.9175773
...        ...
INCO.JK    0.2451653
SCMA.JK    0.1082907
BSDE.JK    0.0000000
Calculating Returns
The returns of the stocks selected in the Stock Selection step are calculated by applying the machine learning random forest model to each stock. This is the same random forest regression that was carried out in the Stock Selection step, but because there is only one variable, namely the stock itself, the data are first transformed using the Time Delay Embedding method. This makes it possible to predict one day's value $y_t$ from the values of the previous days ($y_{t-1}$ to $y_{t-k}$), with $k$ set to 5, that is, 5 days back.
The model for each stock is evaluated using the Mean Absolute Percentage Error (MAPE) on the testing data, which is 1.008163 for TLKM.JK, 1.329304 for ICBP.JK, and 5.958407 for BSDE.JK. The model results are used to calculate stock returns. Each model is shown in Figure 1.
FIGURE 1. Stocks used with original value (black) and estimated value (blue), all data (left) and test data only (right).
Determining the Marginal Distribution
To find out which marginal distribution best approximates the standardized score of the returns of each stock, the empirical distribution is plotted against the t-Student, normal, and logistic distributions for each stock. The comparison is shown in Figure 2. The blue line shows the empirical distribution of the standardized score for each stock, the red line is the t-Student approximation, the purple line is the logistic approximation, and the orange line is the normal approximation. Because the candidate distributions produce curves that are very similar to one another, a formal test is needed.
FIGURE 2. The distribution of the standardized score of each stock.
The Anderson-Darling test is used to determine which of the three marginal distributions is closest to the empirical distribution of the standardized score. The results can be seen in Table III. They show that all three stocks are closest to the logistic distribution: it is the only candidate with a large p-value for every stock.
TABLE III. Anderson-Darling test results on each stock, with a normal, t-Student, and logistic distribution (An is the test statistic).
Stocks    Normal An   Normal p-value   t-Student An   t-Student p-value   Logistic An   Logistic p-value
TLKM.JK   4.5354      0.004793         510.78         0.0000004839        0.45194       0.7961
ICBP.JK   6.1619      0.0008118        278.86         0.0000004839        0.96436       0.3765
BSDE.JK   3.7913      0.01104          254.6          0.0000004839        0.54647       0.6999
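The Anderson-Darling statistic for a fully specified candidate distribution can be computed as $A^2 = -n - \frac{1}{n}\sum_{i=1}^{n}(2i-1)\left[\ln F(x_{(i)}) + \ln\left(1 - F(x_{(n+1-i)})\right)\right]$. The sketch below computes only this statistic, not the p-values reported in Table III, and it uses simulated logistic data rather than the study's data; the function names, the sample size, and the clamping constant `eps` are assumptions made for the example.

```python
import math
import random
from statistics import NormalDist


def logistic_cdf(x, mu=0.0, sigma=1.0):
    """Logistic CDF parameterized by mean and standard deviation."""
    z = math.pi * (x - mu) / (sigma * math.sqrt(3.0))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def anderson_darling(sample, cdf, eps=1e-12):
    """Anderson-Darling statistic of a sample against a fully specified CDF."""
    x = sorted(sample)
    n = len(x)
    # Clamp CDF values away from 0 and 1 so the logarithms stay finite.
    f = [min(max(cdf(value), eps), 1.0 - eps) for value in x]
    total = sum(
        (2 * i - 1) * (math.log(f[i - 1]) + math.log(1.0 - f[n - i]))
        for i in range(1, n + 1)
    )
    return -n - total / n


# Simulated standardized scores drawn from a logistic distribution with unit variance.
rng = random.Random(7)
scale = math.sqrt(3.0) / math.pi
sample = []
for _ in range(20000):
    p = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
    sample.append(scale * math.log(p / (1.0 - p)))

a2_logistic = anderson_darling(sample, logistic_cdf)
a2_normal = anderson_darling(sample, NormalDist(0.0, 1.0).cdf)
```

A smaller statistic indicates a closer fit, so for data that are truly logistic the logistic candidate should score lower than the normal one.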
Determining the Most Suitable Copula
The most suitable copula is determined by pairing the reference stock with each of the other two stocks. For each pair, each copula is fitted by maximum likelihood. The copula with the largest maximum likelihood value among the three candidates is selected as the best copula for that pair. The results are given in Table IV.
TABLE IV. Maximum likelihood value in each portfolio for Copula Gauss, Copula-t, and Copula Frank.
Portfolio              Copula Gauss   Copula-t   Copula Frank
TLKM.JK and ICBP.JK    31.66          31.52      21.9
TLKM.JK and BSDE.JK    25.44          25.54      22.15
Table IV shows that the copula types produce values that are close to one another. The copula selected for the first portfolio is Copula Gauss, and for the second portfolio it is Copula-t.
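The selection rule, taking the copula with the largest maximum likelihood value in each row of Table IV, can be written out as follows. The dictionary layout and variable names are chosen for this illustration; the numbers are those reported in Table IV.

```python
# Maximum likelihood values from Table IV.
table_iv = {
    "TLKM.JK and ICBP.JK": {"Copula Gauss": 31.66, "Copula-t": 31.52, "Copula Frank": 21.9},
    "TLKM.JK and BSDE.JK": {"Copula Gauss": 25.44, "Copula-t": 25.54, "Copula Frank": 22.15},
}

# For each portfolio, keep the copula whose value is largest.
best_copula = {
    portfolio: max(values, key=values.get) for portfolio, values in table_iv.items()
}
```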
Portfolio Building
Each portfolio is built by forming a joint density function using its most suitable copula. Paired data are then generated for each portfolio (TLKM.JK-ICBP.JK and TLKM.JK-BSDE.JK). The joint density functions can be seen in Figure 3.
FIGURE 3. The joint distribution function of the two portfolios using their respective selected copulas.
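Generating paired data from a copula can be sketched for the Gaussian case with logistic marginals. This is an illustration under stated assumptions rather than the study's procedure: the correlation value 0.6, the sample size, and all function names are made up, and the Copula-t case (which additionally needs a chi-square draw for the degrees of freedom) is not shown.

```python
import math
import random
from statistics import NormalDist


def logistic_quantile(p, mu=0.0, sigma=1.0):
    """Quantile function of the logistic distribution with mean mu and standard deviation sigma."""
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep the logarithm finite
    return mu + sigma * math.sqrt(3.0) / math.pi * math.log(p / (1.0 - p))


def sample_gaussian_copula(rho, n, seed=0):
    """Draw n pairs (u, v) of uniforms whose dependence follows a Gaussian copula."""
    rng = random.Random(seed)
    phi = NormalDist().cdf
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        pairs.append((phi(z1), phi(z2)))
    return pairs


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical correlation parameter; a fitted value would be used in practice.
uniforms = sample_gaussian_copula(rho=0.6, n=5000, seed=3)
# Map the uniforms to standardized scores Z through the logistic quantile function.
standardized = [(logistic_quantile(u), logistic_quantile(v)) for u, v in uniforms]
```

The copula carries the dependence between the two series, while the quantile transform gives each series its own marginal distribution.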
Estimating VaR
After obtaining the joint distribution function between stocks, the next step is to determine the Value-at-Risk (VaR) of each stock portfolio using the method from [20]. First, a portfolio consisting of two stocks is formed, with the weight of each stock assumed to be constant over time. In other words,

$$R^p_t = W_1 R_{1t} + W_2 R_{2t}, \tag{10}$$

where $W_1$ is the weight of the first stock, $W_2$ is the weight of the second stock, $R_{1t}$ is the simulated return of the first stock at time $t$, and $R_{2t}$ is the simulated return of the second stock at time $t$. In this step, two portfolios are formed using the above equation based on the correlation coefficient between each stock and the reference, namely the stocks with the largest and smallest correlation coefficients. The purpose of selecting portfolios based on the correlation coefficient is to find out whether the difference in correlation between the two portfolios affects the VaR value.
Suppose $z_1 \le z_2 \le \cdots \le z_n$ are the generated data $Z$ sorted from smallest to largest, where $n$ is the number of data points. If the percentile used is $\alpha$, then $k = [\alpha n] + 1$. The estimated $\mathrm{VaR}_\alpha$ from the data is $\mathrm{VaR}_\alpha(Z) = z_k$, the $k$-th order statistic. The value $Z_t$ is generated from the selected joint distribution. Given $Z_t$, $R_t$ is determined using the formula

$$R_t = (Z_t \times \sigma_{\hat{y}}) + \hat{y}. \tag{11}$$

Based on the simulated $R_t$ of each stock, a portfolio is formed using Equation 10 with various combinations of $W_1$ and $W_2$. The values of $R^p_t$ for each portfolio are then sorted from smallest to largest. The percentile used is $\alpha = 5\%$ and the number of data points is $n = 1240$, so $k = [\alpha n] + 1 = [(0.05)(1240)] + 1 = 63$. Thus, the estimated VaR of the portfolio is the 63rd value of the sorted $R^p_t$. The estimated VaR values of the two portfolios can be seen in Table V and Table VI.
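The order-statistic estimate can be sketched as follows, combining Equation 10 with $k = [\alpha n] + 1$. The function names and the toy return series are made up for this example, and reporting VaR as a positive number (the negative of the $k$-th smallest return) is an assumption about the sign convention.

```python
import math


def order_statistic_index(alpha, n):
    """k = [alpha * n] + 1, where [.] denotes the integer part."""
    return math.floor(alpha * n) + 1


def portfolio_var(r1, r2, w1, alpha=0.05):
    """VaR of the portfolio w1 * r1 + (1 - w1) * r2 from its k-th smallest return."""
    w2 = 1.0 - w1
    portfolio = sorted(w1 * a + w2 * b for a, b in zip(r1, r2))
    k = order_statistic_index(alpha, len(portfolio))
    # Assumed sign convention: a negative return is reported as a positive loss.
    return -portfolio[k - 1]


# Toy simulated returns: 100 values from -0.50 to 0.49 for both stocks.
toy_r1 = [i / 100 for i in range(-50, 50)]
toy_r2 = list(toy_r1)
toy_var = portfolio_var(toy_r1, toy_r2, w1=0.5, alpha=0.05)
```

Repeating the call over a grid of weights, for example `w1` from 0.1 to 0.9, gives one VaR value per stock composition.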
TABLE V. Stock compositions in the TLKM.JK and ICBP.JK portfolio with VaR values.
Stock Compositions               VaR
90% TLKM.JK and 10% ICBP.JK      3.654896%
80% TLKM.JK and 20% ICBP.JK      3.594748%
70% TLKM.JK and 30% ICBP.JK      3.534600%
60% TLKM.JK and 40% ICBP.JK      3.474452%
50% TLKM.JK and 50% ICBP.JK      3.414304%
40% TLKM.JK and 60% ICBP.JK      3.354156%
30% TLKM.JK and 70% ICBP.JK      3.294008%
20% TLKM.JK and 80% ICBP.JK      3.233860%
10% TLKM.JK and 90% ICBP.JK      3.173712%
TABLE VI. Stock compositions in the TLKM.JK and BSDE.JK portfolio with VaR values.
Stock Compositions               VaR
90% TLKM.JK and 10% BSDE.JK      4.195189%
80% TLKM.JK and 20% BSDE.JK      4.675334%
70% TLKM.JK and 30% BSDE.JK      5.155478%
60% TLKM.JK and 40% BSDE.JK      5.635623%
50% TLKM.JK and 50% BSDE.JK      6.115768%
40% TLKM.JK and 60% BSDE.JK      6.595913%
30% TLKM.JK and 70% BSDE.JK      7.076057%
20% TLKM.JK and 80% BSDE.JK      7.556202%
10% TLKM.JK and 90% BSDE.JK      8.036347%
In the first portfolio, the smallest VaR, 3.173712%, is achieved with a composition of 10% TLKM.JK and 90% ICBP.JK; in the second portfolio, the smallest VaR, 4.195189%, is achieved with a composition of 90% TLKM.JK and 10% BSDE.JK. Based on these values, if an investor invests IDR 1,000,000.00 in a portfolio consisting of 10% TLKM.JK and 90% ICBP.JK, the investor has a less than 5% chance of losing more than IDR 31,700.00. Likewise, if an investor invests IDR 1,000,000.00 in a portfolio consisting of 90% TLKM.JK and 10% BSDE.JK, the investor has a less than 5% chance of losing more than IDR 42,000.00.
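The rupiah amounts follow from multiplying the invested amount by the VaR percentage and rounding to the nearest hundred (first portfolio) or thousand (second portfolio):

```latex
\begin{align*}
1{,}000{,}000 \times 0.03173712 &= 31{,}737.12 \approx 31{,}700, \\
1{,}000{,}000 \times 0.04195189 &= 41{,}951.89 \approx 42{,}000.
\end{align*}
```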
CONCLUSION
In this study, stocks are selected relative to a reference stock with the help of the random forest method, the estimated returns are calculated, the marginal distribution of each selected stock is analysed, and data are generated using the copula function to determine the Value-at-Risk. Based on the random forest variable importance, the most important factor for the reference stock is ICBP.JK, and the least important is BSDE.JK. The VaR results of the two portfolios formed (TLKM.JK-ICBP.JK and TLKM.JK-BSDE.JK) show that the smallest VaR, 3.173712%, is obtained with a portfolio composition of 10% TLKM.JK and 90% ICBP.JK.
REFERENCES
1. R. T. Rockafellar and S. Uryasev, “Conditional value-at-risk for general loss distributions,” Journal of Banking & Finance 26, 1443–1471 (2002).
2. S. Shams and F. K. Haghighi, “A copula-garch model of conditional dependencies: estimating tehran market stock exchange value-at-risk,” Journal of Statistical and Econometric Methods 2, 39–50 (2013).
3. W. H. Wagner and S. C. Lau, “The effect of diversification on risk,” Financial Analysts Journal 27, 48–53 (1971).
4. D. Y. Straumann, “Correlation and dependency in risk management: properties and pitfalls,” in Risk Management: Value at Risk and Beyond (Cambridge University Press, 2001).
5. I. S. Bramantya, “Permodelan indeks harga saham gabungan dan penentuan rank correlation dengan menggunakan copula,” IPB University (2014).
6. N. F. Fortina, “Pendugaan value-at-risk portofolio dengan pendekatan copula-garch,” IPB University (2019).
7. V. M. Rahmawati, “Pendugaan value-at-risk dan expected-shortfall portofolio dari tiga aset keuangan dengan pendekatan copula-arma-garch,” IPB University (2020).
8. Darwis, B. Sartono, and A. H. Wigena, “Indonesia stock exchange composite modelling with gaussian copula marginal regression,” International Journal of Engineering and Management Research (IJEMR) 6, 311–314 (2016).
9. L. Breiman, “Random forests,” Machine Learning 45, 5–32 (2001).
10. A. J. Myles, R. N. Feudale, Y. Liu, N. A. Woody, and S. D. Brown, “An introduction to decision tree modeling,” Journal of Chemometrics: A Journal of the Chemometrics Society 18, 275–285 (2004).
11. A. Fisher, C. Rudin, and F. Dominici, “All models are wrong but many are useful: variable importance for black-box, proprietary, or misspecified prediction models, using model class reliance,” arXiv preprint arXiv:1801.01489, 237–246 (2018).
12. A. A. Fauzi, A. M. Soleh, and A. Djuraidah, “Kajian simulasi perbandingan metode regresi kuadrat terkecil parsial, support vector machine, dan random forest,” Indonesian Journal of Statistics and Its Applications 4, 203–215 (2020).
13. R. Hyndman, S. Makridakis, and S. Wheelwright, “Forecasting: methods and applications,” Journal of the Operational Research Society (1998).
14. C. Walck et al., “Hand-book on statistical distributions for experimentalists,” University of Stockholm 10 (2007).
15. N. Balakrishnan, Handbook of the logistic distribution (CRC Press, 1991).
16. A. Lamba, S. Singh, S. Balvinder, N. Dutta, and S. Rela, “Mitigating cyber security threats of industrial control systems (scada & dcs),” in 3rd International Conference on Emerging Technologies in Engineering, Biomedical, Medical and Science (ETEBMS–July 2017) (2017).
17. A. J. McNeil, R. Frey, and P. Embrechts, Quantitative risk management: concepts, techniques and tools, revised edition (Princeton University Press, 2015).
18. R. Budiarti, Pengembangan Metode Pendugaan Minimum Distance pada Model Berbasis Copula Nilai Ekstrim, Aplikasi pada Pendugaan Value-at-Risk Portofolio, Ph.D. thesis, IPB University (2018).
19. N. L. Johnson, A. W. Kemp, and S. Kotz, Univariate discrete distributions, Vol. 444 (John Wiley & Sons, 2005).
20. S. A. Klugman, H. H. Panjer, and G. E. Willmot, Loss models: from data to decisions, Vol. 715 (John Wiley & Sons, 2012).