
Abstract and Figures

Features, or contextual information, are additional data that can help predict asset returns in financial problems. We propose a mean-risk portfolio selection problem that uses contextual information to maximize expected returns at each time period, weighing past observations via kernels based on the current state of the world. We consider yearly intervals for investment opportunities, and a set of indices that cover the most relevant investment classes. For those intervals, data scarcity is a problem that is often dealt with by making distribution assumptions. We take a different path and use distribution-free simulation techniques to populate our database. In our experiments we use the Conditional Value-at-Risk as our risk measure, and we work with data from 2007 until 2021 to evaluate our methodology. Our results show that, by incorporating features, the out-of-sample performance of our strategy outperforms the equally-weighted portfolio. We also generate diversified positions, and efficient frontiers that exhibit coherent risk-return patterns.
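For orientation, here is a minimal sketch, in a generic Rockafellar and Uryasev style, of how kernel weights driven by the current feature vector can enter such a mean-CVaR problem. The kernel form K, bandwidth h, confidence level α, risk budget κ, and long-only constraint below are illustrative assumptions, not the paper's exact formulation.

$$
\max_{w \ge 0,\; \mathbf{1}^{\top} w = 1} \;\; \sum_{s=1}^{S} p_s(x_t)\, w^{\top} r_s
\qquad \text{s.t.} \qquad
\min_{\eta \in \mathbb{R}} \left\{ \eta + \frac{1}{1-\alpha} \sum_{s=1}^{S} p_s(x_t)\, \big[ -w^{\top} r_s - \eta \big]_+ \right\} \;\le\; \kappa,
$$

$$
p_s(x_t) \;=\; \frac{K\!\big((x_s - x_t)/h\big)}{\sum_{s'=1}^{S} K\!\big((x_{s'} - x_t)/h\big)},
$$

where r_s are (historical or synthetic) return scenarios, x_s their associated feature vectors, x_t the current state of the world, and the kernel weights p_s(x_t) emphasize past periods whose features resemble the current one.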
A Synthetic Data-Plus-Features Driven Approach for Portfolio Optimization

Bernardo K. Pagnoncelli¹ · Domingo Ramírez² · Hamed Rahimian³ · Arturo Cifuentes⁴

Accepted: 8 May 2022 / Published online: 7 June 2022
© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2022
✉ Bernardo K. Pagnoncelli (bernardo.pagnoncelli@skema.edu)
Domingo Ramírez (djramirez@uc.cl)
Hamed Rahimian (hrahimi@clemson.edu)
Arturo Cifuentes (aocusa@gmail.com)
¹ SKEMA Business School, Université Côte d'Azur, Av. Willy Brandt, 59777 Lille, France
² Pontificia Universidad Católica de Chile, Av. Libertador Bernardo O'Higgins 340, Santiago, Chile
³ Department of Industrial Engineering, Clemson University, Clemson, SC 29634, USA
⁴ CLAPES-UC, Santiago, Chile
Computational Economics (2023) 62:187–204
https://doi.org/10.1007/s10614-022-10274-2
... Kolm et al. (2014) summarize well the challenges associated with the implementation of Markowitz's approach. Pagnoncelli et al. (2022) provide a brief overview of the different techniques that have attempted to reconcile the implementation of the MV formulation with reality. John Bogle, who founded the Vanguard Group (an asset management company) and is recognized as the father of index investing, is another pioneer whose main idea was revolutionary at the time and remains influential until today. ...
... More importantly, we rely on synthetic returns data generated with a Modified Conditional GAN approach which we enhance with contextual information (in our case, the U.S. Treasury yield curve). In a sense, our approach follows the spirit of Pagnoncelli et al. (2022), but it differs in several important ways and brings with it important advantages, including performance, a topic we discuss in more detail later in this paper. In summary, our goals are twofold. ...
... (Recall that a frequent criticism of the conventional MV approach is that it often yields corner solutions based on portfolios heavily concentrated on a few assets.) To measure the degree of diversification, we follow Pagnoncelli et al. (2022), and rely on the complementary Herfindahl-Hirschman (HH) Index. A value of 0 for the index reflects a portfolio concentrated on a single asset. ...
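For reference, a short sketch of one common way to compute the complementary HH index of a weight vector; the rescaling by 1 − 1/N is an assumption, and the cited papers may normalize differently.

```python
import numpy as np

def complementary_hh(weights):
    """Complementary Herfindahl-Hirschman index of a portfolio.

    0 -> fully concentrated in a single asset
    1 -> equally weighted across all N assets (with this normalization)
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                      # ensure weights sum to one
    hh = np.sum(w ** 2)                  # classic HH concentration index
    n = len(w)
    return (1.0 - hh) / (1.0 - 1.0 / n)  # rescale so the 1/N portfolio scores 1

# Example: a 60/40 two-asset portfolio
print(complementary_hh([0.6, 0.4]))      # ~0.96
```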
... Combining historic (observed) returns with the Gaussian copula has proven to be a successful choice to generate such synthetic data. Additionally, this approach has demonstrated that it is fairly stable and that it can generate extreme scenarios, that is, adverse return scenarios compatible with the existing data even if those scenarios have not been observed (Pagnoncelli et al. 2022). ...
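A minimal sketch of the kind of Gaussian-copula resampling described above, assuming a matrix `hist` of historical returns (rows = periods, columns = assets); the empirical-quantile inversion and the rank transform are illustrative choices, not the exact procedure used in the cited papers.

```python
import numpy as np
from scipy.stats import norm

def gaussian_copula_scenarios(hist, n_scenarios, seed=0):
    """Generate synthetic return scenarios via a Gaussian copula fitted to hist."""
    rng = np.random.default_rng(seed)
    T, N = hist.shape

    # 1) Map each asset's returns to uniforms via empirical ranks
    ranks = hist.argsort(axis=0).argsort(axis=0) + 1
    u = ranks / (T + 1.0)

    # 2) Transform to standard normals and estimate the copula correlation
    z = norm.ppf(u)
    corr = np.corrcoef(z, rowvar=False)

    # 3) Sample correlated normals and map them back to uniforms...
    z_new = rng.multivariate_normal(np.zeros(N), corr, size=n_scenarios)
    u_new = norm.cdf(z_new)

    # 4) ...then invert each marginal with the empirical quantile function
    synth = np.column_stack([
        np.quantile(hist[:, j], u_new[:, j]) for j in range(N)
    ])
    return synth
```

Because the marginals are inverted through the empirical quantiles, the synthetic scenarios stay compatible with the observed return ranges while the copula supplies joint combinations that never occurred historically.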
... very much in line with the approach suggested by Gutierrez et al. (2019) and Pagnoncelli et al. (2022). In short, we have created M − K + 1 return vectors, each of dimension N + 1. ...
... This approach is far more stable than the traditional correlation-matrix approach based on the Markowitz formulation. A detailed discussion of its implementation is presented in Pagnoncelli et al. (2022). ...
Article
This article presents a new framework to evaluate the merits of an art investment that differs substantially from previous studies. First, it assumes that the investor already holds a portfolio consisting of more traditional assets and is planning to add art to it. This is far more realistic than the usual academic setup in which it is assumed that the investor is planning to deploy a given amount of cash among many assets, one of which is art. Second, the approach departs from the traditional Markowitz’s mean-variance framework in two important ways: (i) the efficient frontier is constructed based on cumulative returns, rather than average returns, and risk is assessed via potential losses and not volatility; and (ii) it relies on a semi-parametric approach to generate synthetic data based on the Gaussian copula and historic returns. Finally, an alternative risk metric, based on losses and not volatility, is introduced. The usefulness of this framework is demonstrated with an example based on art sales auction data.
Preprint
Full-text available
We propose a new approach to portfolio optimization that utilizes a unique combination of synthetic data generation and a CVaR constraint. We formulate the portfolio optimization problem as an asset allocation problem in which each asset class is accessed through a passive (index) fund. The asset-class weights are determined by solving an optimization problem which includes a CVaR constraint. The optimization is carried out by means of a Modified CTGAN algorithm which incorporates features (contextual information) and is used to generate synthetic return scenarios, which, in turn, are fed into the optimization engine. For contextual information we rely on several points along the U.S. Treasury yield curve. The merits of this approach are demonstrated with an example based on ten asset classes (covering stocks, bonds, and commodities) over a fourteen-and-a-half-year period (January 2008–June 2022). We also show that the synthetic generation process is able to capture well the key characteristics of the original data, and the optimization scheme results in portfolios that exhibit satisfactory out-of-sample performance. We also show that this approach outperforms the conventional equal-weights (1/N) asset allocation strategy and other optimization formulations based on historical data only.
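A compact sketch of the kind of CVaR-constrained allocation step described above, written with cvxpy over a matrix `R` of synthetic scenario returns (rows = scenarios, columns = asset classes); the confidence level, CVaR cap, and long-only constraint are placeholder assumptions rather than the preprint's exact settings.

```python
import cvxpy as cp

def cvar_constrained_weights(R, alpha=0.95, cvar_cap=0.10):
    """Maximize mean scenario return subject to CVaR(loss) <= cvar_cap."""
    S, N = R.shape
    w = cp.Variable(N, nonneg=True)            # long-only asset-class weights
    eta = cp.Variable()                        # VaR-like auxiliary variable
    shortfall = cp.pos(-R @ w - eta)           # per-scenario losses beyond eta

    cvar = eta + cp.sum(shortfall) / ((1 - alpha) * S)
    constraints = [cp.sum(w) == 1, cvar <= cvar_cap]

    objective = cp.Maximize(cp.sum(R @ w) / S)  # average synthetic return
    cp.Problem(objective, constraints).solve()
    return w.value
```

Feeding synthetic scenarios into this linear-programming representation of CVaR is what lets the allocation reflect tail risk without any distributional assumption on returns.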
... These conditions can be estimated using contextual information such as the current weather, time of day, and the state of the road network [1,2]. Similarly, in portfolio investment problems, decisions on allocating a limited budget across assets with uncertain returns can benefit from macroeconomic indicators, news sentiment, or even social media trends as sources of contextual information [3,4,5,6]. In this setting, contextual optimization (CO) provides a formal mathematical framework for addressing such problems, as outlined in [7]. ...
... Conversely, in the pessimistic case, the worst-case minimizer z ⋆ (f (x)) is chosen, as determined by the highest leader's objective cost. The case of continuous optimization is examined in [11], where the lower-level problem is reformulated using Karush-Kuhn-Tucker (KKT) conditions, resulting in a single-level formulation for the optimistic case of the loss function (4). However, as the optimistic case can produce degenerate solutions, a quadratic regularization term is incorporated into the objective function to address this issue. ...
Preprint
Full-text available
The recent interest in contextual optimization problems, where randomness is associated with side information, has led to two primary strategies for formulation and solution. The first, estimate-then-optimize, separates the estimation of the problem's parameters from the optimization process. The second, decision-focused optimization, integrates the optimization problem's structure directly into the prediction procedure. In this work, we propose a pessimistic bilevel approach for solving general decision-focused formulations of combinatorial optimization problems. Our method solves an ε-approximation of the pessimistic bilevel problem using a specialized cut generation algorithm. We benchmark its performance on the 0-1 knapsack problem against estimate-then-optimize and decision-focused methods, including the popular SPO+ approach. Computational experiments highlight the proposed method's advantages, particularly in reducing out-of-sample regret.
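To make the out-of-sample regret comparison concrete, here is a minimal sketch of the standard regret metric for the 0-1 knapsack (this is only the scoring step, not the preprint's cut-generation algorithm): solve the knapsack with predicted item values, then evaluate that decision under the true values.

```python
import numpy as np

def knapsack(values, weights, capacity):
    """Exact 0-1 knapsack by dynamic programming; returns the chosen item indices."""
    n = len(values)
    best = np.zeros((n + 1, capacity + 1))
    for i in range(1, n + 1):
        for c in range(capacity + 1):
            best[i, c] = best[i - 1, c]
            if weights[i - 1] <= c:
                take = best[i - 1, c - weights[i - 1]] + values[i - 1]
                best[i, c] = max(best[i, c], take)
    chosen, c = [], capacity                   # backtrack to recover the solution
    for i in range(n, 0, -1):
        if best[i, c] != best[i - 1, c]:
            chosen.append(i - 1)
            c -= weights[i - 1]
    return chosen

def regret(pred_values, true_values, weights, capacity):
    """Gap between the optimal true value and the true value of the decision
    taken with predicted values; decision-focused methods try to shrink this."""
    z_pred = knapsack(pred_values, weights, capacity)
    z_star = knapsack(true_values, weights, capacity)
    return sum(true_values[i] for i in z_star) - sum(true_values[i] for i in z_pred)
```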
... Suggested a feature-based data-driven portfolio selection strategy. We target medium- and long-term clients in liquid markets where passive investing is viable, thus our positions are asset allocation exposures [24]. Fatima Dakalbab et al. analyzed articles published between 2015 and 2023 and identified 8 financial markets in research publications that were utilized to develop prediction models. Technical analysis is more commonly used than fundamental analysis. ...
Article
Full-text available
This study investigates the use of the preceding day’s trading pattern to generate a LONG or SHORT signal for intraday stock trading before the trading day begins. Positions are taken LONG or SHORT at the start of the trading day and closed at its end. Using stock time series data, we first identify key traits; then, using a combination of machine learning and a deep learning algorithm, we predict traders’ actions for the next trading day. We applied decision tree (DT), random forest (RF), logistic regression (LR), support vector machine (SVM) classifier and an artificial neural network (ANN) for predicting the LONG or SHORT position. The following results are based on an experiment run on real market data from the website of the National Stock Exchange (NSE). We analyzed 2459 completed transactions spanning from January 2011 to November 2020. Compared to the other classifiers, the Support Vector Machine (SVM) with a 70% training data to 30% testing data ratio performed the best. The SVM classifier with kernel = ‘rbf’ provided the most accurate predictions (72.41 percent). We evaluate the performance of the BUY and HOLD strategy against the outcomes of our experiments. We also test our trading model with a 1% stop-loss, a 1.5% stop-loss, and a 2% stop-loss, and compare the results. The findings here might serve as a foundation for an intraday stock trading strategy, and this forecast is useful information for a day trader to have.
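A bare-bones sketch of the classifier setup described above; the file name, the `signal` column, and the feature engineering from the preceding day's pattern are placeholders, and the 70/30 split matches the ratio reported in the abstract.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# df is assumed to hold one row per trading day with engineered features from
# the preceding day's pattern and a LONG/SHORT label for the next session.
df = pd.read_csv("nse_daily_features.csv")           # placeholder file name
X, y = df.drop(columns=["signal"]), df["signal"]     # 'signal' in {LONG, SHORT}

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, shuffle=False)             # chronological 70/30 split

scaler = StandardScaler().fit(X_train)
clf = SVC(kernel="rbf").fit(scaler.transform(X_train), y_train)

pred = clf.predict(scaler.transform(X_test))
print("accuracy:", accuracy_score(y_test, pred))
```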
... Second, recent research by Gutierrez et al. (2019) and Pagnoncelli et al. (2022) reinforced the relevance of the risk constraint. These authors used an optimization approach to create, based on the same set of feasible assets and in combination with a CVaR constraint, funds with different risk-return profiles. ...
Chapter
In this study, we have examined the role of artificial intelligence (AI) in the domain of portfolio management. There has been exponential growth in AI-based models for achieving optimization within the mean-variance framework. We have primarily analyzed the bibliometric data for articles published on the present theme. The Scopus database was utilized to search for articles from 2005 to 2024. The citation analysis was performed using the VOSviewer software. The study provides insights for researchers to identify sources for conducting the preliminary literature review, identify the research gaps, and develop relevant AI-based models for implementing optimization in the investment domain. Finally, we have also provided a detailed account of various AI-based models, their evolution, and a mapping of specific models to the optimization objectives.
Article
Full-text available
This paper studies a fusion of concepts from stochastic programming and non-parametric statistical learning in which data is available in the form of covariates interpreted as predictors and responses. Such models are designed to impart greater agility, allowing decisions under uncertainty to adapt to the knowledge of predictors (leading indicators). This paper studies two classes of methods for such joint prediction-optimization models. One of the methods may be classified as a first-order method, whereas the other studies piecewise linear approximations. Both of these methods are based on coupling non-parametric estimation for predictive purposes, and optimization for decision-making within one unified framework. In addition, our study incorporates several non-parametric estimation schemes, including k nearest neighbors (kNN) and other standard kernel estimators. Our computational results demonstrate that the new algorithms proposed in this paper outperform traditional approaches which were not designed for streaming data applications requiring simultaneous estimation and optimization as important design features for such algorithms. For instance, coupling kNN with Stochastic Decomposition (SD) turns out to be over 40 times faster than an online version of Benders Decomposition while finding decisions of similar quality. Such computational results motivate a paradigm shift in optimization algorithms that are intended for modern streaming applications.
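As a toy illustration of the general idea of coupling a kNN estimator with sample-average optimization (a newsvendor stand-in, not the Stochastic Decomposition coupling studied in the paper); the distance metric, k, and the price/cost parameters are assumptions.

```python
import numpy as np

def knn_newsvendor_order(X_hist, d_hist, x_now, k=25, price=5.0, cost=3.0):
    """Condition a newsvendor order on covariates via k nearest neighbors.

    Keep the k historical periods whose covariates are closest to x_now and
    solve the sample-average newsvendor over just those demands, which reduces
    to an empirical quantile at the critical ratio.
    """
    dist = np.linalg.norm(X_hist - x_now, axis=1)
    neighbors = np.argsort(dist)[:k]              # indices of the k closest covariates
    local_demand = d_hist[neighbors]
    critical_ratio = (price - cost) / price       # standard newsvendor quantile level
    return np.quantile(local_demand, critical_ratio)
```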
Article
Full-text available
In this paper, we propose a novel approach for data-driven decision-making under uncertainty in the presence of contextual information. Given a finite collection of observations of the uncertain parameters and potential explanatory variables (i.e., the contextual information), our approach fits a parametric model to those data that is specifically tailored to maximizing the decision value, while accounting for possible feasibility constraints. From a mathematical point of view, our framework translates into a bilevel program, for which we provide both a fast regularization procedure and a big-M-based reformulation that can be solved using off-the-shelf optimization solvers. We showcase the benefits of moving from the traditional scheme for model estimation (based on statistical quality metrics) to decision-guided prediction using three different practical problems. We also compare our approach with existing ones in a realistic case study that considers a strategic power producer that participates in the Iberian electricity market. Finally, we use these numerical simulations to analyze the conditions (in terms of the firm’s cost structure and production capacity) under which our approach proves to be more advantageous to the producer.
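In generic notation (mine, not the paper's), the bilevel structure described above is roughly

$$
\min_{\theta} \; \frac{1}{n} \sum_{i=1}^{n} c\big(z^{\star}(x_i; \theta),\, y_i\big)
\qquad \text{s.t.} \qquad
z^{\star}(x; \theta) \in \arg\min_{z \in \mathcal{Z}} f\big(z;\, m_{\theta}(x)\big),
$$

where m_θ is the parametric prediction model fitted in the upper level, the lower level is the decision problem solved with the predicted parameters, and c scores the resulting decisions against the realized data y_i; the big-M reformulation mentioned above replaces the lower-level argmin with its optimality conditions so that off-the-shelf solvers apply.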
Article
Full-text available
We consider stochastic programs conditional on some covariate information, where the only knowledge of the possible relationship between the uncertain parameters and the covariates is reduced to a finite data sample of their joint distribution. By exploiting the close link between the notion of trimmings of a probability measure and the partial mass transportation problem, we construct a data-driven Distributionally Robust Optimization (DRO) framework to hedge the decision against the intrinsic error in the process of inferring conditional information from limited joint data. We show that our approach is computationally as tractable as the standard (without side information) Wasserstein-metric-based DRO and enjoys performance guarantees. Furthermore, our DRO framework can be conveniently used to address data-driven decision-making problems under contaminated samples. Finally, the theoretical results are illustrated using a single-item newsvendor problem and a portfolio allocation problem with side information.
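For orientation, the standard (without side information) Wasserstein-metric-based DRO problem referenced above has the generic form (notation mine)

$$
\min_{z \in \mathcal{Z}} \;\; \sup_{Q:\; W(Q,\, \widehat{P}_n) \le \varepsilon} \; \mathbb{E}_{Q}\big[ f(z, \xi) \big],
$$

where P̂_n is an empirical distribution built from the data, W a Wasserstein distance, and ε the ambiguity radius; the paper's construction instead builds the ambiguity set from trimmings of the joint empirical distribution so that the hedge targets the error made when inferring the conditional distribution from limited covariate data.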
Article
Full-text available
The poor out-of-sample performance of the mean–variance portfolio model is mainly caused by estimation errors in the covariance matrix and the mean return, especially the mean return vector. Meanwhile, in financial practice, what most investors actually like is to hold a few stocks in their portfolio. The goal of this paper is to propose some new efficient mean–variance portfolio selection models by considering the following aspects: (i) use the L1-regularization in the objective function to obtain a sparse portfolio; (ii) use the shrinkage method of Ledoit and Wolf, Journal of Empirical Finance, 2003, 10, 603–621 to estimate the covariance matrix; (iii) use the robust optimization method to mitigate the estimation errors of the expected return. Finally, empirical analysis demonstrates that the proposed strategies have better out-of-sample performance.
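A simplified sketch combining the three ingredients above with scikit-learn and cvxpy; the trade-off parameters, the box-style robustness on the mean, and the budget constraint are illustrative assumptions, not the paper's exact models.

```python
import cvxpy as cp
from sklearn.covariance import LedoitWolf

def sparse_robust_mv(returns, gamma=1.0, lam=0.05, delta=0.01):
    """Mean-variance weights with Ledoit-Wolf shrinkage covariance, an L1
    penalty for sparsity, and a box-uncertainty robust mean-return term."""
    mu_hat = returns.mean(axis=0)
    sigma = LedoitWolf().fit(returns).covariance_     # shrinkage covariance

    n = returns.shape[1]
    w = cp.Variable(n)
    worst_case_mean = mu_hat @ w - delta * cp.norm1(w)  # box-robust mean return
    objective = cp.Minimize(
        cp.quad_form(w, sigma)             # portfolio variance
        - gamma * worst_case_mean          # trade-off with the (robust) mean
        + lam * cp.norm1(w)                # L1 penalty -> sparse weights
    )
    cp.Problem(objective, [cp.sum(w) == 1]).solve()
    return w.value
```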
Article
This paper proposes a Bayesian-averaging heterogeneous vector autoregressive portfolio choice strategy with many big models that outperforms existing methods out-of-sample on numerous daily, weekly, and monthly datasets. The strategy assumes that excess returns are approximately determined by a time-varying regression with a large number of explanatory variables that are the sample means of past returns. Investors consider the possibility that every period there is a regime change by keeping track of many models, but doubt that any specification is able to perfectly predict the distribution of future returns, and compute portfolio choices that are robust to model misspecification. This paper was accepted by Tyler Shumway, finance.
Article
This paper injects factor structure into the estimation of time-varying, large-dimensional covariance matrices of stock returns. Existing factor models struggle to model the covariance matrix of residuals in the presence of time-varying conditional heteroskedasticity in large universes. Conversely, rotation-equivariant estimators of large-dimensional time-varying covariance matrices forsake directional information embedded in market-wide risk factors. We introduce a new covariance matrix estimator that blends factor structure with time-varying conditional heteroskedasticity of residuals in large dimensions up to 1000 stocks. It displays superior all-around performance on historical data against a variety of state-of-the-art competitors, including static factor models, exogenous factor models, sparsity-based models, and structure-free dynamic models. This new estimator can be used to deliver more efficient portfolio selection and detection of anomalies in the cross-section of stock returns.
Article
Selecting the optimal Markowitz portfolio depends on estimating the covariance matrix of the returns of N assets from T periods of historical data. Problematically, N is typically of the same order as T, which makes the sample covariance matrix estimator perform poorly, both empirically and theoretically. While various other general-purpose covariance matrix estimators have been introduced in the financial economics and statistics literature for dealing with the high dimensionality of this problem, we here propose an estimator that exploits the fact that assets are typically positively dependent. This is achieved by imposing that the joint distribution of returns be multivariate totally positive of order 2 (MTP2). This constraint on the covariance matrix not only enforces positive dependence among the assets but also regularizes the covariance matrix, leading to desirable statistical properties such as sparsity. Based on stock market data spanning 30 years, we show that estimating the covariance matrix under MTP2 outperforms previous state-of-the-art methods including shrinkage estimators and factor models.
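In the Gaussian case, the MTP2 constraint is equivalent to requiring the precision matrix to be an M-matrix (nonpositive off-diagonal entries), so the constrained maximum-likelihood problem can be sketched as

$$
\widehat{\Theta} \;=\; \arg\max_{\Theta \succ 0} \;\; \log\det \Theta \;-\; \operatorname{tr}(S\,\Theta)
\qquad \text{s.t.} \qquad \Theta_{ij} \le 0 \ \ \text{for all } i \ne j,
$$

where S is the sample covariance matrix and the resulting covariance estimate is the inverse of the optimal Θ; the sign constraints both enforce positive dependence and act as a regularizer in high dimensions.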
Article
Portfolio selection problems have been thoroughly studied under the risk-and-return paradigm introduced by Markowitz. However, the usefulness of this approach has been hindered by some practical considerations that have resulted in poorly diversified portfolios, or solutions that are extremely sensitive to parameter estimation errors. In this work, we use sampling methods to cope with this issue and compare the merits of two approaches: a sample average approximation approach and a performance-based regularization (PBR) method that appeared recently in the literature. We extend PBR by incorporating three different risk metrics—integrated chance-constraints, quantile deviation, and absolute semi-deviation—and deriving the corresponding regularization formulas. Additionally, a numerical comparison using index-based portfolios is presented, based on historical data that include the subprime crisis.
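For context, the sample average approximation referenced here replaces expectations in the risk and return functionals by averages over sampled scenarios,

$$
\mathbb{E}\big[ f(w, \xi) \big] \;\approx\; \frac{1}{S} \sum_{s=1}^{S} f(w, \xi_s),
$$

so that, for instance, an integrated chance constraint of the form E[(η − wᵀξ)₊] ≤ β is enforced as (1/S) Σ_s (η − wᵀξ_s)₊ ≤ β; the regularized counterparts derived in the paper refine this basic scheme (the exact formulas are more involved than this sketch).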