Chapter

Statistical Learning Theory in Equity Return Forecasting


Abstract

We apply Mangasarian and Bennett’s multi-surface method to the problem of allocating financial capital to individual stocks. The strategy constructs market neutral portfolios wherein capital exposure to long positions equals exposure to short positions at the beginning of each weekly period. The optimization model generates excess returns above the S&P 500, even in the presence of reasonable transaction costs. The trading strategy generates statistical arbitrage for trading costs below 10 basis points per transaction.
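The dollar-neutrality constraint in the abstract is simple to state concretely. Below is a minimal, hypothetical sketch (not the authors' optimization model; the median-split rule and score values are illustrative assumptions) of turning weekly per-stock forecasts into a portfolio whose capital exposure to long positions equals its exposure to short positions:

```python
# Hedged illustration of the market-neutral bookkeeping described in the
# abstract: at the start of each week, long capital equals short capital.
import numpy as np

def dollar_neutral_weights(scores, gross_exposure=1.0):
    """Turn per-stock forecast scores into weights with equal long/short capital.

    scores: higher means more attractive to hold long (hypothetical forecasts).
    """
    longs = scores > np.median(scores)
    shorts = ~longs
    w = np.zeros_like(scores, dtype=float)
    # Split the gross exposure 50/50 between the long and short books.
    w[longs] = 0.5 * gross_exposure / longs.sum()
    w[shorts] = -0.5 * gross_exposure / shorts.sum()
    return w

scores = np.array([0.8, -0.2, 0.1, -0.5, 0.4, 0.0])
w = dollar_neutral_weights(scores)
print(w, "net exposure:", w.sum())  # net exposure ~ 0 (market neutral)
```

In the paper, long/short membership comes from the multi-surface classifier rather than a median split; only the equal-exposure accounting is illustrated here.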


References
Article
We describe a methodology for performing variable ranking and selection using support vector machines (SVMs). The method constructs a series of sparse linear SVMs to generate linear models that can generalize well, and uses a subset of nonzero-weighted variables found by the linear models to produce a final nonlinear model. The method exploits the fact that a linear SVM (no kernels) with 1-norm regularization inherently performs variable selection as a side effect of minimizing the capacity of the SVM model. The distribution of the linear model weights provides a mechanism for ranking and interpreting the effects of variables. Star plots are used to visualize the magnitude and variance of the weights for each variable. We illustrate the effectiveness of the methodology on synthetic data, benchmark problems, and challenging regression problems in drug design. This method can dramatically reduce the number of variables and outperforms SVMs trained using all attributes and using the attributes selected according to correlation coefficients. The visualization of the resulting models is useful for understanding the role of underlying variables.
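As a concrete illustration of the first stage of this pipeline, the following sketch ranks variables by the magnitude of 1-norm-regularized linear SVM weights (assumptions: scikit-learn's LinearSVC stands in for the paper's sparse linear SVM, and the data are synthetic):

```python
# Sparse linear SVM: an L1 penalty drives many weights to exactly zero,
# so variable selection falls out as a side effect of fitting the model.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))               # 20 candidate variables
y = (X[:, 0] - 2 * X[:, 3] > 0).astype(int)  # only variables 0 and 3 matter

X = StandardScaler().fit_transform(X)
svm = LinearSVC(penalty="l1", dual=False, C=0.1).fit(X, y)

weights = svm.coef_.ravel()
ranked = np.argsort(-np.abs(weights))
selected = [i for i in ranked if weights[i] != 0]
print("nonzero-weight variables, most influential first:", selected)
```

In the paper's methodology, the nonzero-weight subset would then feed a final nonlinear (kernel) model; that second stage is omitted here.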
Article
Predictable variation in equity returns might reflect either (1) predictable changes in expected returns or (2) market inefficiency and stock price “overreaction.” These explanations can be distinguished by examining returns over short time intervals since systematic changes in fundamental valuation over intervals like a week should not occur in efficient markets. The evidence suggests that the “winners” and “losers” one week experience sizeable return reversals the next week in a way that reflects apparent arbitrage profits which persist after corrections for bid-ask spreads and plausible transactions costs. This probably reflects inefficiency in the market for liquidity around large price changes.
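The reversal effect described above translates directly into a contrarian rule: short last week's winners and buy last week's losers. A schematic sketch follows (synthetic data; the top-2/bottom-2 cutoff is an arbitrary assumption, and bid-ask spreads and transaction costs are ignored):

```python
# Weekly contrarian portfolio: long last week's "losers", short the "winners".
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
# weekly_returns: rows = weeks, columns = stocks (synthetic placeholder data)
weekly_returns = pd.DataFrame(rng.normal(0, 0.03, size=(52, 10)),
                              columns=[f"stock_{i}" for i in range(10)])

prev = weekly_returns.shift(1)                      # last week's returns
winners = prev.rank(axis=1, ascending=False) <= 2   # top 2 each week
losers = prev.rank(axis=1) <= 2                     # bottom 2 each week

# Equal-weighted reversal portfolio held for one week.
strategy = (weekly_returns[losers].mean(axis=1)
            - weekly_returns[winners].mean(axis=1)).dropna()
print("mean weekly reversal return:", strategy.mean())
```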
Article
Two medical applications of linear programming are described in this paper. Specifically, linear-programming-based machine learning techniques are used to increase the accuracy and objectivity of breast cancer diagnosis and prognosis. The first application, to breast cancer diagnosis, utilizes characteristics of individual cells, obtained from a minimally invasive fine needle aspirate, to discriminate benign from malignant breast lumps. This allows an accurate diagnosis without the need for a surgical biopsy. The diagnostic system in current operation at University of Wisconsin Hospitals was trained on samples from 569 patients and has had 100% chronological correctness in diagnosing 131 subsequent patients. The second application, recently put into clinical practice, is a method that constructs a surface that predicts when breast cancer is likely to recur in patients who have had their cancers excised. This gives the physician and the patient better information with which to plan treatment.
Chapter
In the history of research on the learning problem, one can identify four periods, each characterized by a notable event: (i) constructing the first learning machines, (ii) constructing the fundamentals of the theory, (iii) constructing neural networks, and (iv) constructing alternatives to neural networks.
Article
It is widely agreed that asset allocation accounts for a large part of the variability in the return on a typical investor's portfolio. This is especially true if the overall portfolio is invested in multiple funds, each including a number of securities. Asset allocation is generally defined as the allocation of an investor's portfolio among a number of "major" asset classes. Clearly such a generalization cannot be made operational without defining such classes. Once a set of asset classes has been defined, it is important to determine the exposures of each component of an investor's overall portfolio to movements in their returns. Such information can be aggregated to determine the investor's overall effective asset mix. If it does not conform to the desired mix, appropriate alterations can then be made. Once a procedure for measuring exposures to variations in returns of major asset classes is in place, it is possible to determine how effectively individual fund managers have performed their functions and the extent (if any) to which value has been added through active management. Finally, the effectiveness of the investor's overall asset allocation can be compared with that of one or more benchmark asset mixes. An effective way to accomplish all these tasks is to use an asset class factor model. After describing the characteristics of such a model, we illustrate applications of a model with twelve asset classes to analyze the performance of a set of open-end mutual funds between 1985 and 1989. Factor models are common in investment analysis; the paper's Equation (1) gives a generic representation.
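The equation itself is cut off on this page; the generic asset class factor model it refers to is conventionally written as below (a reconstruction in standard factor-model notation, not a verbatim copy of the paper's Equation (1)):

```latex
% Generic asset class factor model (reconstruction; symbol names assumed):
%   R_i  : return on asset or fund i
%   F_j  : return on asset class (factor) j
%   b_ij : exposure of asset i to asset class j
%   e_i  : non-factor ("selection") component of the return
R_i = b_{i1} F_1 + b_{i2} F_2 + \cdots + b_{in} F_n + e_i
```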
Book
Contents: setting of the learning problem; consistency of learning processes; bounds on the rate of convergence of learning processes; controlling the generalization ability of learning processes; constructing learning algorithms; and what is important in learning theory.
Article
A fast Newton method, that suppresses input space features, is proposed for a linear programming formulation of support vector machine classifiers. The proposed stand-alone method can handle classification problems in very high dimensional spaces, such as 28,032 dimensions, and generates a classifier that depends on very few input features, such as 7 out of the original 28,032. The method can also handle problems with a large number of data points and requires no specialized linear programming packages but merely a linear equation solver. For nonlinear kernel classifiers, the method utilizes a minimal number of kernel functions in the classifier that it generates.
Article
This paper introduces the concept of statistical arbitrage, a long horizon trading opportunity that generates a riskless profit and is designed to exploit persistent anomalies. Statistical arbitrage circumvents the joint hypothesis dilemma of traditional market efficiency tests because its definition is independent of any equilibrium model and its existence is incompatible with market efficiency. We provide a methodology to test for statistical arbitrage and then empirically investigate whether momentum and value trading strategies constitute statistical arbitrage opportunities. Despite adjusting for transaction costs, the influence of small stocks, margin requirements, liquidity buffers for the marking-to-market of short-sales, and higher borrowing rates, we find evidence that these strategies generate statistical arbitrage.
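For reference, the notion of statistical arbitrage the paper introduces is usually formalized roughly as follows (a hedged paraphrase of the standard definition; the notation v(t) for discounted cumulative trading profits is assumed):

```latex
% A self-financing, zero-initial-cost strategy with discounted cumulative
% trading profits v(t) is a statistical arbitrage if:
v(0) = 0, \qquad
\lim_{t \to \infty} \mathbb{E}\,[v(t)] > 0, \qquad
\lim_{t \to \infty} \Pr\big(v(t) < 0\big) = 0,
% and, if the probability of a loss stays positive at every finite horizon,
\lim_{t \to \infty} \frac{\operatorname{Var}[v(t)]}{t} = 0.
```

The third and fourth conditions capture why such a strategy is incompatible with market efficiency: the chance of losing money vanishes in the limit, without requiring any equilibrium model to price the risk.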
Article
This book is the first comprehensive introduction to Support Vector Machines (SVMs), a new-generation learning system based on recent advances in statistical learning theory. The book also introduces Bayesian analysis of learning and relates SVMs to Gaussian Processes and other kernel-based learning methods. SVMs deliver state-of-the-art performance in real-world applications such as text categorisation, hand-written character recognition, image classification, biosequence analysis, etc. Their first introduction in the early 1990s led to a recent explosion of applications and deepening theoretical analysis that has now established Support Vector Machines, along with neural networks, as one of the standard tools for machine learning and data mining. Students will find the book both stimulating and accessible, while practitioners will be guided smoothly through the material required for a good grasp of the theory and application of these techniques. The concepts are introduced gradually in accessible and self-contained stages, though in each stage the presentation is rigorous and thorough. Pointers to relevant literature and web sites containing software ensure that it forms an ideal starting point for further study. Equally, the book will equip the practitioner to apply the techniques, and an associated web site will provide pointers to updated literature, new applications, and on-line software.
Article
A computational comparison is made between two feature selection approaches for finding a separating plane that discriminates between two point sets in an n-dimensional feature space while utilizing as few of the n features (dimensions) as possible. In the concave minimization approach [19, 5], a separating plane is generated by minimizing a weighted sum of distances of misclassified points to two parallel planes that bound the sets and that determine the separating plane midway between them. Furthermore, the number of dimensions of the space used to determine the plane is minimized. In the support vector machine approach [27, 7, 1, 10, 24, 28], in addition to minimizing the weighted sum of distances of misclassified points to the bounding planes, we also maximize the distance between the two bounding planes that generate the separating plane. Computational results show that feature suppression is an indirect consequence of the support vector machine approach when an appropriate norm is used.
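For orientation, the shared structure of the two objectives can be sketched as follows (a reconstruction in the notation typical of this literature; the trade-off parameter lambda and the slack vectors y, z are assumptions, not quoted from the paper):

```latex
% Separating plane x'w = gamma; y, z are slacks for points of A and B that
% violate their bounding planes; lambda in [0,1) trades error vs. sparsity.
\min_{w,\gamma,\, y \ge 0,\, z \ge 0}\;
  (1-\lambda)\left( \frac{e^{\top} y}{m} + \frac{e^{\top} z}{k} \right)
  + \lambda \,\|w\|_{1}
\quad \text{s.t.} \quad
Aw \ge e\gamma + e - y, \qquad Bw \le e\gamma - e + z.
% The concave minimization approach replaces ||w||_1 by a smoothed count of
% the nonzero components of w; keeping a norm of w instead also maximizes
% the separation between the two bounding planes (the SVM margin).
```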
Article
We consider two point sets A and B in the n-dimensional real space R^n, represented by the m × n matrix A and the k × n matrix B, respectively. Our principal objective here is to formulate a single linear program with the following properties: (i) if the convex hulls of A and B are disjoint, a strictly separating plane is obtained; (ii) if the convex hulls of A and B intersect, a plane is obtained that minimizes some measure of misclassified points, for all possible cases; (iii) no extraneous constraints are imposed on the linear program that rule out any specific case from consideration. Most linear programming formulations [6, 5, 12, 4] have property (i); however...
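A minimal sketch of such a linear program follows, assuming the standard robust-LP form from this literature (scipy stands in for a specialized LP package, and the toy point sets are made up):

```python
# Robust LP separation sketch: find (w, gamma) so that A w >= gamma + 1 on
# one side and B w <= gamma - 1 on the other, minimizing the average
# violations y (for A) and z (for B).
import numpy as np
from scipy.optimize import linprog

def robust_lp_separation(A, B):
    """Return (w, gamma) for the separating plane x'w = gamma."""
    m, n = A.shape
    k, _ = B.shape
    # Decision vector: [w (n), gamma (1), y (m), z (k)]
    c = np.concatenate([np.zeros(n + 1), np.full(m, 1.0 / m), np.full(k, 1.0 / k)])
    # -A w + e*gamma - y <= -e   (slack y for A w - e*gamma + y >= e)
    top = np.hstack([-A, np.ones((m, 1)), -np.eye(m), np.zeros((m, k))])
    #  B w - e*gamma - z <= -e   (slack z for -B w + e*gamma + z >= e)
    bot = np.hstack([B, -np.ones((k, 1)), np.zeros((k, m)), -np.eye(k)])
    A_ub = np.vstack([top, bot])
    b_ub = -np.ones(m + k)
    bounds = [(None, None)] * (n + 1) + [(0, None)] * (m + k)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    w, gamma = res.x[:n], res.x[n]
    return w, gamma

# Tiny usage example on two separable 2-D point sets
A = np.array([[2.0, 2.0], [3.0, 1.5]])
B = np.array([[0.0, 0.0], [0.5, -1.0]])
w, gamma = robust_lp_separation(A, B)
print(np.sign(A @ w - gamma), np.sign(B @ w - gamma))  # expect +1s and -1s
```

When the convex hulls are disjoint, the optimal slacks are zero and the plane strictly separates the sets; otherwise the objective is the average misclassification measure described in the abstract.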