Article

Abstract

Trading rules that perform well on a given data set seldom lead to promising out-of-sample results, a problem which is a consequence of the in-sample data-snooping bias. Efforts to justify the selection of trading rules by assessing their out-of-sample performance do not remedy this predicament either, because they are prone to what is known as the out-of-sample data-snooping bias. Our approach to curbing the data-snooping bias consists of constructing a framework for trading rule selection using a priori robustness strategies, where robustness is gauged on the basis of time-series bootstrap and multi-objective criteria. This approach thus focuses on building robustness into the process of trading rule selection at an early stage, rather than on an ex-post assessment of trading rule fitness. Intra-day FX market data constitute the empirical basis of the proposed investigations. Trading rules are selected from a wide universe created by evolutionary computation tools. The authors show evidence of the benefit of this approach in terms of indirect forecasting accuracy when investing in FX markets.
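
The abstract does not spell out how robustness is computed, so the following is only a minimal sketch of the general idea: a rule is scored on the observed return series and on block-bootstrap resamples of it, and candidate rules are then compared on several objectives at once. The moving block bootstrap, the toy rule `momentum_profit`, the chosen objectives and all parameter values are assumptions made for illustration and are not taken from the paper.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple


def block_bootstrap(returns: Sequence[float], block_len: int,
                    rng: random.Random) -> List[float]:
    """Moving block bootstrap: glue randomly chosen contiguous blocks together."""
    n = len(returns)
    if not 1 <= block_len <= n:
        raise ValueError("block_len must be between 1 and len(returns)")
    resample: List[float] = []
    while len(resample) < n:
        start = rng.randrange(0, n - block_len + 1)
        resample.extend(returns[start:start + block_len])
    return resample[:n]


def momentum_profit(returns: Sequence[float]) -> float:
    """Hypothetical toy rule: go long after a positive return, short otherwise."""
    profit = 0.0
    for prev, cur in zip(returns, returns[1:]):
        position = 1 if prev > 0 else -1
        profit += position * cur
    return profit


def robustness_scores(rule: Callable[[Sequence[float]], float],
                      returns: Sequence[float], n_boot: int = 200,
                      block_len: int = 5, seed: int = 0) -> Dict[str, float]:
    """Score a rule on the observed series and on bootstrap resamples of it."""
    rng = random.Random(seed)
    boot = sorted(rule(block_bootstrap(returns, block_len, rng))
                  for _ in range(n_boot))
    return {
        "observed": rule(returns),                     # in-sample performance
        "bootstrap_p10": boot[n_boot // 10],           # pessimistic resampled outcome
        "share_positive": sum(p > 0 for p in boot) / n_boot,
    }


def dominates(a: Tuple[float, ...], b: Tuple[float, ...]) -> bool:
    """Pareto dominance for objectives that are all to be maximised."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))
```

A rule whose `bootstrap_p10` and `share_positive` stay high across resamples is less likely to owe its observed profit to one particular path of the data, which is the intuition behind selecting on robustness before any ex-post evaluation.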

... Schmidbauer et al. [24] introduce a GE-based trading rule selection framework that considers robustness. To that end, they developed a multi-objective fitness test that considers both observed series and synthetic ones generated using bootstrap. ...
Article
The attainment of trading rules using Grammatical Evolution traditionally follows a static approach. A single rule is obtained and then used to generate investment recommendations over time. The main disadvantage of this approach is that it does not consider the need to adapt to the structural changes that are often associated with financial time series. We improve on the canonical approach by introducing an alternative that involves a dynamic selection mechanism that switches between an active rule and a candidate one optimized for the most recent market data available. The proposed solution seeks the flexibility required by structural changes while limiting the transaction costs commonly associated with constant model updates. The performance of the algorithm is compared with four alternatives: the standard static approach, a sliding window-based generation of trading rules that are used for a single time period, and two ensemble-based strategies. The experimental results, based on market data, show that the suggested approach beats the rest.
... Data snooping is a particular concern in powerful methodologies such as GP, due to the very large number of models that can be generated and tested against the same dataset during training. Although there have been studies addressing this issue (e.g., [2,67]), many studies applying GP in finance and economics do not adequately take data snooping issues into account. ...
Article
While the origins of genetic programming (GP) stretch back over 50 years, the field of GP was invigorated by John Koza’s popularisation of the methodology in the 1990s. A particular feature of the GP literature since then has been a strong interest in the application of GP to real-world problem domains. One application domain which has attracted significant attention is that of finance and economics, with several hundred papers from this subfield being listed in the GP bibliography. In this article we outline why finance and economics has been a popular application area for GP and briefly indicate the wide span of this work. However, despite this research effort there is relatively scant evidence of the usage of GP by the mainstream finance community in academia or industry. We speculate why this may be the case, describe what is needed to make this research more relevant from a finance perspective, and suggest some future directions for the application of GP in finance and economics.
Article
The literature on trading algorithms based on Grammatical Evolution commonly presents solutions that rely on static approaches. Given the prevalence of structural change in financial time series, this implies that the rules might have to be updated at predefined time intervals. We introduce an alternative solution based on an ensemble of models which are trained using a sliding window. The structure of the ensemble combines the flexibility required to adapt to structural changes with the need to control for the excessive transaction costs associated with over-trading. The performance of the algorithm is benchmarked against five different comparable strategies that include the traditional static approach, the generation of trading rules that are used for a single time period and are subsequently discarded, and three alternatives based on ensembles with different voting schemes. The experimental results, based on market data, show that the suggested approach offers very competitive results against comparable solutions and highlight the importance of containing transaction costs.
Article
Evolutionary Computation is often used in the domain of automated discovery of trading rules. Within this area, both Genetic Programming and Grammatical Evolution offer solutions with similar structures that have two key advantages in common: they are both interpretable and flexible in terms of their structure. The core algorithms can be extended to use automatically defined functions or mechanisms aimed at promoting parsimony. The number of references on this topic is ample, but most of the studies focus on a specific setup. This means that it is not clear which is the best alternative. This work intends to fill that gap in the literature by presenting a comprehensive set of experiments using both techniques with similar variations, and measuring their sensitivity to an increase in population size and to the composition of the terminal set. The experimental work, based on three S&P 500 data sets, suggests that Grammatical Evolution generates strategies that are more profitable, more robust and simpler, especially when a parsimony control technique is applied. As for the use of automatically defined functions, they improved the performance in some experiments, but the results were inconclusive.
Thesis
Financial investment is an important economic activity. The values of indexes like the Dow Jones Industrial Average (DJI), the Standard & Poor 500 (S&P500) or domestic stock market indexes are commonly used as a measure of a country’s level of development. Financial markets provide a comfortable method to generate profit from diverse industries and commercial activities. Nevertheless, investors should also consider uncertainty in stock prices, legal restrictions, and transaction costs when making decisions. Even though several works have been published about ways to deal with the difficulties described above, many investors continue to rely only on their own experience to make decisions. The limitations of the current approaches and their complexity have caused investors to overlook their benefits. Therefore, they are in need of tools to help them make correct decisions in practical situations. Investors are continuously concerned with making the best possible decision. From the wide range of available methods, portfolios have the advantage of including the uncertainty of the decisions (i.e. risk) in the optimization process. Besides, they provide a set of optimal solutions and an explanation of how investors choose a portfolio according to their preferences. Utility functions are used to model this behavior. Nevertheless, the inclusion of new restrictions in the problem definition prevents the application of traditional solution methods. Moreover, the risk metric is restricted to the covariance matrix of the assets’ returns only. Finance theory has identified these drawbacks and proposed solutions based on a multi-period definition of the problem, where a time horizon is considered instead of a static definition of the market.
Nevertheless, this work has identified the following limitations of multi-period portfolio optimization approaches: they are limited to optimizing the portfolio’s return from the last period of time only; they rely on theoretical utility functions to describe the investor’s preference; and they overlook the information provided by data innovations arriving during the time horizon. This work assumes this information is useful for making better investment decisions. The review indicated that the multi-period definition of the problem is developed using dynamic programming, which allows the inclusion of transaction costs and other state-dependent restrictions. Nevertheless, its solution has proved to be a difficult task. Multi-period theory references are mainly concerned with finding closed-form solutions to the problem for a given combination of dynamic restrictions, risk metrics and utility functions. Definition of sub-problems is a common solution technique. On the other hand, evolutionary algorithms have been mainly applied to solve static portfolio optimization problems. Round-lots and compulsory assets are some examples. The conclusion was that the application of evolutionary algorithms to multi-period portfolio optimization problems has received limited attention in the literature. This work introduces an investment method based on multi-period portfolio theory implemented with evolutionary algorithms. A Monte-Carlo approach is proposed to handle dynamic restrictions without the complications of purely mathematical methods. Transaction costs, portfolio unbalance, and inflation are the ones considered. Moreover, a process for identifying the particular investor’s preference is presented to avoid the use of theoretical utility models. Also, the method considers data innovations to evaluate the current state of the market and allow adaptive decisions.
The solution model is divided into two parts: a multi-objective stochastic optimization evolutionary algorithm to solve multi-period portfolio problems, and the Investment Strategies method, which uses the information about the market state, the investor’s preference, and portfolio performance to make decisions. The method has the advantage of including dynamic restrictions, which are usually not included in the optimization process of traditional methods. The most important restriction is transaction costs, because the profit obtained by trading can be severely reduced by them. Also, the method includes a procedure to identify the investor’s particular preference; therefore, it makes decisions closer to the investor’s expectations. The method is fully automatic, providing regular investors with a useful tool to find investment recommendations. However, the method is to be further enhanced with the inclusion of static restrictions and trading execution capabilities to form a complete investing system. The proposed method was tested with real data from American and Mexican markets and was compared against buy-and-hold strategies and single-period optimal portfolios, which are common methods used by investors. The experiments considered the following performance metrics: maximum loss, total time to reach the investor’s goal, final portfolio return, number of stop-loss occurrences, expected return and risk, and the Sharpe ratio. Statistical analysis concluded that the proposed method outperformed the others on the proposed metrics. The Investment Strategies method was shown to have lower maximum losses and higher Sharpe ratios than the other methods. Besides, the results indicate that Investment Strategies dominate other methods when expected return and risk are considered. A significant difference was found between the results from the American market and the ones from the Mexican market. Finally, differences were found in the results obtained with different risk metrics.
The results indicated that the American market was subject to higher risk than the Mexican market. The analysis of the results concluded that good investment decisions come from a balance between transacting and following the trends of the market. Also, different information sources should be considered when making decisions. The method is subject to improvements. For example, other methods could be used instead of multivariate normal distributions to simulate the returns. Also, dynamic investment strategies could be devised to adapt the behavior of the algorithm to the current market scenario.
Article
This article presents a review of the application of evolutionary computation methods to solving financial problems. Genetic algorithms, genetic programming, multi-objective evolutionary algorithms, learning classifier systems, co-evolutionary approaches, and estimation of distribution algorithms are the techniques considered. The novelty of our approach is threefold: it covers time periods not included in other review articles, it covers problems not considered by others, and the scope covered by past and new references is compared and analyzed. The results indicate that interest in methods and problems has changed over time, although genetic algorithms have remained the most popular approach in the literature. There are combinations of problems and solution methods which are yet to be investigated.
Article
The maximum entropy bootstrap is an algorithm that creates an ensemble for time series inference. Stationarity is not required and the ensemble satisfies the ergodic theorem and the central limit theorem. The meboot R package implements this algorithm. This document introduces the procedure and illustrates its scope by means of several guided applications.
Article
This study examines the potential of a neural network (NN) model, whose inputs and structure are automatically selected by means of a genetic algorithm (GA), for the prediction of corporate failure using information drawn from financial statements. The results of this model are compared with those of a linear discriminant analysis (LDA) model. Data from a matched sample of 178 publicly quoted, failed and non-failed, US firms, drawn from the period 1991 to 2000 is used to train and test the models. The best evolved neural network correctly classified 86.7 (76.6)% of the firms in the training set, one (three) year(s) prior to failure, and 80.7 (66.0)% in the out-of-sample validation set. The LDA model correctly categorised 81.7 (75.0)% and 76.0 (64.7)% respectively. The results provide support for a hypothesis that corporate failure can be anticipated, and that a hybrid GA/NN model can outperform an LDA model in this domain. Copyright Springer-Verlag Berlin/Heidelberg 2004
Article
The genetic programming (GP) paradigm is a functional approach to inductively forming programs. The use of natural selection based on a fitness function for reproduction of the program population has allowed many problems to be solved that require a non-fixed representation. Attempts to extend GP have focussed on typing the language to restrict crossover and to ensure legal programs are always created. We describe the use of a context free grammar to define the structure of the initial language and to direct crossover and mutation operators. The use of a grammar to specify structure in the hypothesis language allows a clear statement of inductive bias and control over typing. Modifying the grammar as the evolution proceeds is used as an example of learnt bias. This technique leads to declarative approaches to evolutionary learning, and allows fields such as incremental learning to be incorporated under the same paradigm.
Article
We report evidence on the profitability and statistical significance among 2,127 technical trading rules. The best rules are found to be significantly profitable based on standard tests. We then employ White's (2000) Reality Check to evaluate these rules and find that data-snooping biases do not change the basic conclusions for the full sample. A sub-sample analysis indicates that the data-snooping problem is more serious in the second half of the sample. Profitability becomes much weaker in the more recent period, suggesting that the foreign exchange market becomes more efficient over time. Evidence from cross exchange rates confirms the basic findings.
Article
Grammatical Evolution (GE) is a novel, data-driven, model-induction tool, inspired by the biological gene-to-protein mapping process. This study provides an introduction to GE, and applies the methodology in an attempt to uncover useful technical trading rules which can be used to trade foreign exchange markets. In this study, each of the evolved rules (programs) represents a market trading system. The form of these programs is not specified ex-ante, but emerges by means of an evolutionary process. Daily US-DM, US-Stg and US-Yen exchange rates for the period 1992 to 1997 are used to train and test the model. The findings suggest that the developed rules earn positive returns in hold-out sample test periods, after allowing for trading and slippage costs. This suggests potential for future research to determine whether further refinement of the methodology adopted in this study could improve the returns earned by the developed rules. It is also noted that this novel methodology has general utility for rule-induction, and data mining applications. Copyright Springer-Verlag Berlin/Heidelberg 2004
Article
Data snooping occurs when a given set of data is used more than once for purposes of inference or model selection. When such data reuse occurs, there is always the possibility that any satisfactory results obtained may simply be due to chance rather than to any merit inherent in the method yielding the results. This problem is practically unavoidable in the analysis of time-series data, as typically only a single history measuring a given phenomenon of interest is available for analysis. It is widely acknowledged by empirical researchers that data snooping is a dangerous practice to be avoided, but in fact it is endemic. The main problem has been a lack of sufficiently simple practical methods capable of assessing the potential dangers of data snooping in a given situation. Our purpose here is to provide such methods by specifying a straightforward procedure for testing the null hypothesis that the best model encountered in a specification search has no predictive superiority over a given benchmark model. This permits data snooping to be undertaken with some degree of confidence that one will not mistake results that could have been generated by chance for genuinely good results.
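
To make the testing idea concrete, here is a minimal sketch of a Reality Check style p-value under stated assumptions: `excess[k][t]` is the performance of model k minus the benchmark at time t, the statistic is the maximum of the scaled mean excess performances, and resampling uses a stationary bootstrap with restart probability `q`. The function names, defaults and the strict-inequality counting are illustrative choices rather than the paper's exact specification.

```python
import math
import random
from typing import List, Sequence


def stationary_bootstrap_indices(n: int, q: float,
                                 rng: random.Random) -> List[int]:
    """Index sequence with random block lengths (mean 1/q), wrapping at the end."""
    idx = [rng.randrange(n)]
    while len(idx) < n:
        if rng.random() < q:
            idx.append(rng.randrange(n))      # start a new block
        else:
            idx.append((idx[-1] + 1) % n)     # continue the current block
    return idx


def reality_check_pvalue(excess: Sequence[Sequence[float]], n_boot: int = 500,
                         q: float = 0.1, seed: int = 0) -> float:
    """Bootstrap p-value for H0: the best model does not beat the benchmark."""
    n = len(excess[0])
    rng = random.Random(seed)
    means = [sum(f) / n for f in excess]
    v_obs = max(math.sqrt(n) * m for m in means)
    exceed = 0
    for _ in range(n_boot):
        idx = stationary_bootstrap_indices(n, q, rng)
        # Recentre each resampled mean on the observed mean of the same model.
        v_star = max(
            math.sqrt(n) * (sum(f[i] for i in idx) / n - m)
            for f, m in zip(excess, means)
        )
        if v_star > v_obs:
            exceed += 1
    return exceed / n_boot
```

Because the maximum is taken over all candidate models in every resample, adding many poor models to the search makes a large observed maximum less surprising, which is how the procedure accounts for the specification search.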
Article
This paper tests two of the simplest and most popular trading rules--moving average and trading range break--by utilizing the Dow Jones Index from 1897 to 1986. Standard statistical analysis is extended through the use of bootstrap techniques. Overall, their results provide strong support for the technical strategies. The returns obtained from these strategies are not consistent with four popular null models: the random walk, the AR(1), the GARCH-M, and the Exponential GARCH. Buy signals consistently generate higher returns than sell signals, and further, the returns following buy signals are less volatile than returns following sell signals. Moreover, returns following sell signals are negative, which is not easily explained by any of the currently existing equilibrium models. Copyright 1992 by American Finance Association.
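
As an illustration of the first rule family mentioned above, the following sketch generates signals from a moving-average crossover: buy when the short average is above the long one, sell when it is below, with an optional band around the long average. The window lengths, the `band` parameter and the function names are placeholders chosen for the example, not the settings used in the study.

```python
from typing import List, Sequence


def moving_average(prices: Sequence[float], window: int, t: int) -> float:
    """Mean of the `window` most recent prices up to and including index t."""
    return sum(prices[t - window + 1:t + 1]) / window


def ma_signals(prices: Sequence[float], short: int = 2, long: int = 5,
               band: float = 0.0) -> List[int]:
    """Return +1 (buy), -1 (sell) or 0 (no signal) for each time index."""
    signals: List[int] = []
    for t in range(len(prices)):
        if t < long - 1:
            signals.append(0)                 # not enough history yet
            continue
        s = moving_average(prices, short, t)
        l = moving_average(prices, long, t)
        if s > l * (1 + band):
            signals.append(1)
        elif s < l * (1 - band):
            signals.append(-1)
        else:
            signals.append(0)
    return signals
```

For example, `ma_signals([1, 2, 3, 4, 5, 6], short=2, long=3)` gives no signal for the first two observations and buy signals afterwards, since the short average of a rising series stays above the long one.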
Article
This chapter describes Grammatical Evolution (GE) in detail (Ryan et al., 1998; O’Neill and Ryan, 2001; O’Neill, 2001). We show that it is an evolutionary algorithm (EA) that can evolve complete programs in an arbitrary language using a variable-length binary string. The binary genome determines which production rules in a Backus Naur Form (BNF) grammar definition are used in a genotype-to-phenotype mapping process to a program. GE is set up such that the evolutionary algorithm is independent of the output programs by virtue of the genotype-phenotype mapping, allowing GE to take advantage of advances in EA research. The BNF grammar, like the EA, is a plug-in component of the system that determines the syntax and language of the output code, hence, it is possible to evolve programs in an arbitrary language.
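
The genotype-to-phenotype mapping can be sketched as follows, assuming the binary genome has already been decoded into integer codons. Each codon selects a production for the leftmost non-terminal via the modulo of the number of available productions, and the genome is re-read from the start (wrapped) a limited number of times if it runs out. The toy grammar, the convention of not consuming a codon when only one production exists, and the `max_wraps` limit are assumptions for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical BNF-style grammar for simple trading-rule conditions.
GRAMMAR: Dict[str, List[List[str]]] = {
    "<rule>": [["<expr>", ">", "<expr>"]],
    "<expr>": [["<var>"], ["(", "<expr>", "<op>", "<expr>", ")"]],
    "<op>": [["+"], ["-"]],
    "<var>": [["price"], ["ma_short"], ["ma_long"]],
}


def map_genotype(codons: List[int], start: str = "<rule>",
                 max_wraps: int = 2) -> Optional[str]:
    """Map integer codons to a program string, or None if mapping fails."""
    symbols = [start]          # pending symbols, leftmost first
    output: List[str] = []
    idx = 0                    # position of the next codon to read
    wraps = 0
    while symbols:
        sym = symbols.pop(0)
        if sym not in GRAMMAR:
            output.append(sym)                # terminal: emit as-is
            continue
        choices = GRAMMAR[sym]
        if len(choices) == 1:
            chosen = choices[0]               # no choice, no codon consumed
        else:
            if idx == len(codons):            # genome exhausted: wrap around
                wraps += 1
                if wraps > max_wraps:
                    return None               # invalid individual
                idx = 0
            chosen = choices[codons[idx] % len(choices)]
            idx += 1
        symbols = list(chosen) + symbols      # expand the leftmost non-terminal
    return " ".join(output)
```

With this grammar, `map_genotype([0, 1, 0, 2])` yields `ma_short > ma_long`. Swapping the `GRAMMAR` dictionary for another one changes the output language without touching the search algorithm, which is the plug-in property described above.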
Article
Grammatical Evolution (GE) is a grammar-based GA to generate computer programs which has been shown to be comparable with GP when applied to a diverse array of problems. GE has the distinction that its input is a BNF, which permits it to generate programs in any language, of arbitrary complexity, including loops, multiple-line functions, etc. Part of the power of GE is that it is closer to natural DNA than GP, and thus can benefit from natural phenomena such as a separation of search and solution spaces through a genotype to phenotype mapping, and a genetic code degeneracy which can give rise to silent mutations (mutations that have no effect on the phenotype). We have previously shown how runs of GE are competitive with GP, and in this paper we analyse characteristics such as genotypic diversity, and individual genotypic length, in an attempt to shed light on the power of the system. Results indicate that GE can use certain features of the system to its benefit ...
Article
Numerous studies in the finance literature have investigated technical analysis to determine its validity as an investment tool. Several of these studies conclude that technical analysis does have merit; however, it is noted that the effects of data-snooping are not fully accounted for. In this paper we utilize White's Reality Check bootstrap methodology (White (1997)) to evaluate simple technical trading rules while quantifying the data-snooping bias and fully adjusting for its effect in the context of the full universe from which the trading rules were drawn. Hence, for the first time, the paper presents a means of calculating a comprehensive test of performance across all trading rules. In particular, we consider the study of Brock, Lakonishok, and LeBaron (1992), expand their universe of 26 trading rules, apply the rules to 100 years of daily data on the Dow Jones Industrial Average, and determine the effects of data-snooping. During the sample period inspected by Brock, Lakonishok and LeBaron, we find that the best technical trading rule is capable of generating superior performance even after accounting for data-snooping. However, we also find that the best technical trading rule does not provide superior performance when used to trade in the subsequent 10-year post-sample period.
Merton R, On the state of the efficient market hypothesis in financial economics, Macroeconomics and Finance: Essays in Honor of Franco Modigliani (ed. by Dornbusch R, Fischer S, and Bossons J), MIT Press, Cambridge, Mass., 1987.
EpochX - Genetic Programming for Research: The evolutionary algorithm in EpochX. URL http://www.epochx.org/guide-algorithm.php. Accessed July 2011.
R Development Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2011.
Whigham P A, Grammatically-based genetic programming, Proceedings of the Workshop on Genetic Programming: From Theory to Real-World Applications (ed. by Rosca J P), Rochester, New York, 1995: 33-41.