Forecasting foreign exchange rates with artificial neural networks: A review
International Journal of Information Technology & Decision Making
Vol. 3, No. 1 (2004) 145–165
© World Scientific Publishing Company
FORECASTING FOREIGN EXCHANGE RATES
WITH ARTIFICIAL NEURAL NETWORKS: A REVIEW
Institute of Systems Science, Academy of Mathematics and Systems Sciences
Chinese Academy of Sciences, Beijing 100080, People’s Republic of China
School of Knowledge Science, Japan Advanced Institute of Science and Technology
1-1, Asahidai, Ishikawa 923-1292, Japan
K. K. LAI
Department of Management Sciences, City University of Hong Kong
Tat Chee Avenue, Kowloon, Hong Kong
School of Knowledge Science, Japan Advanced Institute of Science and Technology
1-1, Asahidai, Ishikawa 923-1292, Japan
Institute of Systems Science, Academy of Mathematics and Systems Sciences
Chinese Academy of Sciences, Beijing 100080, People’s Republic of China
Forecasting exchange rates is an important financial problem that is receiving increasing
attention, especially because of its difficulty and practical applications. Artificial
neural networks (ANNs) have been widely used as a promising alternative approach to
forecasting because of several distinguishing features. Research efforts on ANNs
for forecasting exchange rates are considerable. In this paper, we attempt to provide a
survey of research in this area. Several design factors significantly impact the accuracy of
neural network forecasts. These factors include the selection of input variables, data
preparation, and network architecture. There is no consensus about these factors; different
decisions prove effective in different cases. We also describe the integration of
ANNs with other methods and report comparisons between the performance of ANNs
and that of other forecasting methods, finding mixed results. Finally, future
research directions in this area are discussed.
Keywords: Artificial neural networks; exchange rate; forecasting.
∗Corresponding author. This author is also with School of Business, Hunan University
146W. Huang et al.
The foreign exchange market is the largest and most liquid of the financial markets,
with an estimated $1 trillion traded every day. Exchange rates are amongst the
most important economic indices in the international monetary markets. For large
multinational firms, which conduct substantial currency transfers in the course of
business, being able to accurately forecast exchange rate movements can result in
substantial improvement in the firm’s overall profitability.
Exchange rates are affected by many highly correlated economic, political
and even psychological factors. These factors interact in a very complex fashion.
Exchange rate series exhibit high volatility, complexity and noise that result from
an elusive market mechanism generating daily observations [49]. Evidence has clearly
shown that while there is little linear dependence, the null hypothesis of independence
can be strongly rejected, demonstrating the existence of non-linearities in
Much research effort has been devoted to exploring the nonlinearity of exchange
rate data and to developing specific nonlinear models to improve exchange rate
forecasting. Parametric nonlinear models such as the autoregressive random variance
(ARV) model [44], autoregressive conditional heteroskedasticity (ARCH) [16], generalized
autoregressive conditional heteroskedasticity (GARCH) [1], chaotic dynamics [31] and
self-exciting threshold autoregressive [4] models have been proposed and applied to
foreign exchange rate forecasting. While these models may be good for a particular
situation, they perform poorly for other applications. The pre-specification of
the model form restricts the usefulness of these parametric nonlinear models, since
many other nonlinear patterns are possible. One particular nonlinear
specification will not be general enough to capture all the nonlinearities in the
data. Some nonparametric methods have also been proposed to forecast exchange
rates [7, 28, 29]. However, the nonparametric methods investigated in these studies are
still unable to improve upon a simple random walk model in out-of-sample predictions
of exchange rates.
There has been growing interest in the adoption of the state-of-the-art artifi-
cial intelligence technologies to solve the problem. One stream of these advanced
techniques focuses on the use of artificial neural networks (ANNs) to analyze
the historical data and provide predictions on future movements in the foreign
exchange market. An ANN is a system loosely modeled on the human brain,
which detects the underlying functional relationships within a set of data and
performs tasks such as pattern recognition, classification, evaluation, modeling,
prediction and control. ANNs are particularly well suited to finding accurate
solutions in an environment characterized by complex, noisy, irrelevant or partial
information. Several distinguishing features of ANNs make them valuable and
attractive in forecasting. First, as opposed to the traditional model-based meth-
ods, ANNs are data-driven self-adaptive methods in that there are few a priori
assumptions about the models for problems under study. Second, ANNs can
generalize. Third, ANNs are universal functional approximators. Finally, ANNs
The idea of using ANNs for forecasting exchange rates is not new. Weigend
et al. [53] find that neural networks are better than random walk models in
predicting the DEM/USD exchange rate. Refenes et al. [37] apply a multi-layer
perceptron network to predict the USD/DEM exchange rate and to study
the convergence issue related to network architecture. Refenes [36] develops a
constructive learning algorithm to find the best neural network configuration for
forecasting DEM/USD. Poddig [33] studies the problem of predicting the trend of the
USD/DEM and compares the results to those obtained through regression analysis.
Pi [32] proposes a test for dependence among exchange rates. Shin [41] applies an
ANN model and moving average trading rules to investigate the return predictability
of exchange rates. Zhang and Hutchinson [62] report their experience of forecasting
the tick-by-tick CHF/USD. Kuan and Liu [24] use both feed-forward and recurrent
neural networks to forecast the GBP, CAD, DEM, JPY and CHF against the USD. Wu [55]
compares neural networks with ARIMA models in forecasting Taiwan/USD exchange
rates. Hann and Steurer [15] make comparisons between neural network and linear
models in USD/DEM forecasting. Episcopos and Davis [10] investigate the problem of
predicting daily returns based on five Canadian exchange rates using ANNs and
a heteroskedastic model, EGARCH. Tenti [48] proposes the use of recurrent neural
networks to forecast exchange rates. Other early examples of using ANNs in
exchange rate applications include Zhang [61] and Yao et al. [56].
Considerable research effort has gone into ANNs for forecasting exchange rates.
In this paper, we attempt to provide a survey of research in this area. Forecasting
exchange rates using ANNs is a process that can be divided into several steps.
Our goal in this paper is to identify points of consensus and disagreement at each step.
Hence, the comparison of the various methods used by different researchers runs through
the whole forecasting process. For the areas of consensus, guidelines are summarized.
With the disagreements, we analyze the reasons and point out the advantages and
disadvantages of various methods.
The paper is organized as follows. Section 2 covers input selection. Sec. 3 deals
with preparing data. In Sec. 4, we give a brief presentation of ANN architecture.
Section 5 describes the integration of ANNs with other methods. The comparison
between performances of ANNs and those of other forecasting methods is reported
in Sec. 6. Finally, conclusions and directions for future research are discussed in
2. Input Selection
There are two kinds of inputs — fundamental inputs and technical inputs.
Fundamental inputs include consumer price index, foreign reserve, GDP, export and
import volume, interest rates, etc. Technical inputs include the delayed time series
data, moving average, relative strength index, etc. Besides the above two kinds
of inputs, individual forecast results can be used as inputs when ANNs serve
as combined forecasting tools. To provide improved volatility forecasts,
Hu and Tsoukalas [17] combine GARCH, EGARCH, IGARCH and MAV volatility
forecasts through ANNs. A preliminary effort to maximize the output performance
is conducted by ensuring adequate domain knowledge representation from input
While Walczak et al. [52] claim that multivariate inputs are necessary, most neural
network inputs for exchange rate prediction are univariate. Univariate inputs use
data directly from the time series being forecast, while multivariate inputs use
information from outside the time series in addition to the time series itself. Univariate
inputs rely on the predictive capabilities of the time series itself, corresponding to
technical analysis as opposed to fundamental analysis. For a univariate time
series forecasting problem, the network inputs are the past, lagged observations of
the data series and the output is the future value. Each input pattern is composed of
a moving window of fixed length along the series. In this sense, the feed-forward
network used for time series forecasting is a general autoregressive model. The
question is how many lag periods should be included in predicting the future. Some
authors design experiments to help select the number of input nodes, while others
adopt intuitive or empirical ideas. Mixed results are often reported in
the literature. The lack of systematic approaches to neural network model building
is probably the primary cause of inconsistencies in the reported findings.
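As an illustration of the moving-window construction described above, the following minimal Python sketch turns a univariate series into input patterns and targets. The function name make_lagged_patterns and the sample rates are assumptions made for this example only; they do not come from any of the studies reviewed.

```python
def make_lagged_patterns(series, n_lags):
    """Slide a fixed-length window along the series.

    Each pattern pairs the n_lags most recent observations (the network
    inputs) with the observation that immediately follows them (the target).
    """
    if n_lags < 1 or n_lags >= len(series):
        raise ValueError("n_lags must be between 1 and len(series) - 1")
    inputs, targets = [], []
    for t in range(n_lags, len(series)):
        inputs.append(list(series[t - n_lags:t]))
        targets.append(series[t])
    return inputs, targets


# Hypothetical exchange-rate observations, for illustration only.
rates = [1.10, 1.12, 1.11, 1.15, 1.14, 1.18]
X, y = make_lagged_patterns(rates, n_lags=3)
for window, target in zip(X, y):
    print(window, "->", target)
```

The number of lags fixes the number of input nodes, so the lag-selection question discussed in this section amounts to choosing n_lags.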
Ideally, we desire a small number of lag periods that can unveil the unique fea-
tures embedded in the data. The inclusion of excessive periods will adversely affect
the training time of the network, and the algorithm will likely be trapped in local
optimal solutions. On the other hand, if the lag is smaller than required, forecast-
ing accuracy will be jeopardized because the search is restricted to a subspace. Too
few or too many lag periods affect either the learning or prediction capability of
the network. It is desirable to reduce the number of input nodes to an absolute
minimum of essential nodes.
The Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC),
as well as several extensions, have been used as information-based in-sample model
selection criteria in selecting neural networks for foreign exchange rate time series
forecasting [34]. However, in-sample model selection criteria are not able to provide
a reliable guide to out-of-sample performance, and there is no apparent connection
between in-sample model fit and out-of-sample forecasting performance.
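For reference, one common least-squares form of these criteria can be sketched as follows. This is an illustrative assumption: the studies cited may use other variants, and the candidate models, weight counts and error sums below are hypothetical numbers, not reported results.

```python
import math


def aic(sse, n_obs, n_params):
    """AIC for a least-squares fit, assuming Gaussian errors."""
    return n_obs * math.log(sse / n_obs) + 2 * n_params


def bic(sse, n_obs, n_params):
    """BIC for a least-squares fit; penalizes parameters more heavily than AIC."""
    return n_obs * math.log(sse / n_obs) + n_params * math.log(n_obs)


# Hypothetical candidates: (number of lags, number of weights, in-sample SSE).
candidates = [(1, 5, 0.0520), (2, 8, 0.0500), (4, 14, 0.0490)]
n_obs = 200

best_by_aic = min(candidates, key=lambda c: aic(c[2], n_obs, c[1]))
best_by_bic = min(candidates, key=lambda c: bic(c[2], n_obs, c[1]))
print("AIC selects", best_by_aic[0], "lag(s); BIC selects", best_by_bic[0], "lag(s)")
```

Both criteria are computed from the in-sample error alone, which is why they cannot by themselves guarantee out-of-sample performance.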
Huang et al. [18] propose a general approach called the Autocorrelation Criterion (AC)
to determine lag structures in the application of ANNs to univariate time series
forecasting. They apply the approach to the determination of input variables
for foreign exchange rate forecasting and compare AC with
information-based in-sample model selection criteria. Experimental results show that
AC outperforms information-based in-sample model selection criteria in terms of
We suggest that practitioners employ the Autocorrelation Criterion in the case of
univariate input. It does not require any assumptions and is completely independent of
any particular class of model. The selection of input variables is data-driven, making full
use of the information among sample observations even if the underlying relationships
are unknown or hard to describe. Thus, it is well suited for time series problems
whose solutions require knowledge that is difficult to specify but for which there
are enough data or observations. It provides a practical way to solve
input selection for neural networks in time series forecasting.
Multivariate inputs are based on economics and finance theory. El Shazly et al. [8]
use fundamental inputs including the one-month Eurorate on the US dollar deposit, the
one-month Eurorate on the foreign currency deposit, the spot exchange rate, and
the one-month forward premium on the foreign currency. El Shazly et al. [9] use inputs
including the 90-day Euro deposit rate on the US dollar (INT US); the 90-day Euro
deposit rate on the British pound (INTBP), German mark (INTDM), Japanese
yen (INTJY), and Swiss franc (INTSF); the spot exchange rate of the foreign
currency (SBP, SDM, SJY, and SSF), expressed in direct form; the 90-day forward
exchange rate on the foreign currency (FDBP, FDDM, FDJY, FDSF); and the 90-day
future exchange rate on the foreign currency (FTBP, FTDM, FTJY, FTSF). This
selection of input variables comes from Interest Rate Parity, a principle by which
forward exchange rates reflect relative interest rates on default risk-free instruments
denominated in alternative currencies. Currencies of countries with high interest
rates are expected by the market to depreciate over time, and currencies of countries
with low interest rates are expected to appreciate over time.
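The parity relation can be made concrete with a small worked example. The sketch below uses the covered interest parity formula with simple interest; the 360-day basis, the function name forward_rate and all numerical values are assumptions chosen for illustration and are not taken from the studies cited.

```python
def forward_rate(spot, domestic_rate, foreign_rate, days, basis=360):
    """Forward rate implied by covered interest parity (simple interest).

    spot and the returned forward rate are both quoted as units of domestic
    currency per one unit of foreign currency. Rates are annualized decimals.
    """
    t = days / basis
    return spot * (1 + domestic_rate * t) / (1 + foreign_rate * t)


# Hypothetical quotes: domestic interest rate 6%, foreign interest rate 3%.
spot = 1.50
fwd = forward_rate(spot, domestic_rate=0.06, foreign_rate=0.03, days=90)
premium = (fwd - spot) / spot
print(round(fwd, 4), round(premium, 4))
```

Here the forward rate exceeds the spot rate, so the foreign currency trades at a forward premium and the higher-interest domestic currency is priced to depreciate, in line with the principle stated above.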
Leung et al. [25] use the MUIP relationship [39] as the theoretical basis for multivariate
specification. The MUIP relationship can be modeled and written as follows:
$$e_t = \alpha_0 + \alpha_1 (r_t^* - r_t) + \alpha_2 (\pi_t^* - \pi_t) + \alpha_3 (p_t^* - p_t) + \alpha_4 (ca_t / ny_t) + \alpha_5 (ca_t^* / ny_t^*) + \mu_t$$
where e is the natural logarithm of the exchange rate, defined as the foreign cur-
rency price of domestic currency. r,π,p and (ca/ny) represent the logarithm of the
nominal short-term interest rate, expected price inflation rate, the logarithm of
the price level, and the ratio of current account to nominal GDP for the domestic
economy respectively. Asterisks denote the corresponding foreign variables. µ is the
error term. The variables (ca/ny) and (ca∗/ny∗) are proxies for the risk premium.
Tenti [48] uses inputs including the compound returns of the last n periods (where
n = 1, 2, 3, 5, 8), the running standard deviation of the last k periods (where k =
13, 21, 34), and technical indicators such as the average directional movement index
(ADX), trend movement index (TMI), rate of change (ROC), and Ehlers leading
indicator (ELI). Lisi and Schiavo [26] use both the past observations of the series
itself and those of an “auxiliary” variable chosen among the remaining series. For
example, the lagged FRF/USD and GBP/USD are used to predict the future FRF/USD.
According to Walczak and Cerpa’s [51] suggestions, multivariate inputs can be
determined through the following steps. First, perform standard knowledge
acquisition: gather as many explanatory variables for foreign exchange rates as
possible from economics and finance theory. The primary purpose of the knowledge
acquisition phase is to guarantee that the input variable set is not under-specified,
providing all relevant domain criteria to the ANNs. Once a base set of input
variables is defined through knowledge acquisition, the set can be pruned to eliminate
variables that contribute noise to the ANNs and consequently reduce the ANNs’
generalization performance. Smith [43] claims that ANN input variables need to be
predictive, but should not be correlated. Correlated variables degrade ANN
performance by interacting with each other as well as other elements to produce a biased
effect. A first-pass filter to help identify noise variables is to calculate the
correlation of pairs of variables (Pearson correlation matrix). Alternatively, a chi-square
test may be used for categorical variables. If two variables have a high correlation,
then one of these two variables may be removed from the set of variables
without adversely affecting the ANNs’ performance. Additional statistical techniques
may be applied, depending on the distribution properties of the data set. Stepwise
regression (multiple or logistic) and factor analysis provide viable tools for
evaluating the predictive value of input variables and may serve as a secondary filter to
the Pearson correlation matrix.
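The first-pass correlation filter can be sketched as follows. This is a minimal illustration under stated assumptions: the 0.9 threshold, the greedy keep-the-first rule, the helper names and the variable values are all hypothetical and are not prescribed by Walczak and Cerpa [51] or Smith [43].

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def prune_correlated(variables, threshold=0.9):
    """Keep variables in order, dropping any whose absolute correlation
    with an already kept variable exceeds the threshold."""
    kept = []
    for name, values in variables.items():
        if all(abs(pearson(values, variables[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical candidate inputs; the second is an exact multiple of the first.
candidate_inputs = {
    "interest_diff": [0.5, 0.7, 0.6, 0.9, 1.1, 1.0],
    "interest_diff_rescaled": [1.0, 1.4, 1.2, 1.8, 2.2, 2.0],
    "trade_balance": [3.0, 1.0, 4.0, 1.5, 2.0, 3.5],
}
selected = prune_correlated(candidate_inputs)
print(selected)
```

Which member of a correlated pair is dropped depends on the ordering here; in practice that choice would be guided by domain knowledge or by the secondary filters mentioned above.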
Multivariate input has advantages in long-term forecasting, since it can reveal the
movement trend of foreign exchange rates. However, it requires more data and time, and
some explanatory variables are not available in time. Univariate input has
no such problems. Practitioners can benefit from a net reduction in
development costs, since less data is required. However, it lacks economic explanation,
which weakens the credibility of the forecasts.
3. Preparing Data
Due to the fact that only relatively little preliminary knowledge is required to
train artificial neural networks and on account of the black box character, data is
often presented to the networks without any further processing steps being taken.
However, the degree of care invested in preparing the data is of decisive importance
to the network’s learning speed and the quality of approximation it can attain. Every
hour invested in preparing the data may save days in training the networks.
The first questions to be considered here are of a very general nature:
(1) Is sufficient data available, and does this data contain the correct information?
(2) Does the available data cover the range of the variables concerned as completely
as possible?
(4) Does the data contain irrelevant information?
(5) Are there transformations or combinations of variables (e.g. ratios) that
describe the problem more effectively than the individual variables themselves?
Once all these points have been clarified, the data needs to be transformed into
an appropriate form for the networks. Various normalization methods are generally
employed to this end. Tenti [48] normalizes inputs to zero mean and two standard
deviations. In Hu and Tsoukalas’s [17] study, all inputs to ANNs are linearly normalized
to the interval [0, 1]. El Shazly et al. [9] suggest that data should be manipulated and
converted to the required format for further processing. In Lisi and Schiavo’s [26] study,
the log-differenced data are scaled linearly to the range 0.2–0.8 in order to adapt them to
the output range of the sigmoid activation function. Qi and Zhang [34] apply a natural
logarithm transformation to raw data to stabilize the series. An ADF test shows
that the transformed time series contains a unit root, thus the first order difference
There is no consensus on whether data normalization should be used. For example,
it is still unclear whether there is a need to normalize the inputs, because
the arc weights can undo the scaling. Shanker et al. [40] investigate the effectiveness of
linear and statistical normalization methods for classification problems. They find
that data normalization methods do not necessarily lead to better performance,
particularly when the networks and sample size are large. El Shazly et al. [8] apply
normalization and transformation in initial runs and then discard them. They find
that although the differenced data set shortens training time by reducing the noise
during training, the networks yield poor forecasts when tested. With the objective
of improving testing performance rather than speeding up training,
they decide to use raw data during training. Zhang and Hu [59] find no significant
difference between using normalized and original data, based on their experience
with the exchange rate data. Hence, raw data are used in that study.
Although normalization of the data is not compulsory, it is sometimes unavoidable.
If, for example, an activation function produces values only within a limited range,
e.g. the sigmoid function (0.0–1.0) or the tanh function (−1.0–1.0), the network will
be unable to generate any output values outside of this range. The target output data
for the training and test phases must therefore be normalized.
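A minimal sketch of linear scaling into a target range, together with the inverse mapping needed to convert network outputs back to exchange-rate units, is given below. The helper name fit_minmax and the sample targets are assumptions for this example; the 0.2–0.8 range mirrors the one reported for Lisi and Schiavo [26].

```python
def fit_minmax(values, low=0.0, high=1.0):
    """Return (forward, inverse) functions that map the observed range of
    values linearly onto [low, high] and back again."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        raise ValueError("cannot scale a constant series")
    scale = (high - low) / (v_max - v_min)

    def forward(v):
        return low + (v - v_min) * scale

    def inverse(s):
        return v_min + (s - low) / scale

    return forward, inverse


# Hypothetical training targets (exchange rates), for illustration only.
train_targets = [1.10, 1.25, 1.40]
to_net, from_net = fit_minmax(train_targets, low=0.2, high=0.8)

scaled = [to_net(v) for v in train_targets]
restored = [from_net(s) for s in scaled]
print([round(s, 3) for s in scaled])
```

Leaving a margin inside the sigmoid's (0, 1) output range, as the 0.2–0.8 choice does, keeps targets away from the saturated ends of the activation function.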
In principle, it is not absolutely necessary to normalize the input data, as
the network’s input layer is assigned a linear function. However, it is nevertheless
advisable to normalize data when using multivariate inputs. As a result
of normalization, all variables acquire the same significance for the learning process.
If normalization is not carried out, variables with greater values will be given
Generally, a data set is divided into two, the training set and the test set. The
training set is used for ANNs model development and the test set is used to evaluate
the forecasting ability. Sometimes a third set, called the validation set, is used to
avoid the overfitting problem or to determine the stopping point in the training
There is no general solution to splitting the data into training and test sets. The
Brainmaker software randomly selects 10% of the facts from the data set and uses them
for testing. Yao et al. [57] suggest that historical data be divided into three portions:
training, validation and testing sets. The training set contains 70% of the collected
data, while the validation and testing sets contain 20% and 10% respectively.
The division is based on a rule of thumb derived from the authors’ experience.
Other researchers simply state the division without giving the reasons for it.
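The 70/20/10 division can be sketched as follows. Keeping the observations in chronological order, so that the test set is the most recent data, is an assumption made here because it is the usual practice for time series; the source does not state how the portions are ordered. The function name and the placeholder series are likewise illustrative.

```python
def chronological_split(series, train_frac=0.7, valid_frac=0.2):
    """Split a series into training, validation and test portions
    without shuffling, so later observations are never used to fit
    a model that is evaluated on earlier ones."""
    n = len(series)
    n_train = round(n * train_frac)
    n_valid = round(n * valid_frac)
    train = series[:n_train]
    valid = series[n_train:n_train + n_valid]
    test = series[n_train + n_valid:]
    return train, valid, test


# Placeholder series of 100 observations, for illustration only.
observations = list(range(100))
train, valid, test = chronological_split(observations)
print(len(train), len(valid), len(test))
```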
Sample size is another factor that can affect the forecasting ability of artificial
neural networks. Neural network researchers have used training sets of various sizes,
ranging from one year to sixteen years [21, 36, 46, 48, 52, 58]. Large samples are often
claimed to be optimal in training neural networks due to the large set of parameters
involved in the network. To test whether there is a significant difference between large
and small training samples in modeling and forecasting exchange rates, Zhang and Hu [58]
use two training sample sizes. The large sample consists of 887 observations from 1976
to 1992, and the small one includes 261 data points from 1988 to 1992. Their result
is that the large sample outperforms the small one. Most researchers
typically use all of their data in building neural network forecasting models once they
have obtained their training data, with no attempt at comparing the effects of data
quantity on the quality of the resulting forecasting models.
However, Kang [22], in a comprehensive study of neural network time series forecasting,
finds that neural network forecasting models do not necessarily require
a large data set to perform well. Walczak [50] examines the effect of different sizes
of training sample sets on forecasting exchange rates. His results indicate
that for financial time series, two years of training data is frequently all that is
required to produce optimal forecasting accuracy. He claims that, given an appropriate
amount of historical knowledge, neural networks can forecast future exchange
rates with 60% accuracy, while neural networks trained on a larger training set have
worse forecasting performance. In addition to high-quality forecasts, the reduced
training set sizes reduce development cost and time.
Huang et al. [19] propose to determine the optimal quantity of training data by
using change-point detection. The behavior of exchange rates evolves over time.
Therefore, we can conjecture that the movement of exchange rates has a series
of change points, which divide the data into several internally homogeneous groups
whose characteristics differ from one another.
4. Architectures of ANNs
Three classes of ANNs architectures have been employed for forecasting foreign
exchange rates. In this section, we give a brief presentation and conduct some
In feedforward ANNs, the connections between units do not form cycles. Feedfor-
ward ANNs usually produce a response to an input quickly.
4.1.1. Multi-layer perceptrons (MLP)
MLP [38] is perhaps the most popular network architecture in use and is relatively
easy to implement. An MLP is typically composed of several layers of nodes
(see Fig. 1). The network thus has a simple interpretation as a form of input-output
model, with the weights and thresholds (biases) as the free parameters of the
model. Although it has been shown theoretically that the MLP has a universal
functional approximating capability and can approximate any nonlinear function
with arbitrary accuracy, no universal guideline exists for choosing the appropriate
model structure for practical applications. Thus, a trial-and-error approach or
cross-validation experiment is often adopted to help find the best model. Typically a large
number of neural network architectures are considered. The one with the best
performance on the validation set is chosen as the winner, and the others are discarded.

Fig. 1. Examples of multi-layer perceptron neural network architectures.
4.1.2. Radial basis function networks (RBFNs)
RBFNs [30] use a static Gaussian function as the nonlinearity for the hidden layer
processing elements. The Gaussian function responds only to a small region of the
input space where the Gaussian is centered. The key to successful implementation
of these networks is to find suitable centers for the Gaussian functions. This can
be done with supervised learning, but an unsupervised approach usually produces
The advantage of the radial basis function network is that it finds the input to
output map using local approximators. Usually the supervised segment is simply a
linear combination of the approximators. Since linear combiners have few weights,
these networks train extremely fast and require fewer training samples.
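The structure described above, Gaussian hidden units followed by a linear combination, can be sketched as a forward pass. The centers, width and output weights below are hypothetical values chosen for illustration; in practice the centers would be found by a clustering or supervised procedure and the output weights by a linear fit, neither of which is shown here.

```python
import math


def gaussian_unit(x, center, width):
    """Response of one hidden unit: close to 1 near its center, near 0 far away."""
    dist_sq = sum((a - c) ** 2 for a, c in zip(x, center))
    return math.exp(-dist_sq / (2.0 * width ** 2))


def rbf_output(x, centers, width, weights, bias=0.0):
    """Linear combination of the local Gaussian approximators."""
    hidden = [gaussian_unit(x, c, width) for c in centers]
    return bias + sum(w * h for w, h in zip(weights, hidden))


# Hypothetical network with two hidden units on a two-dimensional input.
centers = [[0.0, 0.0], [1.0, 1.0]]
weights = [1.0, -1.0]
width = 0.5

near_first = rbf_output([0.0, 0.0], centers, width, weights)
far_away = rbf_output([5.0, 5.0], centers, width, weights)
print(round(near_first, 4), round(far_away, 6))
```

An input far from every center produces almost no hidden activity, which illustrates the local nature of the approximation.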
4.1.3. Learning vector quantization (LVQ)
LVQ [12, 23] is a precursor of the well-known self-organizing maps (also called Kohonen
feature maps), and like them it can be seen as a special kind of artificial neural
network. A neural network for learning vector quantization consists of two layers:
an input layer and an output layer. It represents a set of reference vectors, the
coordinates of which are the weights of the connections leading from the input
neurons to an output neuron. Hence, one may also say that each output neuron
corresponds to one reference vector. This kind of ANN architecture can only be
used for classification. Hence, we cannot employ it in forecasting foreign exchange
rates.
4.1.4. General regression neural networks (GRNNs)
GRNNs [45] are memory-based feed-forward networks based on the estimation of
probability density functions. GRNNs feature fast training times, can model
nonlinear functions, and have been shown to perform well in noisy environments given
enough data. The GRNN topology consists of four layers: the input layer, pattern
layer, summation layer and output layer. Each layer of processing units is assigned
a specific computational function when nonlinear regression is performed. The only
adjustable parameter in a GRNN is the smoothing factor for the kernel function.
The optimization of the smoothing factor is critical to the GRNN’s performance and
is usually found through iterative adjustments and the cross-validation procedure.
The advantages of GRNN include
(1) Fast training times.
(2) Can handle both linear and non-linear data.
(3) Adding new samples to the training set does not require re-calibrating the
(4) Only one adjustable parameter thereby making overtraining less likely.
The disadvantages include:
(1) Has trouble with irrelevant inputs (i.e. suffers from the dimensionality curse).
(2) No intuitive method for selecting the optimal smoothing parameter.
(3) Requires many training samples to adequately span the variation in the data.
(4) Requires that all the training samples be stored for future use (i.e. prediction).
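The GRNN computation described above can be sketched in a few lines. The following is a minimal illustration, assuming a Gaussian kernel and leave-one-out cross-validation for the smoothing factor; the function names (`grnn_predict`, `loo_mse`, `select_sigma`) and the candidate grid are hypothetical and are not taken from the cited works.

```python
import math


def grnn_predict(train_x, train_y, query, sigma):
    """Predict one value as a kernel-weighted average of the stored targets."""
    weights = []
    for x in train_x:
        # Pattern layer: squared distance between the query and a stored sample.
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, query))
        weights.append(math.exp(-sq_dist / (2.0 * sigma ** 2)))
    # Summation layer: weighted sum of targets and plain sum of weights.
    numerator = sum(w * y for w, y in zip(weights, train_y))
    denominator = sum(weights)
    if denominator == 0.0:
        # All kernels underflowed; fall back to the mean target (an assumption).
        return sum(train_y) / len(train_y)
    # Output layer: ratio of the two sums.
    return numerator / denominator


def loo_mse(train_x, train_y, sigma):
    """Leave-one-out mean squared error for a given smoothing factor."""
    errors = []
    for i in range(len(train_x)):
        rest_x = train_x[:i] + train_x[i + 1:]
        rest_y = train_y[:i] + train_y[i + 1:]
        pred = grnn_predict(rest_x, rest_y, train_x[i], sigma)
        errors.append((pred - train_y[i]) ** 2)
    return sum(errors) / len(errors)


def select_sigma(train_x, train_y, candidates):
    """Pick the candidate smoothing factor with the lowest leave-one-out error."""
    return min(candidates, key=lambda s: loo_mse(train_x, train_y, s))
```

Note that "training" amounts to storing the samples, which is why new samples can be added without re-calibration and why all samples must be kept for prediction.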
In feedback ANNs, there are cycles in the connections. Each time an input is
presented, the network must iterate for a potentially long time before it produces
a response. Feedback ANNs are usually more difficult to train than feedforward ANNs.
Forecasting Foreign Exchange Rates with ANN 155
4.2.1. Recurrent neural networks (RNNs)
Recurrent neural networks (RNNs), in which the input layer’s activity patterns
pass through the network more than once before generating a new output pattern,
can learn extremely complex temporal patterns. Recurrent architectures have been
shown to be superior to the windowing technique of overlapping snapshots of data,
which is used with standard back-propagation. By introducing time-lagged
model components, RNNs may respond to the same input pattern in a different
way at different times, depending on the preceding input sequence. The main
disadvantage of RNNs is that they require substantially more connections, and more
memory in simulation, than standard back-propagation networks. RNNs can yield good
results because of the rough repetition of similar patterns present in exchange
rate time series. These regular but subtle sequences can provide useful
forecasting ability.
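The dependence of the response on the input history can be illustrated with a single recurrent unit. This is only a sketch under simple assumptions (one scalar hidden state, a tanh activation, and arbitrary illustrative weights `w_in` and `w_rec`); it is not the architecture of any study reviewed here.

```python
import math


def run_recurrent_unit(inputs, w_in=0.8, w_rec=0.5):
    """Return the hidden state after each input of a one-unit recurrent network."""
    hidden = 0.0
    states = []
    for x in inputs:
        # The new state mixes the current input with the time-lagged state.
        hidden = math.tanh(w_in * x + w_rec * hidden)
        states.append(hidden)
    return states


# Both sequences end with the same input (1.0) but arrive there differently.
rising = run_recurrent_unit([0.1, 0.5, 1.0])
falling = run_recurrent_unit([2.0, 1.5, 1.0])
print(rising[-1], falling[-1])  # the two final responses differ
```

A feedforward network fed only the last value would return the same output in both cases, which is the limitation the recurrent connection removes.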
4.3.1. Fuzzy ARTMAP network
A fuzzy ARTMAP network [3] is a fuzzy ART [2] network that adds a single output
layer to generate an error signal to the fuzzy ART network that is made up of the
input, complement and category layers. The addition of the output layer for the
error signal transforms the network from an unsupervised network to a supervised
network where the network learns from examples in which the real category is
Modular ANNs [20] essentially make use of multiple individual back-propagation
networks (BPNs) that compete to learn different aspects of the problem. The
networks use an expert gating mechanism to choose which of the BPNs (called local
experts) does best on a particular input observation, essentially assigning
different regions of the data space to different local experts. The general idea
is that the error at each local expert is weighted by the posterior probability
(obtained as training takes place) that it was responsible for the current output
vector. The gating network learns by trying to match its prior probabilities to
the posterior probabilities found for each local expert.
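The weighting of expert errors by posterior probabilities can be sketched as follows. The example assumes scalar outputs and a unit-variance Gaussian likelihood for each expert, which is one common formulation and not necessarily the one used in the cited work; the function names are hypothetical.

```python
import math


def expert_posteriors(gate_priors, expert_outputs, target):
    """Posterior probability that each expert produced the observed target."""
    # Likelihood of the target under each expert (unit-variance Gaussian, assumed).
    likelihoods = [math.exp(-0.5 * (target - o) ** 2) for o in expert_outputs]
    joint = [g * l for g, l in zip(gate_priors, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]


def weighted_expert_errors(gate_priors, expert_outputs, target):
    """Error signal of each expert, scaled by its posterior responsibility."""
    posteriors = expert_posteriors(gate_priors, expert_outputs, target)
    return [h * (target - o) for h, o in zip(posteriors, expert_outputs)]


# Two experts with equal priors; the first one predicts the target exactly.
posteriors = expert_posteriors([0.5, 0.5], [1.0, 3.0], 1.0)
```

In this setting the gating network would be trained to move its priors toward the returned posteriors, so the expert that fits a region of the data well gradually takes it over.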
MLP is used most frequently for exchange rate prediction, because it has an
inherent capability of arbitrary input-output mapping. However, other types of
ANNs are also used.
Tenti [48] performs tests with three variations of RNNs. The first architecture
(RNN1) has one hidden and one recurrent layer. The output layer is fed back into
the hidden layer by means of the recurrent layer, which holds the output produced
for the previous pattern. In the second version (RNN2), similar to that of
Frasconi et al. [13], the hidden layer is fed back into itself through an extra
layer of recurrent nodes. In the third version (RNN3), patterns are processed from
the input layer through a recurrent layer of nodes, which holds the input layer’s
contents as they existed when previous patterns were trained, and are then fed
back into the input layer.
Leung et al. [25] examine the forecasting ability of GRNNs and compare their
performance with a variety of forecasting techniques, including the multi-layered feed-forward
Davis et al. [6] present a variety of neural network forecasting models applied to
Canadian–US exchange rate data. Networks such as back-propagation, modular,
radial basis function, learning vector quantization, fuzzy ARTMAP and genetic
reinforcement learning are examined. It is important to note that they predict
direction shifts in the Canadian–US exchange rate rather than the absolute price.
Different types of classification networks have characteristics that may prove
effective for specific classification data.
The selection of an ANN architecture is an open problem. ANN designers must
base the choice on the constraints of the training data set and the development
cost. We suggest that practitioners employ the MLP, which is relatively easy and costs less to
5. The Integration of ANNs with Other Methods
The desire to further enhance the performance of neural network prediction has
led to the development of hybrid systems that combine neural networks with other
methods. The integration of ANNs with other technologies, such as wavelet analysis,
genetic algorithms, or fuzzy logic, can improve the applications of ANNs. Although
each technology has its own strengths and weaknesses, these technologies are
complementary. Weaknesses of one technology can be overcome by strengths of
another, achieving a synergistic effect. Such an effect can create results that
are more efficient, productive, and effective than the sum of their parts.
Genetic algorithms (GAs) are a class of probabilistic search techniques based on
biological evolution. Each point in the solution space is coded as a binary string
called a chromosome. For instance, the coordinate (10,5,3) is encoded as
1 0 1 0
0 1 0 1
0 0 1 1
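The encoding above can be reproduced with a short sketch. It assumes four bits per coordinate, as in the example, and concatenates the three rows into one chromosome string; the helper names `encode` and `decode` are illustrative.

```python
def encode(point, bits=4):
    """Encode a tuple of non-negative integers as one binary chromosome string."""
    for value in point:
        if value < 0 or value >= 2 ** bits:
            raise ValueError("value does not fit in the chosen number of bits")
    return "".join(format(value, "0{}b".format(bits)) for value in point)


def decode(chromosome, bits=4):
    """Recover the tuple of integers from a chromosome string."""
    return tuple(
        int(chromosome[i:i + bits], 2) for i in range(0, len(chromosome), bits)
    )


chromosome = encode((10, 5, 3))
print(chromosome)          # 101001010011
print(decode(chromosome))  # (10, 5, 3)
```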
When a new generation exists, each member is ranked according to its fitness.
From this, a new population must be created. Essentially this is a “survival of the
fittest solution”, and the members used for mating are chosen with a probability
proportional to their fitness.
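Fitness-proportional selection (often called roulette-wheel selection) can be sketched with the standard library. The fitness function used here, the number of 1-bits in the chromosome, is a toy assumption chosen only to make the example concrete.

```python
import random


def select_parents(population, fitness, n_parents, rng=random):
    """Draw parents with probability proportional to their fitness.

    Fitness values must be non-negative and not all zero.
    """
    scores = [fitness(member) for member in population]
    return rng.choices(population, weights=scores, k=n_parents)


def count_ones(chromosome):
    """Toy fitness (an assumption for illustration): number of 1-bits."""
    return chromosome.count("1")


population = ["0000", "0011", "0111", "1111"]
parents = select_parents(population, count_ones, n_parents=4)
```

Members with zero fitness are never drawn, while fitter members may be drawn several times, which is what concentrates good traits in the mating pool.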
A technique called crossover is employed to maximize retention of the good
points of the previous generation. This is analogous to biological mating in which a
child may be superior to both parents if it inherits good genes from both parents. In
the computing process, this is achieved by swapping corresponding bits in pairs of
chromosomes according to a given crossover rate; for instance, the last three bits of
one chromosome may be swapped with the last three bits of another chromosome.
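The tail swap described above can be written as follows. This is a minimal sketch in which chromosomes are strings of equal length; the function names and the default crossover rate of 0.7 are illustrative assumptions.

```python
import random


def crossover_tail(parent_a, parent_b, n_bits=3):
    """Swap the last n_bits of two equal-length chromosomes."""
    cut = len(parent_a) - n_bits
    child_a = parent_a[:cut] + parent_b[cut:]
    child_b = parent_b[:cut] + parent_a[cut:]
    return child_a, child_b


def maybe_crossover(parent_a, parent_b, rate=0.7, n_bits=3, rng=random):
    """Apply the tail swap with probability `rate`, else return the parents."""
    if rng.random() < rate:
        return crossover_tail(parent_a, parent_b, n_bits)
    return parent_a, parent_b


print(crossover_tail("101001010011", "000000000000"))
# ('101001010000', '000000000011')
```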
If the population does not contain all of the traits needed to solve a problem, no
amount of crossover will work. To counter this, a single bit is flipped at random,
very infrequently. This is called mutation, and it addresses one of the problems of
neural networks: that we arrive at local minima. Mutation provides a way out by
preventing a bit from converging on a single value throughout the entire
population. Mutation must be kept to a minimum to prevent loss of good chromosomes.
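Bit-flip mutation can be sketched in the same style. The default rate of 0.01 is an illustrative assumption; the text only requires that the rate be small.

```python
import random


def mutate(chromosome, rate=0.01, rng=random):
    """Flip each bit independently with a small probability `rate`."""
    result = []
    for bit in chromosome:
        if rng.random() < rate:
            result.append("1" if bit == "0" else "0")
        else:
            result.append(bit)
    return "".join(result)
```

With a rate of zero the chromosome is returned unchanged, and with a rate of one every bit is inverted; practical values lie close to zero so that good chromosomes are rarely destroyed.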
The inclusion of GA search techniques was undertaken for two reasons. The first
relates to the potential that GAs offer in terms of adaptiveness. The flexibility,
robustness and simplicity that GAs offer render them very attractive in that
respect. The second reason stems from the difficulty in optimizing neural network
applications. By operating on entire populations of candidate solutions in
parallel, a GA is much less likely to get stuck at a local optimum.
Wavelet analysis is used to process information effectively at different scales.
It is very useful for feature detection in complex and chaotic time series. In
particular, the specific local properties of wavelets can be useful in describing
signals with discontinuous or fractal structures in the financial market. It also
allows the removal of noise-dependent high frequencies while conserving the
signal-bearing high-frequency terms. However, one of the most critical issues in
the application of wavelet analysis is choosing the correct wavelet thresholding
parameters.
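The role of the threshold can be seen in a minimal denoising sketch. It assumes a single-level Haar transform with hard thresholding, chosen here for brevity; the studies reviewed below use their own wavelets and threshold selection schemes, and the function names are hypothetical.

```python
import math


def haar_forward(signal):
    """One level of the orthonormal Haar transform (even-length input)."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert one level of the Haar transform."""
    s = math.sqrt(2.0)
    signal = []
    for a, d in zip(approx, detail):
        signal.append((a + d) / s)
        signal.append((a - d) / s)
    return signal


def denoise(signal, threshold):
    """Zero the detail coefficients whose magnitude is at most `threshold`."""
    approx, detail = haar_forward(signal)
    kept = [d if abs(d) > threshold else 0.0 for d in detail]
    return haar_inverse(approx, kept)


# The small wiggle (1.0, 1.1) is smoothed; the large jump (5.0, 1.0) is kept.
smoothed = denoise([1.0, 1.1, 5.0, 1.0], threshold=0.5)
```

A threshold that is too low leaves the noise in place, while one that is too high also removes the large, signal-bearing detail coefficient, which is why the choice of thresholding parameters is critical.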
El Shazly et al. [9] design a hybrid system combining neural networks with genetic
training to forecast the three-month spot exchange rate. Once the network is
trained, tested and identified as being “good”, a GA is applied to it in order to
optimize its performance. The process of genetic evolution works on the neuron
connections of a trained network by applying two procedures: mutation and
crossover. The application of hybrid systems seems to be well suited for the forecasting of
Shin et al. [42] propose an integrated thresholding design of the optimal or
near-optimal wavelet transformation (WT) by GA to represent a significant signal
most suitable for ANN models. The model is applied to forecast the Korean won/USD
returns one day ahead. In this study, the multi-scale signal representation of
ANNs is supported by a wavelet transform as the multi-signal decomposition
technique to detect the features of significant patterns. A strategy is devised
using WT to construct a filter that is significantly matched to the frequency of
the time series within the combined model. The experimental results show the
enhanced filtering or signal multi-resolution power of wavelet analysis by GA in
the performance of the ANNs. This study also finds that the hybrid system of
wavelet transformations and ANNs by GA is much better than other ANNs that use
three other wavelet thresholding algorithms (cross-validation, best level, and best basis) to increase
6. Performance Comparison with Other Forecasting Methods
There are inconsistent reports on the performance of ANNs for forecasting exchange
rates when compared with other forecasting methods. Table 1 summarizes the
literature on the relative performance of ANNs.
Weigend et al. [53] find that neural networks are better than random walk
models (RW) in predicting the DEM/USD exchange rate. Wei and Jiang [54] claim
that the forecasting performance of ANNs is better than that of AR(p), ARMA(p,q)
and ARIMA(p,d,q) models. Lisi and Schiavo [26] compare ANNs and chaotic models
in exchange rate prediction. ANNs perform slightly better than chaotic
models in terms of NMSE; nevertheless, the two models are statistically
equivalent. Yao and Tan [57] show that, whether judged by NMSE, gradient or
profit, ANNs are much better than the traditional ARIMA model when forecasting the
exchange rates between USD and five other major currencies: AUD, CHF, DEM, GBP and
JPY. Leung et al. [25] point out that GRNNs generally outperform parametric
multivariate transfer functions and the random walk models.
Episcopos and Davis [10] suggest that neural networks are similar to EGARCH,
but superior to random walk models in terms of in-sample fit and out-of-sample
prediction performance. Hann and Steurer [15] compare neural network models with
linear monetary models in forecasting USD/DEM. Out-of-sample results show that,
for weekly data, neural networks are much better than linear models and the naïve
predictions of a random walk model with regard to Theil’s U measure, the hit rate,
the annualized returns and the Sharpe ratio. However, if monthly data are used,
neural networks do not show much improvement over linear models. Monthly data
usually contain more irregularities (seasonality, cyclicity, nonlinearity, noise).
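Two of the measures mentioned above can be sketched as follows. Definitions of Theil's U vary across the literature; the version below, the ratio of the model's root squared error to that of a no-change (random walk) forecast, is an assumption and may differ from the one used in the cited study. The argument `previous` holds the last observed rate before each forecast, and the sample figures are made up for illustration.

```python
import math


def hit_rate(previous, actual, forecast):
    """Share of periods in which the forecast change has the realized sign."""
    hits = 0
    for p, a, f in zip(previous, actual, forecast):
        if (a - p) * (f - p) > 0:
            hits += 1
    return hits / len(actual)


def theil_u(previous, actual, forecast):
    """Model error relative to a no-change forecast; below 1 beats random walk."""
    model_sse = sum((a - f) ** 2 for a, f in zip(actual, forecast))
    naive_sse = sum((a - p) ** 2 for a, p in zip(actual, previous))
    return math.sqrt(model_sse / naive_sse)


# Illustrative (made-up) rates: last observation, realized value, forecast.
previous = [1.0, 1.1, 1.2]
actual = [1.1, 1.2, 1.1]
forecast = [1.05, 1.25, 1.25]
print(hit_rate(previous, actual, forecast))  # two of three directions correct
```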
Zhang and Hutchinson [62] find mixed results for neural networks in comparison
with random walk models using different sections of the data set.
Kuan and Liu [24] examine the out-of-sample forecasting ability of neural networks
on five exchange rates against the USD: GBP, CAD, DEM, JPY and CHF.
For the GBP and JPY, they demonstrate that neural networks have significant
market timing ability and/or achieve significantly lower out-of-sample RMSE than
the random walk model across three testing periods. For the other three exchange
rates, neural networks are not shown to be superior in forecasting performance.
Their results also show that different network models perform quite differently
in out-of-sample forecasting. Hu et al. [17] compare the performance of
ANNs with that of various forecasting methods. Using different performance
measurements and different data stages, they get different results. ANNs are not
always better than other forecasting tools. Zhang and Hu [58] find that neural
networks predict much better than the random walk model when using large training
samples. With small training samples, ANNs fail to outperform the random walk at
longer forecast horizons. They suggest possible structural changes in exchange
rate data. Therefore, as more observations become available, they should be used
to revise the neural network forecasting models to better reflect changes in the
underlying pattern.
Table 1. The relative performance of ANNs with traditional forecasting methods.
ANNs Type Traditional Forecasting Method
USD, DEM, FRF, JPY, GBP against CAD
Similar to EGARCH; Better than RW
Linear model, RW
Theil’s U measure, Hit rate, the annualized returns and the Sharpe ratio
Better in weekly data; Similar in monthly data
BEF/LUF, GBP, DKK, NLG, FRF, GRD, IEP, ITL, PTE, ESP, USD
Kuan and Liu [24]
GBP, CAD, DEM, JPY and CHF against USD
Leung et al. [25]
GBP, JPY, CAD against
Lisi and Schiavo [26]
FRF, DEM, ITL, GBP
Chaotic model, RW
Wei and Jiang [54]
AR, ARMA, ARIMA
Weigend et al. [53]
Yao and Tan [57]
AUD, CHF, DEM, GBP, JPY against USD
NMSE, Correctness of
Zhang and Hu [58]
RMSE, MAE, MAPE
MAE: mean absolute error.
RMSE: root mean square error.
NMSE: normalized mean square error.
MAPE: mean absolute percentage error.
ARV: average relative variance.
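The error measures abbreviated above can be computed as in the following sketch. MAE, RMSE and MAPE follow their usual definitions; NMSE is taken here as the squared error normalized by the variation of the actual series around its mean, which is one common convention and an assumption with respect to the individual studies.

```python
import math


def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean square error."""
    mse = sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)
    return math.sqrt(mse)


def mape(actual, forecast):
    """Mean absolute percentage error (actual values must be non-zero)."""
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


def nmse(actual, forecast):
    """Squared error normalized by the variation of the actual series."""
    mean = sum(actual) / len(actual)
    sse = sum((a - f) ** 2 for a, f in zip(actual, forecast))
    variation = sum((a - mean) ** 2 for a in actual)
    return sse / variation
```

Under this convention an NMSE of 1 corresponds to always forecasting the sample mean, so values below 1 indicate an improvement over that baseline.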