# Forecasting foreign exchange rates with artificial neural networks: A review

**ABSTRACT** Forecasting exchange rates is an important financial problem that is receiving increasing attention, especially because of its difficulty and practical applications. Artificial neural networks (ANNs) have been widely used as a promising alternative approach to forecasting because of several distinguishing features. Research efforts on ANNs for forecasting exchange rates are considerable. In this paper, we attempt to provide a survey of research in this area. Several design factors significantly affect the accuracy of neural network forecasts, including the selection of input variables, data preparation, and network architecture. There is no consensus about these factors; different decisions prove effective in different cases. We also describe the integration of ANNs with other methods and report comparisons between the performance of ANNs and that of other forecasting methods, which yield mixed results. Finally, future research directions in this area are discussed.


International Journal of Information Technology & Decision Making

Vol. 3, No. 1 (2004) 145–165

© World Scientific Publishing Company

FORECASTING FOREIGN EXCHANGE RATES WITH ARTIFICIAL NEURAL NETWORKS: A REVIEW

WEI HUANG

Institute of Systems Science, Academy of Mathematics and Systems Sciences

Chinese Academy of Sciences, Beijing 100080, People’s Republic of China

School of Knowledge Science, Japan Advanced Institute of Science and Technology

1-1, Asahidai, Ishikawa 923-1292, Japan

K. K. LAI

Department of Management Sciences, City University of Hong Kong

Tat Chee Avenue, Kowloon, Hong Kong

Y. NAKAMORI

School of Knowledge Science, Japan Advanced Institute of Science and Technology

1-1, Asahidai, Ishikawa 923-1292, Japan

SHOUYANG WANG∗

Institute of Systems Science, Academy of Mathematics and Systems Sciences

Chinese Academy of Sciences, Beijing 100080, People’s Republic of China

Tel: 86-10-62651381

Fax: 86-10-62568364

sywang@amss.ac.cn


Keywords: Artificial neural networks; exchange rate; forecasting.

∗Corresponding author. This author is also with the School of Business, Hunan University.


1. Introduction

The foreign exchange market is the largest and most liquid of the financial markets, with an estimated $1 trillion traded every day. Exchange rates are amongst the most important economic indices in the international monetary markets. For large multinational firms, which conduct substantial currency transfers in the course of business, being able to accurately forecast exchange rate movements can result in substantial improvement in the firm's overall profitability.

Exchange rates are affected by many highly correlated economic, political and even psychological factors. These factors interact in a very complex fashion. Exchange rate series exhibit high volatility, complexity and noise that result from an elusive market mechanism generating daily observations [49]. Evidence has clearly shown that while there is little linear dependence, the null hypothesis of independence can be strongly rejected, demonstrating the existence of nonlinearities in exchange rates [11].

Much research effort has been devoted to exploring the nonlinearity of exchange rate data and to developing specific nonlinear models to improve exchange rate forecasting. Parametric nonlinear models such as the autoregressive random variance (ARV) model [44], autoregressive conditional heteroskedasticity (ARCH) [16], generalized autoregressive conditional heteroskedasticity (GARCH) [1], chaotic dynamics [31] and self-exciting threshold autoregressive [4] models have been proposed and applied to foreign exchange rate forecasting. While these models may work well in a particular situation, they perform poorly in other applications. Pre-specifying the model form restricts the usefulness of these parametric nonlinear models, since many other nonlinear patterns are possible; no single nonlinear specification is general enough to capture all the nonlinearities in the data. Some nonparametric methods have also been proposed to forecast exchange rates [7, 28, 29]. However, the nonparametric methods investigated in these studies are still unable to improve upon a simple random walk model in out-of-sample predictions of exchange rates.
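The random walk benchmark is simple to state: the forecast for the next period is the current value. The sketch below is a minimal Python illustration, not code from any reviewed study. It assumes root mean squared error (RMSE) as the error measure, since the text does not fix one, and compares a model against the benchmark through the ratio of their RMSEs. The function names are hypothetical.

```python
import math


def random_walk_forecasts(series):
    """One-step-ahead random walk forecasts: the forecast for t+1 is the value at t."""
    return list(series[:-1])


def rmse(actual, predicted):
    """Root mean squared error between two equal-length, non-empty sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def rmse_ratio(series, model_forecasts):
    """RMSE of a model divided by RMSE of the random walk on the same targets.

    model_forecasts[i] is assumed to be the forecast of series[i + 1].
    A ratio below 1 means the model beats the random walk.
    """
    targets = series[1:]
    return rmse(targets, model_forecasts) / rmse(targets, random_walk_forecasts(series))


rates = [1.00, 1.02, 1.01, 1.03, 1.04]
ratio = rmse_ratio(rates, [1.01, 1.02, 1.02, 1.03])
```

In these terms, a method that is "unable to improve upon a simple random walk" would show a ratio of 1 or more on out-of-sample data.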

There has been growing interest in adopting state-of-the-art artificial intelligence technologies to solve this problem. One stream of these advanced techniques uses artificial neural networks (ANNs) to analyze historical data and predict future movements in the foreign exchange market. An ANN is a system loosely modeled on the human brain that detects the underlying functional relationships within a set of data and performs tasks such as pattern recognition, classification, evaluation, modeling, prediction and control. ANNs are particularly well suited to finding accurate solutions in an environment characterized by complex, noisy, irrelevant or partial information. Several distinguishing features of ANNs make them valuable and attractive in forecasting. First, as opposed to traditional model-based methods, ANNs are data-driven self-adaptive methods: they make few a priori assumptions about the models for the problems under study. Second, ANNs can generalize. Third, ANNs are universal function approximators. Finally, ANNs are nonlinear [59].

The idea of using ANNs for forecasting exchange rates is not new. Weigend et al. [53] find that neural networks are better than random walk models in predicting the DEM/USD exchange rate. Refenes et al. [37] apply a multi-layer perceptron network to predict the USD/DEM exchange rate and study the convergence issues related to network architecture. Refenes [36] develops a constructive learning algorithm to find the best neural network configuration for forecasting DEM/USD. Poddig [33] studies the problem of predicting the trend of USD/DEM and compares the results with those obtained through regression analysis. Pi [32] proposes a test for dependence among exchange rates. Shin [41] applies an ANN model and moving average trading rules to investigate the return predictability of exchange rates. Zhang and Hutchinson [62] report their experience of forecasting tick-by-tick CHF/USD. Kuan and Liu [24] use both feedforward and recurrent neural networks to forecast GBP, CAD, DEM, JPY and CHF against USD. Wu [55] compares neural networks with ARIMA models in forecasting the Taiwan dollar/USD exchange rate. Hann and Steurer [15] make comparisons between neural network and linear models in USD/DEM forecasting. Episcopos and Davis [10] investigate the problem of predicting daily returns based on five Canadian exchange rates using ANNs and a heteroskedastic model, EGARCH. Tenti [48] proposes the use of recurrent neural networks to forecast exchange rates. Other early examples of ANN applications to exchange rates include Zhang [61] and Yao et al. [56].

Considerable research effort has gone into ANNs for forecasting exchange rates. In this paper, we attempt to provide a survey of research in this area. Forecasting exchange rates using ANNs is a process that can be divided into several steps, and our goal is to identify the points of consensus and disagreement at each step. Hence, we compare the methods used by different researchers along the whole forecasting process. Where there is consensus, we summarize guidelines; where there is disagreement, we analyze the reasons and point out the advantages and disadvantages of the various methods.

The paper is organized as follows. Section 2 covers input selection. Section 3 deals with preparing data. In Sec. 4, we give a brief presentation of ANN architectures. Section 5 describes the integration of ANNs with other methods. The comparison between the performance of ANNs and that of other forecasting methods is reported in Sec. 6. Finally, conclusions and directions for future research are discussed in Sec. 7.

2. Input Selection

There are two kinds of inputs: fundamental inputs and technical inputs. Fundamental inputs include the consumer price index, foreign reserves, GDP, export and import volume, interest rates, etc. Technical inputs include delayed (lagged) time series data, moving averages, the relative strength index, etc. Besides these two kinds of inputs, individual forecast results can be used as inputs when ANNs serve as combined forecasting tools. To provide improved volatility forecasts, Hu and Tsoukalas [17] combine GARCH, EGARCH, IGARCH and MAV volatility forecasts through ANNs. A preliminary step toward maximizing output performance is to ensure that the input variables adequately represent domain knowledge [47, 51].
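To illustrate how an ANN can act as a forecast combiner, the sketch below feeds individual model forecasts (for example, volatility forecasts from several GARCH-type models) into the simplest possible network: a single linear neuron trained with the delta (Widrow-Hoff) rule. This is an assumption made for brevity, not the architecture used by Hu and Tsoukalas [17]. The learning rate and epoch count are illustrative.

```python
def train_combiner(forecast_rows, targets, lr=0.1, epochs=2000):
    """Train a single linear neuron y = b + sum_i w_i * f_i with the delta rule.

    forecast_rows: one list per period holding the individual model forecasts;
    targets: the realized values for those periods.
    """
    weights = [0.0] * len(forecast_rows[0])
    bias = 0.0
    for _ in range(epochs):
        for row, target in zip(forecast_rows, targets):
            output = bias + sum(w * f for w, f in zip(weights, row))
            error = target - output
            # Delta rule: step each weight along the negative gradient of the squared error.
            weights = [w + lr * error * f for w, f in zip(weights, row)]
            bias += lr * error
    return weights, bias


def combine(weights, bias, row):
    """Combined forecast for one period from its individual forecasts."""
    return bias + sum(w * f for w, f in zip(weights, row))
```

A multi-layer network would replace the single neuron when the best combination is expected to be nonlinear in the individual forecasts.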

While Walczak et al. [52] claim that multivariate inputs are necessary, most neural network inputs for exchange rate prediction are univariate. Univariate inputs use data directly from the time series being forecast, while multivariate inputs use information from outside the time series in addition to the series itself. Univariate inputs rely on the predictive capabilities of the time series itself, corresponding to technical analysis as opposed to fundamental analysis. For a univariate time series forecasting problem, the network inputs are the past, lagged observations of the data series and the output is the future value. Each input pattern is composed of a moving window of fixed length along the series. In this sense, a feedforward network used for time series forecasting is a general (nonlinear) autoregressive model. The question is how many lag periods should be included in predicting the future. Some authors designed experiments to help select the number of input nodes, while others adopted intuitive or empirical rules. Mixed results are often reported in the literature; the lack of systematic approaches to neural network model building is probably the primary cause of the inconsistencies in the reported findings.
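The moving-window construction described above can be sketched in a few lines. The function name and the `horizon` parameter are illustrative; `n_lags` is the quantity whose choice the rest of this section discusses.

```python
def make_lagged_patterns(series, n_lags, horizon=1):
    """Turn a univariate series into (inputs, target) pairs with a moving window.

    Each input pattern holds n_lags consecutive past observations, and the
    target is the value `horizon` steps after the last one, so a feedforward
    net trained on these pairs acts as a nonlinear autoregressive model.
    """
    if n_lags < 1 or horizon < 1:
        raise ValueError("n_lags and horizon must be positive")
    inputs, targets = [], []
    for end in range(n_lags, len(series) - horizon + 1):
        inputs.append(series[end - n_lags:end])
        targets.append(series[end + horizon - 1])
    return inputs, targets


X, y = make_lagged_patterns([1.10, 1.12, 1.11, 1.13, 1.15], n_lags=3)
```

Here `X` holds two windows of three lags each, and `y` holds the value that follows each window.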

Ideally, we want a small number of lag periods that can unveil the unique features embedded in the data. Including too many periods increases the training time of the network and makes it more likely that the training algorithm is trapped in local optima. On the other hand, if the lag is smaller than required, forecasting accuracy will be jeopardized because the search is restricted to a subspace. Too few or too many lag periods thus impair either the learning or the prediction capability of the network, so it is desirable to reduce the input nodes to the minimum set of essential ones.

The Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC), as well as several extensions, have been used as information-based in-sample model selection criteria in selecting neural networks for foreign exchange rate time series forecasting [34]. However, in-sample model selection criteria are not able to provide a reliable guide to out-of-sample performance, and there is no apparent connection between in-sample model fit and out-of-sample forecasting performance.
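For a least-squares network with Gaussian errors, a common form of the two criteria (up to additive constants) is AIC = n ln(SSE/n) + 2k and BIC = n ln(SSE/n) + k ln n. Here n is the number of training observations, SSE the in-sample sum of squared errors, and k the number of free weights and biases. The sketch below assumes these forms and a one-hidden-layer MLP with a single output. Both scores use only in-sample residuals, which is exactly the limitation described above.

```python
import math


def mlp_param_count(n_inputs, n_hidden, n_outputs=1):
    """Number of weights plus biases in a one-hidden-layer MLP."""
    return (n_inputs + 1) * n_hidden + (n_hidden + 1) * n_outputs


def aic_bic(residuals, n_params):
    """Gaussian-likelihood AIC and BIC computed from in-sample residuals.

    Additive constants are dropped, so only differences between candidate
    networks fitted to the same data are meaningful.
    """
    n = len(residuals)
    sse = sum(r * r for r in residuals)
    if sse == 0.0:
        raise ValueError("zero residual error: criteria undefined")
    base = n * math.log(sse / n)
    return base + 2 * n_params, base + n_params * math.log(n)
```

To compare candidate lag lengths, one would compute `mlp_param_count` for each candidate network, fit it, and prefer the lowest score. As noted above, a low score need not translate into good out-of-sample forecasts.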

Huang et al. [18] propose a general approach called the Autocorrelation Criterion (AC) to determine lag structures in applications of ANNs to univariate time series forecasting. They apply the approach to selecting input variables for foreign exchange rate forecasting and compare AC with information-based in-sample model selection criteria. Experimental results show that AC outperforms the information-based in-sample criteria in terms of forecasting performance.


We suggest that practitioners employ the Autocorrelation Criterion in the case of univariate input. It does not require any assumptions and is completely independent of any particular class of model. The selection of input variables is data-driven, making full use of the information among sample observations even if the underlying relationships are unknown or hard to describe. Thus, it is well suited to time series problems whose solutions require knowledge that is difficult to specify but for which there are enough data or observations. It therefore provides a practical way to solve input selection for neural networks in time series forecasting.
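The exact AC procedure of Huang et al. [18] is not reproduced here. As a rough illustration of autocorrelation-based lag screening in general, the sketch below computes the sample autocorrelation function. It keeps the lags whose autocorrelation lies outside the approximate 95% band ±1.96/√n. This screening rule is a common heuristic assumed for illustration; it is not the published criterion.

```python
import math


def sample_acf(series, max_lag):
    """Sample autocorrelations r_1 .. r_max_lag (assumes a non-constant series)."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(1, max_lag + 1)]


def significant_lags(series, max_lag, z=1.96):
    """Lags whose |autocorrelation| exceeds the approximate band z / sqrt(n)."""
    band = z / math.sqrt(len(series))
    return [k for k, r in enumerate(sample_acf(series, max_lag), start=1)
            if abs(r) > band]
```

The selected lags would then define which past observations enter the input window.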

Multivariate inputs are based on economic and financial theory. El Shazly et al. [8] use fundamental inputs including the one-month Eurorate on US dollar deposits, the one-month Eurorate on foreign currency deposits, the spot exchange rate, and the one-month forward premium on the foreign currency. El Shazly et al. [9] use inputs including the 90-day Euro deposit rate on the US dollar (INTUS); the 90-day Euro deposit rates on the British pound (INTBP), German mark (INTDM), Japanese yen (INTJY) and Swiss franc (INTSF); the spot exchange rates of the foreign currencies (SBP, SDM, SJY and SSF), expressed in direct form; the 90-day forward exchange rates on the foreign currencies (FDBP, FDDM, FDJY, FDSF); and the 90-day futures exchange rates on the foreign currencies (FTBP, FTDM, FTJY, FTSF). This choice of input variables comes from Interest Rate Parity, a principle by which forward exchange rates reflect relative interest rates on default-risk-free instruments denominated in alternative currencies. Currencies of countries with high interest rates are expected by the market to depreciate over time, and currencies of countries with low interest rates are expected to appreciate over time.

Leung et al. [25] use the modified uncovered interest parity (MUIP) relationship [39] as the theoretical basis for a multivariate specification. The MUIP relationship can be written as follows:

$$e_t = \alpha_0 + \alpha_1 (r^*_t - r_t) + \alpha_2 (\pi^*_t - \pi_t) + \alpha_3 (p^*_t - p_t) + \alpha_4 (ca_t/ny_t) + \alpha_5 (ca^*_t/ny^*_t) + \mu_t$$

where e is the natural logarithm of the exchange rate, defined as the foreign currency price of domestic currency; r, π, p and ca/ny represent, respectively, the logarithm of the nominal short-term interest rate, the expected price inflation rate, the logarithm of the price level, and the ratio of the current account to nominal GDP for the domestic economy. Asterisks denote the corresponding foreign variables, and µ is the error term. The variables ca/ny and ca∗/ny∗ are proxies for the risk premium.
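When the MUIP relationship is used to specify a network's inputs, the right-hand-side terms of the equation become the input vector. The sketch below builds those five terms for one period from raw domestic and foreign data. The dictionary keys are hypothetical, and leaving the inflation rate unlogged is an assumption based on the variable definitions above. The linear fitted value is included only to show how the terms enter the parity equation.

```python
import math


def muip_regressors(dom, fgn):
    """Five MUIP explanatory terms for one period.

    dom and fgn are dicts (hypothetical keys) for the domestic and foreign
    economy: 'rate' (nominal short-term interest rate, > 0), 'inflation'
    (expected inflation), 'price' (price level, > 0), 'ca_ny' (current
    account / nominal GDP). Interest rates and price levels enter in logs.
    """
    return [
        math.log(fgn["rate"]) - math.log(dom["rate"]),    # r*_t - r_t
        fgn["inflation"] - dom["inflation"],              # pi*_t - pi_t
        math.log(fgn["price"]) - math.log(dom["price"]),  # p*_t - p_t
        dom["ca_ny"],                                     # ca_t / ny_t
        fgn["ca_ny"],                                     # ca*_t / ny*_t
    ]


def muip_fitted(alphas, regressors):
    """Fitted log exchange rate alpha_0 + sum_j alpha_j * x_j (error term omitted)."""
    return alphas[0] + sum(a * x for a, x in zip(alphas[1:], regressors))
```

In a network specification, the list returned by `muip_regressors` would be the input pattern, and the network would replace the linear `muip_fitted` map.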

Tenti [48] uses inputs including the compound returns of the last n periods (where n = 1, 2, 3, 5, 8), the running standard deviation of the last k periods (where k = 13, 21, 34), and technical indicators such as the average directional movement index (ADX), trend movement index (TMI), rate of change (ROC), and Ehlers leading indicator (ELI). Lisi and Schiavo [26] use both the past observations of the series itself and those of an "auxiliary" variable chosen among the remaining series. For example, lagged FRF/USD and GBP/USD values are used to predict future FRF/USD.
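Several of Tenti's inputs can be computed directly from a price series. The sketch below makes three assumptions, since the text gives no formulas. First, "compound return" is taken to mean the continuously compounded (log) return. Second, the running standard deviation is taken over one-period log returns. Third, ROC uses a hypothetical 10-period window. ADX, TMI and ELI are omitted.

```python
import math


def compound_return(prices, t, n):
    """Continuously compounded return over the last n periods ending at t."""
    return math.log(prices[t] / prices[t - n])


def running_std(prices, t, k):
    """Sample standard deviation of the last k one-period log returns ending at t."""
    rets = [math.log(prices[i] / prices[i - 1]) for i in range(t - k + 1, t + 1)]
    mean = sum(rets) / k
    return math.sqrt(sum((r - mean) ** 2 for r in rets) / (k - 1))


def rate_of_change(prices, t, n):
    """Percentage rate of change over n periods."""
    return 100.0 * (prices[t] - prices[t - n]) / prices[t - n]


def tenti_style_features(prices, t, ns=(1, 2, 3, 5, 8), ks=(13, 21, 34), roc_n=10):
    """Feature vector for period t: compound returns, running stds, then ROC."""
    if t < max(max(ns), max(ks), roc_n) or t >= len(prices):
        raise ValueError("not enough history before period t")
    feats = [compound_return(prices, t, n) for n in ns]
    feats += [running_std(prices, t, k) for k in ks]
    feats.append(rate_of_change(prices, t, roc_n))
    return feats
```

With the default windows, each period needs at least 34 earlier observations before a full feature vector can be formed.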

Following Walczak and Cerpa's [51] suggestions, multivariate inputs can be determined through the following steps. First, perform standard knowledge acquisition: identify as many explanatory variables for foreign exchange rates as possible from economic and financial theory. The primary purpose of the knowledge acquisition phase is to guarantee that the input variable set is not under-specified, providing all relevant domain criteria to the ANNs. Once a base set of input variables has been defined through knowledge acquisition, the set can be pruned to eliminate variables that contribute noise to the ANNs and consequently reduce their generalization performance. Smith [43] claims that ANN input variables need to be predictive but should not be correlated. Correlated variables degrade ANN performance by interacting with each other as well as with other elements to produce a biased effect. A first-pass filter to help identify noise variables is to calculate the correlation of pairs of variables (the Pearson correlation matrix); alternatively, a chi-square test may be used for categorical variables. If two variables are highly correlated, one of them may be removed from the set without adversely affecting ANN performance. Additional statistical techniques may be applied, depending on the distribution properties of the data set. Stepwise regression (multiple or logistic) and factor analysis provide viable tools for evaluating the predictive value of input variables and may serve as a secondary filter after the Pearson correlation matrix.
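The first-pass Pearson filter can be sketched as a greedy loop. Variables are visited in order, and a variable is dropped if it is highly correlated with one already kept. The 0.9 threshold and the visiting order are assumptions. In practice, the order would reflect how important each variable was judged to be during knowledge acquisition.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def drop_correlated(columns, threshold=0.9):
    """Greedy first-pass filter over a dict mapping name -> list of values.

    A variable is kept only if its absolute correlation with every
    already-kept variable is at most `threshold`.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept
```

The surviving variables would then go to the secondary filters (stepwise regression or factor analysis) mentioned above.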

Multivariate input has advantages in long-term forecasting, as it can unveil the movement trend of foreign exchange rates. But it needs more data and time, and some explanatory variables are not available when they are needed. Univariate input does not have these problems: practitioners benefit from a net reduction in development costs, since less data is required. However, it lacks an economic explanation, which weakens the credibility of the forecasts.

3. Preparing Data

Because training an artificial neural network requires relatively little prior knowledge, and because of its black-box character, data are often presented to the networks without any further processing. However, the care invested in preparing the data is of decisive importance to a network's learning speed and the quality of approximation it can attain. Every hour invested in preparing the data may save days in training the networks.

The first questions to be considered here are of a very general nature:

(1) Is sufficient data available, and does this data contain the correct information?

(2) Does the available data cover the range of the variables concerned as completely as possible?

(3) Are there borderline cases that are not covered by the data?

(4) Does the data contain irrelevant information?

(5) Are there transformations or combinations of variables (e.g. ratios) that describe the problem more effectively than the individual variables themselves?

Once all these points have been clarified, the data needs to be transformed into an appropriate form for the networks. Various normalization methods are generally employed to this end. Tenti [48] normalizes inputs to zero mean and two standard deviations. In Hu et al.'s [17] study, all inputs to the ANNs are linearly normalized to [0, 1]. El Shazly et al. [9] suggest that data should be manipulated and converted to the required format for further processing. In Lisi and Schiavo's [26] study, the log-differenced data are scaled linearly to the range 0.2–0.8 to match the output range of the sigmoid activation function. Qi and Zhang [34] apply a natural logarithm transformation to the raw data to stabilize the series; an augmented Dickey–Fuller (ADF) test shows that the transformed series contains a unit root, so first differences are taken.
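The transformations mentioned above are short enough to write out. The sketch below shows four of them. The first is z-score standardization (zero mean, unit standard deviation); Tenti's two-standard-deviation variant is not reproduced. The second is linear min-max scaling to an arbitrary range such as [0, 1] or [0.2, 0.8]. The third is its inverse, for mapping network outputs back to the original scale. The fourth is log differencing. The function names are illustrative.

```python
import math


def zscore(xs):
    """Standardize to zero mean and unit (population) standard deviation."""
    n = len(xs)
    mean = sum(xs) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return [(x - mean) / std for x in xs]


def minmax_scale(xs, lo=0.0, hi=1.0):
    """Linearly map xs onto [lo, hi]; also return the parameters needed to invert."""
    xmin, xmax = min(xs), max(xs)
    span = xmax - xmin
    scaled = [lo + (hi - lo) * (x - xmin) / span for x in xs]
    return scaled, (xmin, xmax, lo, hi)


def minmax_invert(ys, params):
    """Map scaled values (e.g. network outputs) back to the original units."""
    xmin, xmax, lo, hi = params
    return [xmin + (y - lo) * (xmax - xmin) / (hi - lo) for y in ys]


def log_difference(xs):
    """First difference of natural logs: ln(x_t) - ln(x_{t-1})."""
    return [math.log(b) - math.log(a) for a, b in zip(xs, xs[1:])]
```

Chaining `log_difference` and `minmax_scale(..., 0.2, 0.8)` reproduces the kind of preprocessing described for Lisi and Schiavo.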

There is no consensus on whether data normalization should be used. For example, it is still unclear whether there is a need to normalize the inputs, because the arc weights can undo the scaling. Shanker et al. [40] investigate the effectiveness of linear and statistical normalization methods for classification problems. They find that data normalization does not necessarily lead to better performance, particularly when the networks and the sample size are large. El Shazly et al. [8] apply normalization and transformation in initial runs and then discard them. They find that although the differenced data set speeds up training by reducing the noise during training, the resulting networks yield poor forecasts when tested. Since their objective is to improve testing performance rather than to speed up training, they decide to use raw data during training. Zhang and Hu [59] find no significant difference between using normalized and original data, based on their experience with exchange rate data; hence, raw data are used in that study.

Although normalization of the data is not compulsory, it is sometimes unavoidable. If, for example, the output activation function has a limited range, such as the sigmoid function (0.0–1.0) or the tanh function (−1.0–1.0), the network will be unable to generate any output values outside of this range. The target output data for the training and test phases must therefore be normalized.

In principle, it is not absolutely necessary to normalize the input data, as the network's input layer is assigned a linear function. Nevertheless, it is inadvisable to leave the data unnormalized when using multivariate inputs. As a result of normalization, all variables acquire the same significance for the learning process; if normalization is not carried out, variables with larger values will be given preference.

Generally, a data set is divided into two parts, the training set and the test set. The training set is used for ANN model development, and the test set is used to evaluate forecasting ability. Sometimes a third set, called the validation set, is used to avoid overfitting or to determine the stopping point of the training process.

There is no general rule for splitting the data into training and test sets. The Brainmaker software randomly selects 10% of the facts from the data set and uses them for testing. Yao et al. [57] suggest dividing historical data into three portions: training, validation and test sets. The training set contains 70% of the collected data, while the validation and test sets contain 20% and 10%, respectively. The division is based on a rule of thumb derived from the authors' experience. Other studies simply state the division without giving reasons for it.
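A chronological version of the 70/20/10 rule of thumb can be sketched as follows. Preserving the original time order, so that the test period follows the data used for fitting, is an assumption here. The text does not say how Yao et al. ordered the split, and the Brainmaker default described above samples test facts at random.

```python
def chronological_split(series, train=0.7, validation=0.2):
    """Split a time-ordered sequence into training, validation and test parts.

    Order is preserved so the test set lies strictly after the data used for
    fitting; the test share is whatever remains (0.1 with the defaults).
    """
    if not (0 < train < 1 and 0 <= validation and train + validation < 1):
        raise ValueError("invalid proportions")
    n = len(series)
    n_train = int(n * train)
    n_val = int(n * validation)
    return (series[:n_train],
            series[n_train:n_train + n_val],
            series[n_train + n_val:])
```

The validation part is the one used for early stopping or for choosing among candidate architectures. The test part is touched only once, for the final evaluation.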

Sample size is another factor that can affect the forecasting ability of artificial neural networks. Neural network researchers have used training sets of various sizes, ranging from one year to sixteen years [21, 36, 46, 48, 52, 58]. Large samples are often claimed to be optimal for training neural networks because of the large number of parameters involved. To test whether there is a significant difference between large and small training samples in modeling and forecasting exchange rates, Zhang and Hu [58] use two training sample sizes. The large sample consists of 887 observations from 1976 to 1992, and the small one includes 261 data points from 1988 to 1992. They find that the large sample outperforms the small sample. Most researchers simply use all of the data they have obtained to build a neural network forecasting model, without attempting to compare the effect of data quantity on the quality of the resulting models.

However, Kang [22], in a comprehensive study of neural network time series forecasting, finds that neural network forecasting models do not necessarily require a large data set to perform well. Walczak [50] examines the effect of different training sample sizes on forecasting exchange rates. His results indicate that, for financial time series, two years of training data is frequently all that is required to produce optimal forecasting accuracy. He claims that, given an appropriate amount of historical knowledge, neural networks can forecast future exchange rates with 60% accuracy, while neural networks trained on a larger training set forecast worse. In addition to yielding high-quality forecasts, the smaller training sets reduce development cost and time.

Huang et al. [19] propose determining the optimal quantity of training data by using change-point detection. The behavior of exchange rates evolves over time, so one can conjecture that exchange rate movements contain a series of change points that divide the data into several internally homogeneous segments whose characteristics differ from one another.
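The procedure of Huang et al. [19] is not reproduced here. As a generic illustration of the idea, the sketch below searches for a single shift in the mean by minimizing the within-segment sum of squared errors. It accepts the shift only if the improvement exceeds a user-chosen penalty. It then keeps the data after the change point as the training window. The single-change mean-shift model, the `penalty` and `min_size` parameters, and the function names are assumptions made for brevity.

```python
def _sse(seg):
    """Sum of squared deviations of a segment from its own mean."""
    m = sum(seg) / len(seg)
    return sum((x - m) ** 2 for x in seg)


def detect_mean_shift(xs, min_size=5, penalty=0.0):
    """Index k splitting xs into xs[:k] and xs[k:] with the lowest total SSE.

    Returns None if the series is too short or if no split lowers the
    one-segment SSE by more than `penalty`.
    """
    if len(xs) < 2 * min_size:
        return None
    best_k, best_cost = None, _sse(xs) - penalty
    for k in range(min_size, len(xs) - min_size + 1):
        cost = _sse(xs[:k]) + _sse(xs[k:])
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k


def training_window(xs, min_size=5, penalty=0.0):
    """Training data after the detected change point (all data if none)."""
    k = detect_mean_shift(xs, min_size, penalty)
    return xs if k is None else xs[k:]
```

With noisy data, a penalty of zero will almost always report some split, so the penalty has to be chosen relative to the noise level. Applying the search to returns rather than levels is another judgment call.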

4. Architectures of ANNs

Three classes of ANN architectures have been employed for forecasting foreign exchange rates. In this section, we give a brief presentation of them and make some comparisons.

4.1. Feedforward

In feedforward ANNs, the connections between units do not form cycles. Feedforward ANNs usually produce a response to an input quickly.

4.1.1. Multi-layer perceptrons (MLP)

MLP [38] is perhaps the most popular network architecture in use, and it is relatively easy to implement. An MLP is typically composed of several layers of nodes (see Fig. 1). The network thus has a simple interpretation as a form of input-output model, with the weights and thresholds (biases) as the free parameters of the model. Although it has been shown theoretically that the MLP has a universal function approximation capability and can approximate any nonlinear function with arbitrary accuracy, no universal guideline exists for choosing the appropriate model structure in practical applications. Thus, a trial-and-error approach or cross-validation experiment is often adopted to help find the best model. Typically, a large number of neural network architectures are considered; the one with the best performance on the validation set is chosen as the winner, and the others are discarded.

Fig. 1. Examples of multi-layer perceptron neural network architectures.
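The input-output interpretation of an MLP can be made concrete with a forward pass through a one-hidden-layer network. The sketch assumes sigmoid hidden units and a linear output unit, a common choice for regression. The weights and biases are the free parameters that training would adjust.

```python
import math


def sigmoid(z):
    """Logistic activation used by the hidden units."""
    return 1.0 / (1.0 + math.exp(-z))


def mlp_forward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """One-hidden-layer MLP: sigmoid hidden units, linear output unit.

    hidden_weights[j] holds the weights from every input to hidden node j.
    """
    hidden = [sigmoid(sum(w * xi for w, xi in zip(w_j, x)) + b_j)
              for w_j, b_j in zip(hidden_weights, hidden_biases)]
    return sum(v * h for v, h in zip(output_weights, hidden)) + output_bias
```

A trial-and-error search would train such networks with different numbers of hidden nodes (the length of `hidden_weights`) and keep the one with the lowest validation error.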

4.1.2. Radial basis function networks (RBFNs)

RBFNs [30] use static Gaussian functions as the nonlinearity for the hidden-layer processing elements. A Gaussian function responds only to a small region of the input space around its center. The key to successful implementation of these networks is finding suitable centers for the Gaussian functions. This can be done with supervised learning, but an unsupervised approach usually produces better results.

The advantage of the radial basis function network is that it finds the input-to-output map using local approximators. Usually the supervised segment is simply a linear combination of the approximators. Since linear combiners have few weights, these networks train extremely fast and require fewer training samples.
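The sketch below illustrates the two parts of an RBFN: Gaussian hidden units placed by an unsupervised method, and a linear output combination. Plain k-means, seeded with the first k points, is assumed as the unsupervised method, and a single shared width is also an assumption. Fitting the output weights by linear least squares, the fast supervised step, is omitted to keep the example short.

```python
import math


def gaussian(x, center, width):
    """Gaussian response of a hidden unit; it decays quickly away from its center."""
    d2 = sum((a - c) ** 2 for a, c in zip(x, center))
    return math.exp(-d2 / (2.0 * width ** 2))


def kmeans_centers(points, k, iters=20):
    """Unsupervised placement of RBF centers with plain k-means (first k points as seeds)."""
    centers = [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: sum((a - c) ** 2 for a, c in zip(p, centers[i])))
            clusters[j].append(p)
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        centers = [[sum(col) / len(cl) for col in zip(*cl)] if cl else centers[j]
                   for j, cl in enumerate(clusters)]
    return centers


def rbf_forward(x, centers, width, weights, bias):
    """Network output: bias plus a linear combination of Gaussian hidden responses."""
    return bias + sum(w * gaussian(x, c, width) for w, c in zip(weights, centers))
```

Because each Gaussian is local, an input far from every center produces an output close to the bias.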


4.1.3. Learning vector quantization (LVQ)

LVQ¹²,²³ is a precursor of the well-known self-organizing maps (also called Kohonen feature maps) and, like them, it can be seen as a special kind of artificial neural network. A neural network for learning vector quantization consists of two layers: an input layer and an output layer. It represents a set of reference vectors whose coordinates are the weights of the connections leading from the input neurons to an output neuron. Hence, each output neuron corresponds to one reference vector. This kind of ANN architecture can only be used for classification, so it cannot be employed directly to forecast the value of foreign exchange rates.
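Since LVQ only classifies, the closest it comes to exchange rate forecasting is predicting the direction of the next move. The sketch below shows the standard LVQ1 update under that assumption: the two input features, the "up"/"down" labels, the initial reference vectors and the learning-rate schedule are all hypothetical choices for illustration.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(x, protos):
    return min(range(len(protos)), key=lambda k: sq_dist(x, protos[k]))

def train_lvq1(X, labels, protos, proto_labels, lr=0.05, epochs=30, seed=0):
    """LVQ1: move the winning reference vector toward a same-class sample, away otherwise."""
    rng = random.Random(seed)
    protos = [list(p) for p in protos]
    order = list(range(len(X)))
    for epoch in range(epochs):
        rng.shuffle(order)
        alpha = lr * (1.0 - epoch / epochs)  # linearly decaying step size
        for i in order:
            j = nearest(X[i], protos)
            sign = 1.0 if proto_labels[j] == labels[i] else -1.0
            protos[j] = [p + sign * alpha * (x - p) for p, x in zip(protos[j], X[i])]
    return protos

def classify(x, protos, proto_labels):
    """Each output neuron is one reference vector; the nearest one decides the class."""
    return proto_labels[nearest(x, protos)]

# Toy data: two hypothetical features per day, labelled by the direction of the next move.
rng = random.Random(1)
X = [[rng.gauss(1, 0.2), rng.gauss(1, 0.2)] for _ in range(50)] + \
    [[rng.gauss(-1, 0.2), rng.gauss(-1, 0.2)] for _ in range(50)]
labels = ["up"] * 50 + ["down"] * 50
proto_labels = ["up", "down"]
protos = train_lvq1(X, labels, [[0.5, 0.5], [-0.5, -0.5]], proto_labels)
accuracy = sum(classify(x, protos, proto_labels) == l for x, l in zip(X, labels)) / len(X)
print(accuracy)
```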

4.1.4. General regression neural networks (GRNNs)

GRNNs⁴⁵ are memory-based feed-forward networks based on the estimation of probability density functions. GRNNs feature fast training times, can model nonlinear functions, and have been shown to perform well in noisy environments given enough data. The GRNN topology consists of four layers: the input layer, pattern layer, summation layer and output layer. Each layer of processing units is assigned a specific computational function when nonlinear regression is performed. The only adjustable parameter in a GRNN is the smoothing factor of the kernel function. Optimizing the smoothing factor is critical to the GRNN's performance, and it is usually done through iterative adjustment and cross-validation.

The advantages of GRNN include:

(1) Fast training times.

(2) Can handle both linear and nonlinear data.

(3) Adding new samples to the training set does not require re-calibrating the model.

(4) Only one adjustable parameter, which makes overtraining less likely.

The disadvantages include:

(1) Has trouble with irrelevant inputs (i.e. suffers from the curse of dimensionality).

(2) No intuitive method for selecting the optimal smoothing parameter.

(3) Requires many training samples to adequately span the variation in the data.

(4) Requires that all the training samples be stored for future use (i.e. prediction).
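A GRNN prediction is a kernel-weighted average of the stored targets, which is why "training" amounts to storing samples and why the smoothing factor is the only parameter. The sketch below assumes a Gaussian kernel with a single smoothing factor `sigma` and chooses it by leave-one-out cross-validation over a small hypothetical grid; the function names are illustrative, not from a particular library.

```python
import math

def grnn_predict(x, X_train, y_train, sigma):
    """Pattern layer: one Gaussian kernel per stored sample; summation/output: weighted average."""
    d2 = [sum((a - b) ** 2 for a, b in zip(x, xi)) for xi in X_train]
    d_min = min(d2)  # shifting by the smallest distance avoids underflow and cancels in the ratio
    w = [math.exp(-(d - d_min) / (2 * sigma ** 2)) for d in d2]
    return sum(wi * yi for wi, yi in zip(w, y_train)) / sum(w)

def loo_mse(X, y, sigma):
    """Leave-one-out cross-validation error for one smoothing factor."""
    total = 0.0
    for i in range(len(X)):
        pred = grnn_predict(X[i], X[:i] + X[i + 1:], y[:i] + y[i + 1:], sigma)
        total += (pred - y[i]) ** 2
    return total / len(X)

def choose_sigma(X, y, grid=(0.05, 0.1, 0.2, 0.5, 1.0)):
    scores = {s: loo_mse(X, y, s) for s in grid}
    return min(scores, key=scores.get), scores

X = [[0.1 * t] for t in range(60)]
y = [math.sin(x[0]) for x in X]
sigma, scores = choose_sigma(X, y)
print(sigma, grnn_predict([1.05], X, y, sigma))
```

Adding a new observation is just `X.append(...)` and `y.append(...)`; no weights need recalibrating. The same code also shows the listed drawbacks: every prediction scans all stored samples, and an irrelevant input dimension would distort every distance.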

4.2. Feedback

In feedback ANNs, there are cycles in the connections. Each time an input is presented, the network may have to iterate for a long time before it produces a response. Feedback ANNs are usually more difficult to train than feedforward ANNs.


4.2.1. Recurrent neural networks (RNNs)

Recurrent neural networks (RNNs), in which the input layer's activity patterns pass through the network more than once before generating a new output pattern, can learn extremely complex temporal patterns. Recurrent architectures have been shown to be superior to the windowing technique of overlapping snapshots of data used with standard back-propagation. By introducing time-lagged model components, RNNs may respond to the same input pattern in different ways at different times, depending on the preceding input sequence. The main disadvantage of RNNs is that they require substantially more connections, and more memory in simulation, than standard back-propagation networks. RNNs can yield good results because similar patterns roughly repeat in exchange rate time series, and these regular but subtle sequences can provide useful forecasting ability.
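The claim that an RNN can respond differently to the same input depending on what came before can be checked with a minimal Elman-style recurrent layer, in which the previous hidden state is fed back as extra input. The weights below are arbitrary hypothetical values chosen for illustration, not a trained forecaster.

```python
import math

def elman_step(x, h_prev, W_in, W_rec, b):
    """Hidden state from the current input plus the previous hidden state (context units)."""
    return [math.tanh(sum(w * xi for w, xi in zip(W_in[j], x))
                      + sum(w * hp for w, hp in zip(W_rec[j], h_prev)) + b[j])
            for j in range(len(b))]

def run(sequence, W_in, W_rec, b, w_out):
    """Feed a sequence through the recurrent layer and read out the final state linearly."""
    h = [0.0] * len(b)
    for x in sequence:
        h = elman_step(x, h, W_in, W_rec, b)
    return sum(w * hj for w, hj in zip(w_out, h))

# Hypothetical fixed weights: 1 input, 2 hidden units.
W_in, W_rec, b, w_out = [[0.5], [-0.3]], [[0.4, 0.1], [-0.2, 0.3]], [0.0, 0.0], [1.0, -1.0]
after_rise = run([[1.0], [0.2]], W_in, W_rec, b, w_out)
after_fall = run([[-1.0], [0.2]], W_in, W_rec, b, w_out)
print(after_rise, after_fall)  # same last input (0.2), different outputs
```

The extra memory cost noted in the text is visible in `W_rec`: every hidden unit needs a weight from every hidden unit of the previous step.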

4.3. Competitive

4.3.1. Fuzzy ARTMAP network

A fuzzy ARTMAP network³ is a fuzzy ART² network, made up of the input, complement and category layers, to which a single output layer is added to generate an error signal. Adding this output layer transforms the network from an unsupervised network into a supervised one, which learns from examples whose true category is known.

4.3.2. Modular

Modular ANNs²⁰ make use of multiple individual back-propagation networks (BPNs) that compete to learn different aspects of the problem. A gating mechanism chooses which of the BPNs (each called a local expert) does best on a particular input observation, in effect assigning different regions of the data space to different local experts. The general idea is that the error of each local expert is weighted by its posterior probability (obtained as training takes place) of being responsible for the current output vector. The gating network learns by trying to match its prior probabilities to the posterior probabilities found for each local expert.
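The weighting scheme described above can be made concrete for a single observation. This sketch assumes the common mixture-of-experts formulation: the gate produces softmax priors, each expert's error is modelled as Gaussian with unit variance, and the gate logits are moved by (posterior minus prior), which is the gradient of the mixture log-likelihood with respect to the logits. The expert outputs and learning rate are hypothetical.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def expert_posteriors(gate_logits, expert_outputs, target, var=1.0):
    """Posterior that each local expert produced the target, assuming Gaussian expert errors."""
    priors = softmax(gate_logits)
    lik = [math.exp(-(target - o) ** 2 / (2 * var)) for o in expert_outputs]
    joint = [g * l for g, l in zip(priors, lik)]
    total = sum(joint)
    return priors, [j / total for j in joint]

def training_signals(gate_logits, expert_outputs, target, lr=0.1):
    priors, post = expert_posteriors(gate_logits, expert_outputs, target)
    # each expert's error is weighted by its posterior responsibility
    expert_errors = [h * (target - o) for h, o in zip(post, expert_outputs)]
    # gate update: move priors toward posteriors (log-likelihood gradient w.r.t. logits)
    new_logits = [z + lr * (h - g) for z, g, h in zip(gate_logits, priors, post)]
    return post, expert_errors, new_logits

post, expert_errors, new_logits = training_signals([0.0, 0.0], [1.0, 0.2], target=0.9)
print(post, new_logits)
```

Expert 0, whose output is closer to the target, receives the larger posterior, so it gets most of the error signal and the gate shifts its prior toward it.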

MLP is used most frequently for exchange rate prediction because of its inherent capability for arbitrary input-output mapping. However, other types of ANNs are also used.

Tenti⁴⁸ performs tests with three variations of RNNs. The first architecture (RNN1) has one hidden and one recurrent layer. The output layer is fed back into the hidden layer by means of the recurrent layer, which shows the resulting output for the previous pattern. In the second version (RNN2), similar to that of Frasconi et al.,¹³ the hidden layer is fed back into itself through an extra layer of recurrent nodes. In the third version (RNN3), patterns are processed from the input layer through a recurrent layer of nodes, which holds the input layer's contents as they existed when previous patterns were trained, and are then fed back into the input layer.

Leung et al.²⁵ examine the forecasting ability of GRNNs and compare their performance with that of a variety of forecasting techniques, including the multi-layered feed-forward network.

Davis et al.⁶ present a variety of neural network forecasting models applied to Canadian–US exchange rate data. Back-propagation, modular, radial basis function, learning vector quantization, fuzzy ARTMAP and genetic reinforcement learning networks are examined. It is important to note that they predict directional shifts in the Canadian–US exchange rate rather than its absolute level. Different types of classification networks have characteristics that may prove effective for specific classification data.

The selection of an ANN architecture is an open problem. ANN designers must weigh the constraints of the training data set and the development cost when making this choice. We suggest that practitioners employ the MLP, which is relatively easy and inexpensive to implement.

5. The Integration of ANNs with Other Methods

The desire to further enhance the performance of neural network prediction has led to the development of hybrid systems that combine neural networks with other methods. Integrating ANNs with other technologies, such as wavelet analysis, genetic algorithms, or fuzzy logic, can improve their applications. Although each technology has its own strengths and weaknesses, these technologies are complementary: the weaknesses of one can be overcome by the strengths of another, producing a synergistic effect. Such an effect can create results that are more efficient, productive, and effective than the sum of their parts.

Genetic algorithms (GA) are a class of probabilistic search techniques based on biological evolution. Each point in the solution space is coded as a binary string called a chromosome. For instance, the coordinate (10, 5, 3) is encoded as

$$
\underbrace{1010}_{10}\;\underbrace{0101}_{5}\;\underbrace{0011}_{3}
$$
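The encoding above concatenates a fixed-width binary code for each coordinate. A minimal sketch, assuming 4 bits per coordinate as in the example (the function names are hypothetical):

```python
def encode(point, bits=4):
    """Concatenate the fixed-width binary code of each coordinate into one chromosome."""
    if any(not 0 <= v < 2 ** bits for v in point):
        raise ValueError("coordinate does not fit in the given number of bits")
    return "".join(format(v, f"0{bits}b") for v in point)

def decode(chromosome, bits=4):
    """Split the chromosome back into fixed-width fields and read each as an integer."""
    return tuple(int(chromosome[i:i + bits], 2) for i in range(0, len(chromosome), bits))

chromosome = encode((10, 5, 3))
print(chromosome)          # 101001010011
print(decode(chromosome))  # (10, 5, 3)
```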

Once a new generation exists, each member is ranked according to its fitness, and a new population is then created from it. Essentially this is “survival of the fittest solution”: the members used for mating are chosen with a probability proportional to their fitness.
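Selection with probability proportional to fitness is often called roulette-wheel selection. The sketch below assumes non-negative fitness values with a positive sum; the toy fitness (number of 1 bits) and the function name are hypothetical.

```python
import bisect
import itertools
import random

def roulette_select(population, fitness, n, rng):
    """Fitness-proportional selection: member i is drawn with probability f_i / sum(f)."""
    cumulative = list(itertools.accumulate(fitness))
    total = cumulative[-1]
    # r lies in [0, total); bisect_right never lands on a zero-width (zero-fitness) slot
    return [population[bisect.bisect_right(cumulative, rng.random() * total)]
            for _ in range(n)]

population = ["101001010011", "000000000011", "111111110000"]
fitness = [c.count("1") for c in population]  # toy fitness: number of 1 bits
parents = roulette_select(population, fitness, n=4, rng=random.Random(0))
print(parents)
```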

A technique called crossover is employed to maximize retention of the good points of the previous generation. This is analogous to biological mating, in which a child may be superior to both parents if it inherits good genes from each of them. In the computing process, this is achieved by swapping corresponding bits in pairs of chromosomes according to a given crossover rate; for instance, the last three bits of one chromosome may be swapped with the last three bits of another chromosome.
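Both the random single-point crossover and the fixed "swap the last three bits" example from the text can be sketched on bit strings. The crossover rate and function names are hypothetical illustration choices.

```python
import random

def crossover(a, b, rate, rng):
    """Single-point crossover, applied with probability `rate`: swap the tails after a random cut."""
    if rng.random() >= rate:
        return a, b
    cut = rng.randint(1, len(a) - 1)
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def swap_last_bits(a, b, n):
    """Fixed-position variant from the text: swap the last n bits (requires n >= 1)."""
    return a[:-n] + b[-n:], b[:-n] + a[-n:]

print(swap_last_bits("101001010011", "000000000000", 3))
# ('101001010000', '000000000011')
```

Crossover only rearranges bits that already exist at each position across the pair, which is the limitation the next paragraph addresses.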


If the population does not contain all of the traits needed to solve a problem, no amount of crossover will work. For this reason, a single bit is occasionally flipped, with a very low probability. This is called mutation, and it helps with a problem that also affects neural networks: getting trapped in local minima. Mutation provides a way out by preventing any bit from converging on a single value throughout the entire population. The mutation rate must be kept to a minimum to prevent the loss of good chromosomes.
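Mutation as described above is an independent, low-probability bit flip. A minimal sketch, with a hypothetical rate parameter:

```python
import random

def mutate(chromosome, rate, rng):
    """Flip each bit independently with a small probability `rate`."""
    return "".join(("1" if bit == "0" else "0") if rng.random() < rate else bit
                   for bit in chromosome)

# If every chromosome starts with "0", crossover can never produce a leading "1";
# only mutation can reintroduce it.
rng = random.Random(0)
population = ["01010", "00111", "01100"]
print([mutate(c, rate=0.05, rng=rng) for c in population])
```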

The inclusion of GA search techniques is motivated by two reasons. The first relates to the potential GAs offer in terms of adaptiveness: their flexibility, robustness and simplicity make them very attractive in that respect. The second stems from the difficulty of optimizing neural network applications. By operating on entire populations of candidate solutions in parallel, a GA is much less likely to get stuck at a local optimum.

Wavelet analysis is used to process information effectively at different scales. It is very useful for feature detection in complex and chaotic time series. In particular, the local properties of wavelets can be useful in describing signals with discontinuous or fractal structure in financial markets. It also allows high-frequency noise to be removed while preserving the high-frequency terms that carry signal. However, one of the most critical issues in applying wavelet analysis is choosing appropriate wavelet thresholding parameters.
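The role of the thresholding parameter can be seen in a minimal wavelet denoiser. This sketch is not the method of any study reviewed here: it assumes the orthonormal Haar wavelet, soft thresholding of all detail coefficients with one threshold, and, by default, the widely used universal threshold (noise scale estimated from the finest-level details). The function names and the synthetic signal are hypothetical.

```python
import math
import random

def haar_step(x):
    """One level of the orthonormal Haar transform: approximations and details."""
    s = math.sqrt(2.0)
    a = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    d = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return a, d

def haar_inverse_step(a, d):
    s = math.sqrt(2.0)
    out = []
    for ai, di in zip(a, d):
        out += [(ai + di) / s, (ai - di) / s]
    return out

def soft(v, t):
    """Soft thresholding: shrink toward zero by t, zeroing anything smaller than t."""
    return math.copysign(max(abs(v) - t, 0.0), v)

def wavelet_denoise(x, levels=3, threshold=None):
    """Decompose, shrink detail coefficients, reconstruct. len(x) must divide by 2**levels."""
    if len(x) % (2 ** levels):
        raise ValueError("length must be divisible by 2**levels")
    approx, details = list(x), []
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)  # details[0] is the finest scale
    if threshold is None:
        # universal threshold; noise scale from the (upper) median absolute finest detail
        finest = sorted(abs(v) for v in details[0])
        sigma = finest[len(finest) // 2] / 0.6745
        threshold = sigma * math.sqrt(2 * math.log(len(x)))
    for d in reversed(details):
        approx = haar_inverse_step(approx, [soft(v, threshold) for v in d])
    return approx

rng = random.Random(0)
clean = [math.sin(2 * math.pi * t / 128) for t in range(256)]
noisy = [c + rng.gauss(0, 0.3) for c in clean]
denoised = wavelet_denoise(noisy)
```

Setting `threshold=0.0` reconstructs the input exactly, while too large a threshold flattens genuine movements; this trade-off is what the GA-based threshold design discussed below is meant to tune.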

El Shazly et al.⁹ design a hybrid system combining neural networks with genetic training to forecast the three-month spot exchange rate. Once the network has been trained, tested and identified as being “good”, a GA is applied to it in order to optimize its performance. The process of genetic evolution works on the neuron connections of the trained network by applying two procedures: mutation and crossover. Hybrid systems appear to be well suited to forecasting financial data.

Shin et al.⁴² propose an integrated design in which a GA determines the optimal or near-optimal thresholds of a wavelet transform (WT), so that the transform represents the significant signal in the form most suitable for ANN models. The model is applied to forecast Korean won/USD returns one day ahead. In this study, the multi-scale signal representation used by the ANNs is provided by a wavelet transform, which serves as a multi-signal decomposition technique for detecting significant patterns. A strategy is devised that uses the WT to construct a filter closely matched to the frequency content of the time series within the combined model. The experimental results show that the GA-tuned filtering, or multi-resolution signal analysis, of the wavelet transform enhances the performance of the ANNs. The study also finds that the hybrid system of wavelet transforms and ANNs tuned by GA increases forecasting performance much more than ANNs that use three other wavelet thresholding algorithms (cross-validation, best level, and best basis).


6. Performance Comparison with Other Forecasting Methods

There are inconsistent reports on the performance of ANNs for forecasting exchange rates when compared with other forecasting methods. Table 1 summarizes the literature on the relative performance of ANNs.

Weigend et al.⁵³ find that neural networks are better than random walk (RW) models in predicting the DEM/USD exchange rate. Wei and Jiang⁵⁴ claim that the forecasting performance of ANNs is better than that of AR(p), ARMA(p,q) and ARIMA(p,d,q) models. Lisi and Schiavo²⁶ compare ANNs with chaotic models for exchange rate prediction: ANNs perform slightly better than chaotic models in terms of NMSE, but the two models are statistically equivalent. Yao and Tan⁵⁷ show that, whether measured by NMSE, correctness of gradient prediction or profit, ANNs are much better than the traditional ARIMA model when forecasting the exchange rates between the USD and five other major currencies: AUD, CHF, DEM, GBP and JPY. Leung et al.²⁵ point out that GRNNs generally outperform parametric multivariate transfer function models and random walk models.
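Most of these comparisons use the random walk as the benchmark: the one-step-ahead forecast of tomorrow's rate is simply today's rate. A minimal sketch of comparing a model against it by the ratio of RMSEs, where a ratio below 1 means the model beats the random walk; the rates and the function name are hypothetical.

```python
import math

def rmse(actual, forecast):
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

def relative_rmse_vs_random_walk(series, model_forecasts):
    """model_forecasts[t] forecasts series[t + 1]; the random walk forecasts series[t + 1] = series[t]."""
    if len(model_forecasts) != len(series) - 1:
        raise ValueError("need one forecast per step after the first observation")
    actual = series[1:]
    random_walk = series[:-1]
    return rmse(actual, model_forecasts) / rmse(actual, random_walk)

rates = [1.000, 1.020, 1.010, 1.030, 1.025]
print(relative_rmse_vs_random_walk(rates, rates[:-1]))  # 1.0: identical to the random walk
```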

Episcopos and Davis¹⁰ suggest that neural networks perform similarly to EGARCH models but are superior to random walk models in terms of in-sample fit and out-of-sample prediction. Hann and Steurer¹⁵ compare neural network models with linear monetary models in forecasting USD/DEM. Out-of-sample results show that, for weekly data, neural networks are much better than linear models and naïve random walk predictions with regard to Theil's U measure, the hit rate, the annualized returns and the Sharpe ratio. However, if monthly data are used, neural networks do not show much improvement over linear models. Monthly data usually contain more irregularities (seasonality, cyclicity, nonlinearity, noise).
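The hit rate and Sharpe ratio used by Hann and Steurer measure whether forecasts get the direction right and whether trading on them pays. Their exact trading rule is not described here, so this sketch assumes a simple one: go long when the forecast is above today's rate, short when it is below, using log returns with no transaction costs and a zero risk-free rate, annualised for weekly data. All names and data are hypothetical.

```python
import math
import statistics

def hit_rate(series, forecasts):
    """Share of periods where the forecast change has the same sign as the actual change."""
    hits = [(f - s) * (a - s) > 0 for s, a, f in zip(series[:-1], series[1:], forecasts)]
    return sum(hits) / len(hits)

def sign_rule_performance(series, forecasts, periods_per_year=52):
    """Annualised mean log return and Sharpe ratio of trading on the forecast direction."""
    rets = []
    for s, a, f in zip(series[:-1], series[1:], forecasts):
        position = 1.0 if f > s else (-1.0 if f < s else 0.0)
        rets.append(position * math.log(a / s))
    mean, sd = statistics.mean(rets), statistics.stdev(rets)
    return mean * periods_per_year, mean / sd * math.sqrt(periods_per_year)

rates = [1.00, 1.02, 1.01, 1.03, 1.00]
perfect = rates[1:]  # a forecaster that always knows the next rate
print(hit_rate(rates, perfect), sign_rule_performance(rates, perfect))
```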

Zhang and Hutchinson⁶² find mixed results for neural networks in comparison with random walk models, using different sections of the data set. Kuan and Liu²⁴ examine the out-of-sample forecasting ability of neural networks on five exchange rates against the USD: GBP, CAD, DEM, JPY and CHF. For the GBP and JPY, they demonstrate that neural networks have significant market timing ability and/or achieve significantly lower out-of-sample RMSE than the random walk model across three testing periods. For the other three exchange rates, neural networks are not shown to be superior in forecasting performance. Their results also show that different network models perform quite differently in out-of-sample forecasting. Hu and Tsoukalas¹⁷ compare the performance of ANNs used to combine forecasts with that of various other forecasting methods. Using different performance measures and different data periods, they obtain different results: ANNs are not always better than other forecasting tools. Zhang and Hu⁵⁸ find that neural networks predict much better than the random walk model when large training samples are used; with small training samples, ANNs fail to outperform the random walk at longer forecast horizons. They also suggest that exchange rate data may contain structural changes. Therefore, as more observations become available, they should be used to revise the neural network forecasting models so that they better reflect changes in the underlying pattern.
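Revising the model as new observations arrive is usually implemented as walk-forward (expanding-window) evaluation. The sketch below refits on all data seen so far every `refit_every` steps and forecasts one step ahead. To keep it short, a least-squares AR(1) model stands in for the neural network; that substitution, and all names, are assumptions for illustration.

```python
def walk_forward(series, fit_fn, start, refit_every=1):
    """Expanding window: refit on all observations seen so far, forecast one step ahead."""
    forecasts, model = [], None
    for t in range(start, len(series)):
        if model is None or (t - start) % refit_every == 0:
            model = fit_fn(series[:t])          # uses only data up to t - 1
        forecasts.append(model(series[t - 1]))  # forecast of series[t]
    return forecasts

def fit_ar1(history):
    """Stand-in model: least-squares AR(1) with intercept (a neural network would go here)."""
    x, y = history[:-1], history[1:]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    beta = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx if sxx else 0.0
    alpha = my - beta * mx
    return lambda last: alpha + beta * last

series = [10.0]
for _ in range(19):
    series.append(0.5 * series[-1] + 1.0)  # noiseless AR(1) data for checking
forecasts = walk_forward(series, fit_ar1, start=5, refit_every=2)
print(forecasts[:3])
```

Because each refit sees only past data, the forecasts are genuinely out-of-sample, and a structural change shows up as a drift in the refitted parameters.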


Table 1. The relative performance of ANNs compared with traditional forecasting methods.

| Researchers | Data | ANNs Type | Traditional Forecasting Method | Performance Measure | Conclusions |
|---|---|---|---|---|---|
| Episcopos and Davis¹⁰ | USD, DEM, FRF, JPY, GBP against CAD | MLP | EGARCH, RW | RMSE | Similar to EGARCH; better than RW |
| Hann and Steurer¹⁵ | DEM/USD | MLP | Linear model, RW | Theil's U measure, hit rate, annualized returns, Sharpe ratio | Better in weekly data; similar in monthly data |
| Hu and Tsoukalas¹⁷ | BEF/LUF, GBP, DKK, NLG, FRF, GRD, IEP, ITL, PTE, ESP, USD against DEM | MLP | MAV, GARCH, EGARCH, IGARCH, OLS, AVE | RMSE, MAE | Mixed results |
| Kuan and Liu²⁴ | GBP, CAD, DEM, JPY and CHF against USD | MLP, RNNs | RW | RMSE | Mixed results |
| Leung et al.²⁵ | GBP, JPY, CAD against USD | GRNNs | Multivariate transfer function, RW | MAE, RMSE | Better |
| Lisi and Schiavo²⁶ | FRF, DEM, ITL, GBP against USD | MLP | Chaotic model, RW | NMSE | Better |
| Wei and Jiang⁵⁴ | GBP/USD | MLP | AR, ARMA, ARIMA | RMSE | Better |
| Weigend et al.⁵³ | DEM/USD | MLP | RW | ARV | Better |
| Yao and Tan⁵⁷ | AUD, CHF, DEM, GBP, JPY against USD | MLP | ARIMA | NMSE, correctness of gradient prediction | Better |
| Zhang and Hu⁵⁸ | GBP/USD | MLP | RW | RMSE, MAE, MAPE | Mixed results |
| Zhang and Hutchinson⁶² | CHF/USD | MLP | RW | RMSE | Mixed results |

MAE: mean absolute error; RMSE: root mean square error; NMSE: normalized mean square error; MAPE: mean absolute percentage error; ARV: average relative variance.
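The error measures abbreviated in Table 1 can be written out as follows. Definitions vary between papers, so this is a sketch under stated assumptions: NMSE is taken as the squared error divided by the sum of squared deviations of the actual values from their mean (so forecasting the mean gives 1.0), and ARV is assumed to use the same normalisation; each study's own definition should be checked before comparing numbers across papers.

```python
import math

def errors(actual, forecast):
    return [a - f for a, f in zip(actual, forecast)]

def mae(actual, forecast):
    return sum(abs(e) for e in errors(actual, forecast)) / len(actual)

def rmse(actual, forecast):
    return math.sqrt(sum(e * e for e in errors(actual, forecast)) / len(actual))

def mape(actual, forecast):
    """In percent; assumes no zero actual values (true for exchange-rate levels)."""
    return 100.0 * sum(abs(e / a) for e, a in zip(errors(actual, forecast), actual)) / len(actual)

def nmse(actual, forecast):
    """Squared error normalised by the spread of the actual series (1.0 = forecasting the mean)."""
    mean = sum(actual) / len(actual)
    return (sum(e * e for e in errors(actual, forecast))
            / sum((a - mean) ** 2 for a in actual))

arv = nmse  # assumption: ARV shares NMSE's normalisation; check each paper's definition

actual, forecast = [1.0, 2.0, 3.0], [1.0, 2.0, 4.0]
print(mae(actual, forecast), rmse(actual, forecast), mape(actual, forecast), nmse(actual, forecast))
```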

