Putting ATM Cash Requirements into Context
ANN relation to socioeconomic events and Variety
George E. Vranopoulos
eProject
Email: VranopoulosGeroge@gmail.com
Athanassios Triantafyllidis
The American College of Greece
School of Computer Information Systems
Email: tracer@acg.gr
Marianthy Yiannopoulou
Edinburgh College
Email: Marianthy.Yiannopoulou@edinburghcollege.ac.uk
Abstract – Developing a cash demand forecasting model for an ATM network is a challenging task, since demand fluctuates substantially over time and depends on the location of each ATM. Variety is one of the principal V's of Big Data, and context is one of its most important aspects. In this paper the ATM cash withdrawal datasets have been "placed into context" by incorporating socioeconomic datasets. This contextual enhancement improves the effectiveness of the ANN model devised.
Index Terms – ATM, ANN, Neural Networks, Big Data Variety,
Cash Demand Forecasting
1. INTRODUCTION
One of the largest banks in Greece, in an attempt to minimise the money held "unutilised" in its ATMs (Automatic Teller Machines), investigated the development of a model for cash demand forecasting with the use of Artificial Neural Networks. Predicting cash requirements is very difficult, if not impossible, and the only source of insight is the review of historical data [1].
Big Data is primarily defined in terms of V's, based on the first definition by Laney [3]:
Volume: refers to the amount of data being created and stored [4] in the digital universe.
Velocity: in Big Data environments the speed at which data change is very high.
Variety: this characteristic concerns the data itself and the forms it can take. Sensors, IoT devices, database records, video and audio have different formats and standards, and in many cases alternative communication protocols must be used to disseminate the data streams.
Variety is the biggest challenge in Big Data environments [2]: it is not simply a storage problem, it also concerns quality, proliferation, data outliers, context, coherence, interactivity and much more that is not addressed by the data store infrastructure. Data must be in a context in order to have value [5]. The datasets for this work were extracted from the Data Warehouse and the operational systems of the Bank, and certain challenges were identified in analysing them.
The chapters that follow present a historical review of the socioeconomic situation in Greece, the challenges posed by Variety, and the methodology used in devising and evaluating an ANN (Artificial Neural Network). It will be demonstrated through these chapters that only by applying socioeconomic context to the data could sufficient validity be attained for the ANN.
2. HISTORICAL REVIEW
In order to understand the socioeconomic environment of Greece in recent years, a brief historical review is provided:
23 April 2010: Prime Minister George Papandreou requested the help of the IMF (International Monetary Fund) and the EC (European Commission) in order to avoid a public sector default. Two main factors drove this decision: a) the overwhelming public debt, with a 146% debt-to-GDP (Gross Domestic Product) ratio at the end of 2009, and b) the high interest rates that the Greek government had to endure in order to raise funds from global markets - spreads reached 486 basis points [6]. This decision and the respective aid provided were accompanied by measures in respect of taxation and government spending.
23 March 2013: Cyprus received a €10 billion bailout from the EC, the ECB (European Central Bank) and the IMF; in return it reformed its banking system. The Cyprus banking crisis affected Greek banks, since the Greek subsidiaries of the Cypriot banks were acquired by Greek banks.
25 June 2015: following the inability of the EC, IMF, ECB and the Greek government to reach an agreement on the extension of the funding programme, Prime Minister Alexis Tsipras called a referendum, declared a three-day bank "holiday", closed the Stock Market until the 6th of July and imposed capital controls.
Austerity measures were taken in order to achieve the monetary targets set by the EU Commission and the IMF, which led to a decrease in demand for goods and services, pushing the Greek economy into deep recession [6]. After the Greek bailout, international financial sector anxiety raised sovereign spreads, and this sovereign weakness was transferred to the Greek financial sector [7]. The Greek banks' "downfall" started with the recognition of impairment losses on the Greek Government Bonds subject to the agreed haircut. Falling profitability and the steep decrease of domestic residents' deposits amplified the problem [8]. Greek banks found themselves working in a suffocating economic environment where liquidity was limited, credit expansion was restricted and financial markets were unstable. These environmental parameters led to an increase in non-performing loans and a sharp decrease in income and profitability [9]. In order for banks to regain profitability there had to be mergers [10]. The Greek banking sector shrank through the mergers of Greek banks and the acquisition of all Cypriot banks operating in Greece.
3. THE PROBLEM DEFINITION
A Data Warehouse process utilises information from ATM withdrawals and ATM availability levels and uses a time-series model to calculate the projected demand for cash by ATMs. This process was quite efficient in the first years of operation but, as shown in Figure 1 (in all graphs, cash demand levels are hidden because of a nondisclosure agreement), there is an evident deviation between the projected and actual demand for money during the last year of operation.
Several parameters addressing seasonality and financial events (e.g. pension payments, payroll, etc.) were taken into consideration when utilising the time-series model, but the produced outcome, as shown in Figure 1, had substantial deviations. These deviations suggested the need to adjust the method used in order to reflect the new ATM cash demands correctly.
It was evident that something "was going wrong" in the last year of calculations and it seemed only right to investigate what had "changed." Society and economy had changed: capital controls, political unrest, bank mergers, and diminished liquidity.
3.1. Datasets Variety
The Bank has in place an elaborate Data Warehouse that is "fed" by the transactional systems. One of the available marts contains information concerning ATMs.
The datasets provided by the Bank were the following:
I) Withdrawals data
a) Code of the ATM.
b) Date.
c) Total withdrawal.
II) ATM data
a) Code of the ATM.
b) Name
c) Location
III) Branches data
a) Branch Code
b) Branch Name
c) Address
d) Latitude
e) Longitude
The "Withdrawals" dataset included data for the interval from the 1st of January 2012 until the 11th of September 2015. The "ATM" and "Branches" datasets were snapshots of the information held in the operational systems of the Bank. Once the data were available and imported into the staging environment for the specific project, the following Variety challenges were identified:
Data coming from the Data Warehouse and the
Operational systems showed inconsistencies in total
withdrawals.
Different date formats were used in the extraction process.
Values of attributes like “location” were encoded in
different detail. In some cases the address was used whilst
in others the branch number hosting the ATM was used.
Since the "ATMs" and "Branches" datasets were snapshots, some ATMs in the "Withdrawals" set could not be identified because they had been retired.
These challenges were perfectly logical and justifiable in the business environment; nonetheless, since there were no metadata to identify them, the Data Scientist who was required to process the data lost valuable time in normalising and understanding the datasets.
3.2. ATM Cash Management
ATM cash management, and especially the definition of a forecasting model, is a complex task. To convey this complexity, the process will be outlined and the related complexity factors identified.
Figure 1. Data Warehouse Time Series Model
In order for an ATM to be replenished, a money order request must be initiated. The origin of the request depends on the ATM's location. If the ATM is located within a branch, the branch manager is responsible for the order, while if the ATM is off-site (standalone) the order is issued by the centralised unit responsible for all off-site ATMs. For better comprehension, the two processes will be presented separately.
On-site ATM. The branch manager is responsible for monitoring the cash level of the ATM and replenishing it accordingly. Money orders are issued in conjunction with the cash requirements of the tellers. Usually the money drops are executed every two days; in cases of cash drains, emergency money deliveries are executed. The branch manager has to accurately predict the required cash for both the cashiers and the ATMs. To be on the safe side, he usually issues money orders in excess of the actual needs. Money that resides in the branch safe cannot be used by the Treasury department in financial investments, thus diminishing prospective gains from investments.
Off-site ATM. In the case of the off-site ATM there are additional factors to be taken into consideration when issuing an order. Off-site ATMs are replenished by CIT (Cash-in-Transit) companies, which specialise in the physical transfer of banknotes and coins. This makes the process more complicated, since the money has to be transferred from the central repository to the closest branch, from where the CIT will collect the money and replenish the ATM. A CIT company should receive the replenishment request at least two days prior to execution. In this case logistics are more complicated and, for every scheduled replenishment, the Bank has to pay a fee. Fees are calculated mainly on the basis of the location of the ATM, so costs are high for ATMs located in distant areas (e.g. islands).
Apart from the "close" monitoring for replenishing the ATM, the projected demand is calculated from the average withdrawals of the respective previous year's demand, adjusted by the trend identified in the previous months of the same year (a rough sketch is given below). This method of prediction exhibits a) MAPE in the area of 40% and b) RMSPE in the area of 70% (see 4.2.4. Model Evaluation). Even though the error levels are quite high, the Bank enjoys less than 3% unavailability, including ATM technical downtime (≈2.3%) and cash outs (≈0.5%).
In optimising the cash management process the following factors should be taken into consideration:
Time. When should the order be issued in order for the money to be available for the replenishment process?
Level. The amount to be requested: what would be the optimal amount? This parameter also has certain limitations. An ATM can hold certain "money cartridges" with specific capacity. Furthermore, CIT companies impose, for security reasons, specific limits on the amounts they can transfer.
Cost. Cost comprises the following internal and external components:
o Fees paid to the CIT for money transports to branches.
o Fees paid to the CIT for ATM replenishments.
o Cost of money, for the amounts the Treasury department cannot invest.
Fame. The reputation of the Bank can be severely damaged if ATMs do not "give" money to the customers. In Greece, especially in the period of the imposition of capital controls, it was of utmost importance to always replenish the ATMs so that the already shaken confidence of the customers in the banking system would not be shattered.
In any attempt to predict the demand for cash, all of these factors should be taken into consideration. In some cases upper management sponsorship is obligatory, since acceptable rates and targets must be set. For instance: a) what is the acceptable rate for "cash outs" - 10%, 20% or below 1%? b) What is the average rate of return on money invested by the Treasury department - 3%, 4%, 5% or 10%?
4. FORMULATION OF THE ANN
MatLab R2015, a multi-paradigm computing environment enhanced with a 4th-generation programming language, was used to visualise and analyse the data. The information was normalised and the Variety challenges were rectified in a staging environment prior to MatLab processing. The parameters' encoding and value ranges were also identified and applied in the staging environment.
4.1. Methodology
In the process of identifying an ANN that would produce
relatively accurate results the following courses of action were
employed:
Variety. Interviews with developers and business users were
conducted in order to clarify and properly “define” all datasets.
Definition of the ANN. Commonly used parameters for the ANN were gathered from the available literature. An interview was conducted with the Head of BI (Business Intelligence) to understand which parameters were taken into consideration in the existing time-series mechanism (these parameters remain confidential because of a nondisclosure agreement). Correlation analysis was performed to identify the best value ranges and indicative values concerning seasonality. The preliminary steps of the analysis were conducted with aggregate withdrawal levels across the Bank, whilst for the later steps randomly selected ATMs were employed.
Evaluation of the ANN. Consecutive trial executions were conducted to identify the best number of hidden neurons in the ANN and the training method. The efficiency of the ANN forecasting was evaluated based on the Root Mean Square Percentage Error and the Mean Absolute Percentage Error. In aggregating the results, Microsoft Excel and MatLab were used and the native functions min, max, average and median were utilised.
4.2. Implementation
Several steps were undertaken towards the final objective: the identification of an ANN that could accurately predict the demand for cash at each ATM.
4.2.1. Variety Resolution
It was identified in the early steps of the process that there
were some challenges posed by Variety. The process followed
and the solutions implemented will be described.
Before analysis, the data had to be imported into the tool of analysis and visualisation. In the first attempt to visualise the data with the use of MatLab it was identified that the cash demand levels for one specific year were always missing, no matter the requested interval, indicating that an error must have occurred in the import process. The primary data files were investigated with the use of a text editor. The visual investigation of the data showed that one of the primary files provided had a different date format from all the rest. The developers responsible for the extraction process were contacted and verified the finding: it occurred because of the export date/time format used for the specific year's data mart. The mart's date format had been altered at some point in time because data were exported based on a request that specifically required the American format (mm/dd/yyyy) instead of the one normally utilised (dd/mm/yyyy). An ETL (Extract, Transform, Load) process was employed to transform the dates to be coherent with the rest.
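A minimal sketch of that transformation, with hypothetical file and column names standing in for the Bank's actual extract:

```matlab
% Sketch: normalise the extract that used the American date format (mm/dd/yyyy)
% to the format used by the rest of the files (dd/mm/yyyy).
% File and column names are hypothetical.
T = readtable('withdrawals_extract.csv');
d = datetime(T.WithdrawalDate, 'InputFormat', 'MM/dd/yyyy');   % parse American format
T.WithdrawalDate = cellstr(datestr(d, 'dd/mm/yyyy'));          % re-emit common format
writetable(T, 'withdrawals_extract_normalised.csv');
```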
The first step in the analysis was to reproduce the BI time
series graph based on the detailed data supplied. This proved
to be impossible, see Figure 2, since the data acquired from the
Data Warehouse and the Operational systems produced
deviations.
The business users of the respective datasets were interviewed in an attempt to describe what the value "withdrawals" meant for each set. Not surprisingly, each department had a different definition, and of course a different value in the respective dataset. Some of the definitions are presented for the sake of argument:
The cash transactions of OnUs cards, meaning transactions of customers with cards issued by the Bank.
The cash transactions of Not OnUs cards, meaning transactions of customers with cards issued by other banks.
Customer segments were employed to exclude certain customers, for instance corporate payroll clients.
Once the respective definitions were matched with the datasets, a calculation was devised in the staging area in order to aggregate the actual outflow of cash from the ATM. This "new" definition of cash demand, unless properly documented and "quoted" in the findings, could introduce complexity, ambiguity or even erroneous conclusions.
If context and an accurate definition of the calculated dataset are not provided, then any attempt to identify ATM cash levels could be misunderstood. Variety would become a challenge again if context, intent of use [11], confidence levels and aggregations [12] are not adequately documented. To that end it was agreed and recorded, both in the results presented and in the respective source code as comments, that although this figure is referred to as "cash demand" it is not the demand but merely the total recorded outflows. Demand and total outflows would match in an ideal world, but in the real world the outflows are lower than the demand. The main reasons are hardware failures of the ATM, an ATM running out of cash, network communication failures and many other cases where the customer requests money and the ATM is unable to deliver (e.g. it cannot deliver the requested denomination, the customer has exceeded the daily withdrawal limit, the account has a lower balance than the requested amount, etc.). Furthermore, it is quite difficult to quantify this deviation, since the customer might or might not use another ATM to serve their needs. Eventually, from the perspective of the data scientist, all data were in place and coherent.
The next step was to identify the parameters to be utilised in the definition of the ANN. The BI experts were interviewed in order to understand the current process utilising the time-series analysis. Additionally, research in the literature was conducted to identify common parameters in such systems. Finally, several parameters were identified and documented (the new parameter set remains confidential because of a nondisclosure agreement).
A simple ANN was devised with the use of MatLab, and aggregated outflows were processed and visualised. The result was discouraging, with deviations as high as 300%.
Figure 2. Data Warehouse vs Operational Datasets
At that point it was decided that further analysis could not continue at the aggregate bank level, because differentiations amongst ATMs could not be introduced into the model and the levels of cash demand were distorted. As statisticians say, "if you draw a large enough sample, the way the sample mean varies around the population mean can be described by a normal distribution" [13]. It would be of little use to have an aggregated projection and adapt it to individual ATMs.
Thus the next step was to randomly select a set of ATMs to analyse further on a one-by-one basis. Care was taken to select ATMs that were "atomic" throughout the referenced time interval; atomic refers to ATMs that were not affected by the relocation, installation or decommissioning of another ATM within a 10 km radius. Certain parameters that addressed seasonality were affected by the location of the ATM. It might be the case that Monday and Friday show no seasonality effect, while there is a seasonality effect between January and August. It is one thing to have an ATM in the centre of Athens and another to have it on a cosmopolitan Greek island. Geolocation could be the key to identifying and applying different seasonality schemes. The field "location" was utilised to identify the region and subsequently the respective seasonality pattern for that region. At first glance the field contained the branch code, and once this was cross-referenced with the Branches dataset the latitude/longitude was identified. A reference set was created in the staging area where seasonality schemes were defined per geolocation polygon (geospatial queries can utilise arrays of geolocations (lat, long) to form a polygon and determine whether a point is within it).
A review of the selected ATMs showed that a couple of them were missing from the result dataset, whilst a couple of the remaining ones did not have any seasonality information. It looked like a bug in the process of using coordinates to determine whether a point was enclosed in a polygon. The debugging process showed that the respective location was empty, so it appeared that the branch code was not found in the respective Branches set. Once the branch code was printed out through a debug process, two problem areas were revealed:
a. The branch code appeared correctly, but there was no corresponding information for that code in the "Branches" dataset. Curiously, the branch code had been reported as erroneous in the initial questioning. Further investigation with the business users showed that these ATMs had been decommissioned.
b. The field contained an address, not a reference code. The developers were questioned about this bizarre phenomenon and concluded that it had to do with the data entry by the respective business unit. Once the responsible business users were interviewed, they explained that the phenomenon depended on the type of the ATM: if the ATM is on premises, the branch code is used; if it is an off-site ATM, where there is no branch to refer to, the actual address is used.
The solution to this Variety challenge was quite simple. Historical data were acquired in order to have a complete set of ATMs and branches. A filter was applied in the staging area and, in case of non-numeric content in the branch code, an automated query to a georeference engine was performed.
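A sketch of that staging filter; the geocoding endpoint and the table/field names are hypothetical stand-ins for the georeference engine and datasets that were actually used:

```matlab
% Sketch: resolve ATM coordinates from the branch code when it is numeric,
% otherwise treat the "location" field as an address and geocode it.
% Table names, field names and the geocoding URL are hypothetical.
for i = 1:height(atms)
    loc  = strtrim(atms.Location{i});
    code = str2double(loc);
    if ~isnan(code)
        % Numeric content: branch code, look it up in the (historical) Branches set
        idx = find(branches.BranchCode == code, 1);
        atms.Lat(i)  = branches.Latitude(idx);
        atms.Long(i) = branches.Longitude(idx);
    else
        % Non-numeric content: off-site ATM address, query the georeference engine
        resp = webread('https://geo.example.com/geocode', 'address', loc);
        atms.Lat(i)  = resp.lat;
        atms.Long(i) = resp.lon;
    end
end
```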
The actual processing of the 1000 ATMs through the ANN revealed two more Variety manifestations in relation to the transactional data. a) In certain ATMs one or two dates were missing and the record count per ATM (group by) was not equal across all ATMs. The respective investigation showed that dates were missing from the original datasets with no apparent pattern; the missing dates, irrespective of the ATM, were randomly distributed over the year. After discussing the effect with the ATM support team and investigating the missing records (identified by ATM and date), it was concluded that due to technical problems (network failure, ATM malfunction, etc.) there was no information record about the cash level. To overcome this issue, since zero or missing records would distort the trend, the value of the cash demand level for that date was replaced with the average of the respective month's levels. b) Certain ATMs in the testing process revealed zero demand levels for a whole month, thus making the average zero as well. In this case the safeguard used in (a) was bypassed and once again there was a problem in the execution of the ANN, specifically in the evaluation of performance through the testing process. The investigation showed that this effect was attributed to ATMs that were not operational during the respective month and had nothing to do with technical failures. It was pointed out by the business unit responsible for the deployment of the ATMs that certain ATMs were operational only during a couple of months within a year: these ATMs exhibited intense seasonality, which led to the business decision, in order to minimise costs, to deactivate them for most months of the year. To resolve the issue, a filter was applied to the random selector of the ATMs that excluded these "special case" ATMs.
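A sketch of the two safeguards described above, assuming a daily demand matrix with NaN for missing records (variable names are hypothetical):

```matlab
% Sketch: (a) replace missing daily records with the month's average and
% (b) exclude ATMs exhibiting whole months of zero demand from the random selection.
% demand(i, a): outflow on day i for ATM a (NaN where no record exists); dates: datetime vector.
for a = 1:size(demand, 2)
    for m = 1:12
        rows     = month(dates) == m;
        monthVal = demand(rows, a);
        monthAvg = mean(monthVal, 'omitnan');
        monthVal(isnan(monthVal)) = monthAvg;   % (a) impute with monthly average
        demand(rows, a) = monthVal;
    end
end

monthlyTotals = zeros(12, size(demand, 2));
for m = 1:12
    monthlyTotals(m, :) = sum(demand(month(dates) == m, :), 1);
end
eligible = find(all(monthlyTotals > 0, 1));     % (b) candidates for random selection
```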
4.2.2. The Socioeconomic Effect
Once again, simple executions of the ANN with the use of MatLab identified severe deviations between calculated and actual values. In an attempt to calibrate the parameters, the correlation coefficient (r), see Equation 1 [30], was calculated for different sets of values for each variable and the best combination was utilised. A "brute force" methodology was used to identify the best set: since there is no mathematical formula to calculate the set giving the best r, all possible set values were sequentially iterated. In each iteration, the calculated r was checked against the previous maximum, thus identifying the optimal set.
$$ r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\,n\sum x^{2}-(\sum x)^{2}\,}\;\sqrt{\,n\sum y^{2}-(\sum y)^{2}\,}} $$
Equation 1. Correlation Coefficient, r.
A program was developed that utilised nested loops (recursion was also tried but abandoned due to "out of memory" errors) to iterate over all possible combinations of values and calculate, for each set, the correlation coefficient (r). The set with the best coefficient was used in the respective seasonality parameter of the ATM. This piece of code was made somewhat intelligent: it displayed the evolution of the "attack" in a graph (Figure 3), and the last set and the best set at any time were logged to permanent storage. This was done because the iteration lasted several weeks and in some cases data were lost, since the process had to restart from the beginning due to technical problems (e.g. memory depletion, display driver errors) or power failures.
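A minimal sketch of the brute-force search with checkpointing; the candidate grids and the seasonalitySeries helper are illustrative only, since the real parameter sets remain confidential:

```matlab
% Sketch: exhaustive search for the value set that maximises the correlation
% coefficient r against the observed withdrawals, with a checkpoint after
% every iteration so a crash does not restart the multi-week run from scratch.
candA = 0:0.1:1;  candB = 0:0.1:1;  candC = 0:0.1:1;   % illustrative grids
bestR = -Inf;  bestSet = [NaN NaN NaN];

for a = candA
    for b = candB
        for c = candC
            seasonal = seasonalitySeries(a, b, c);     % hypothetical helper
            R = corrcoef(seasonal, withdrawals);
            if R(1, 2) > bestR
                bestR = R(1, 2);  bestSet = [a b c];
            end
            lastSet = [a b c];
            save('bruteforce_checkpoint.mat', 'lastSet', 'bestR', 'bestSet');
        end
    end
end
```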
Actual outflows and those calculated by the ANN were plotted in a chart and, although there seemed to be general coherence, "steps" and "spikes" were observed (see Figure 4). Further investigation of the ANN was performed by employing different learning algorithms and varying the number of hidden neurons, but there was no substantial improvement.
The dates presenting the "steps" and "spikes" were listed and investigated. It was identified that the "steps" occurred on the dates on which system merges with other banks were executed, and the "spikes" could be related to specific political events. Examples of positive (higher cash demand) "spikes" would be a) the end of the loan agreement between Greece and the EU on 28/02/2015 and b) the failure to reach an agreement in ECOFIN on 18/06/2015. Examples of negative (lower cash demand) "spikes" would be a) the personal involvement of P.M. Tsipras in the negotiations on 24/06/2015 and b) polls suggesting that SYRIZA would "win" the referendum, on 02/07/2015. Variety, in the form of context, revealed its challenge once again. It was suggested that the dates be further researched and attributed weights in order to become input parameters that the ANN could utilise.
To identify the socioeconomic events that contributed to the change in cash demand levels, the staging area was populated with datasets from Greek newspaper headlines and news feeds from Greek and international news agencies. The dates of the events were cross-referenced with the dates of the "spikes" and relative weights, depending on the content of the feeds, were attributed.
The results suggested that the approach was in the "right" direction, since "steps" and "spikes" were accurately calculated and depicted in the resulting values.
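A sketch of how the event weights can be joined to the daily input matrix; the event dates below come from the examples above, while the weight values and feature names are purely illustrative:

```matlab
% Sketch: attach a socioeconomic weight to each date of the training interval.
% Weights and feature names are illustrative; the real weighting scheme is confidential.
events = table(datetime({'28/02/2015';'18/06/2015';'24/06/2015';'02/07/2015'}, ...
                        'InputFormat', 'dd/MM/yyyy'), ...
               [+0.8; +0.6; -0.5; -0.7], ...
               'VariableNames', {'Date', 'Weight'});

socioWeight = zeros(numel(dates), 1);
[found, idx] = ismember(dates, events.Date);
socioWeight(found) = events.Weight(idx(found));

% The weight becomes one more column of the ANN input matrix
inputs = [calendarFeatures, seasonalityFeatures, socioWeight];   % hypothetical features
```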
4.2.3. Model Development
The ANN model was a three layer Fitting Neural Network.
Fitting networks (see Figure 5 [31]) are feedforward neural
networks used to fit an input - output relationship [14].
The following section presents the parameters of the ANN. In many cases several options were tested in order to attain the best result. The software used in defining, training and testing the ANN was MatLab R2015b, and the following parameters were customised:
Training Epochs:
Definition: An epoch can be defined as one complete network
weights’ update (see Figure 6); for each epoch the learning
algorithm constructs a new model with different weights [15].
Values: The maximum number of epochs for the training process was set to 1500, 3000, 5000 and 7000. This was done to accommodate a large number of neurons in the hidden layer; in most cases the respective threshold (max epochs) was never reached. To avoid overfitting, the early stopping mechanism of MatLab was utilised, based on the Validation set (see Data Sampling Division), with a failsafe of consecutive failed epochs equal to half the maximum training epochs.
Figure 3. Brute force attack
Figure 4. Dataset "Steps" & "Spikes"
Figure 5. Multi-Layer Network (3 Layers)
Figure 6. Back-propagation ANN epoch [15]
Performance Function:
Definition: The performance function is the way to measure
the effectiveness of the ANN; that is how well it is
accomplishing its tasks [16].
Values: The sse (Sum Squared Error) function was used, $sse = \sum_{i=1}^{n}(Y_i - F_i)^2$, where $Y_i$ is the actual value, $F_i$ the forecasted value and $n$ the number of observations [17]. This performance function was selected over the mean squared error, $mse = \frac{1}{n}\sum_{i=1}^{n}(Y_i - F_i)^2$ [18], since the observation set used is always of the same size: $n$ is constant and is not needed to normalise the results.
Input / Output Process Function:
Definition: All inputs and outputs must be scaled to fall within a specific range, 0 to 1 or -1 to 1, in order to comply with the input requirements of the ANN. Input and output matrices are normalised with the use of a function.
Values: The input process function used was mapstd, which processes the input data matrices by mapping each row's mean to 0 and deviation to 1. The output process function used was mapminmax, which processes the output data matrices by mapping minimum values to -1 and maximum values to 1. These functions were used in order to scale the inputs and generate outputs without imposing hard upper and lower limits that distorted the predicted cash demand levels.
Transfer Function:
Definition: The transfer function is the function utilised in
calculating the output of a layer from its inputs.
Values: The transfer function employed was tansig, the hyperbolic tangent sigmoid (see Figure 7). Once again this function was selected so as not to impose upper and lower limits on the outputs, which had distorted the predicted cash demand levels.
Hidden Layer Size:
Definition: At their inception ANNs did not utilise hidden layer(s), and it was not realised until the 1980s that the incorporation of such layers could vastly enhance ANN capabilities [19]. The perceptrons in this layer allow the network to learn non-linear functions [20].
Values: If too many hidden units are assigned, the ANN will simply memorise the input patterns, whilst if too few are assigned the ANN might not represent all generalisations [20]. The range of 7-32 hidden neurons was utilised in executing the ANN. This range was devised based on the following "rule of thumb" [21] (a search sketch is given below):
An initial optimal size is estimated from a formula-based heuristic.
The number of hidden neurons is then increased, one at a time, until the generalisation error begins to increase, which is an indication of overfitting.
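A sketch of that incremental search over the 7-32 range, using the validation error reported by MatLab's training record as the generalisation error (X and T are the prepared input and target matrices):

```matlab
% Sketch: grow the hidden layer one neuron at a time and stop once the
% validation (generalisation) error starts to increase, indicating overfitting.
bestErr = Inf;  bestH = 7;
for h = 7:32
    net = fitnet(h, 'trainlm');
    net.divideParam.trainRatio = 0.70;
    net.divideParam.valRatio   = 0.15;
    net.divideParam.testRatio  = 0.15;
    [net, tr] = train(net, X, T);
    valErr = tr.best_vperf;                 % validation error at the early-stopping point
    if valErr < bestErr
        bestErr = valErr;  bestH = h;
    else
        break;                              % generalisation error increased
    end
end
```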
Training / Learning Function:
Definition: The learning algorithms are characterised by the use of specific targets compared against the predicted values and the adaptation of the respective weights in accordance with this comparison [22].
Values: In the 1990s it was identified that the back-propagation training algorithm leads to a better neural network model [23]. The following three backpropagation algorithms were tested:
trainlm, Levenberg-Marquardt [14].
trainbfg, BFGS quasi-Newton [14].
trainbr, Bayesian regularization.
Data Sampling Division:
Definition: In training an ANN the data should be divided into three sets: the Training set (Tr Set), the Test set (Te Set) and the Validation set (V Set).
Values: The input dataset was a subset of the original dataset in the staging area. A percentage of the input set was attributed to each set, as shown in Table 1; a consolidated configuration sketch follows below.
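A consolidated MatLab sketch of one of the configurations described above (3000 epochs, trainbr, 28 hidden neurons, 100/0/0 division); the input matrix X and target vector T are assumed to have been prepared in the staging environment:

```matlab
% Sketch: one configuration of the fitting network with the customised parameters.
net = fitnet(28, 'trainbr');                % hidden layer size / training function
net.trainParam.epochs   = 3000;             % maximum training epochs
net.trainParam.max_fail = 1500;             % early-stopping failsafe: half the max epochs
net.performFcn = 'sse';                     % sum squared error
net.inputs{1}.processFcns  = {'mapstd'};    % inputs: mean 0, deviation 1
net.outputs{2}.processFcns = {'mapminmax'}; % outputs: mapped to [-1, 1]
net.layers{1}.transferFcn  = 'tansig';      % hyperbolic tangent sigmoid
net.divideParam.trainRatio = 1.00;          % data sampling division 100/0/0
net.divideParam.valRatio   = 0.00;
net.divideParam.testRatio  = 0.00;

[net, tr] = train(net, X, T);               % train the network
forecast  = net(X);                         % predicted cash demand levels
```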
4.2.4. Model Evaluation
In evaluating the accuracy of the ANN the following two
scoring methods were used:
Root Mean Square Percentage Error (RMSPE). The use of
this algorithm is quite common and can be an excellent
metric for general purpose predictions [24].
Mean Absolute Percentage Error (MAPE), is in many
studies the indicator of accuracy since it does not depend
on the series’ magnitude and unit of measurement [25].
Both methods were selected because they are quite popular
[26] in evaluating predictions against actual data. MAPE is
associated with a linear loss function while RMSPE is
consistent with a quadratic form which makes it more sensitive
to extremely poor forecasts [27]. MAPE is used in order to
identify the minimum deviation between the executions of the
ANN with different parameters and the RMSPE is utilised as
complementary measure in identifying single extreme
deviations.
The mathematical formulas for the calculation of the scoring methods are as follows [28]:
Figure 7. Tan-Sigmoid Transfer Function [14]

Tr Set | Te Set | V Set
100%   | 0%     | 0%
90%    | 0%     | 10%
80%    | 0%     | 20%
70%    | 0%     | 30%
60%    | 0%     | 40%
90%    | 5%     | 5%
80%    | 10%    | 10%
70%    | 15%    | 15%
60%    | 20%    | 20%
Table 1. ANN Data Sampling Sets
For j = 1,…,V, where V is the number of items in the set of predictions, which is equal to the number of items in the set of actual data:
Let $A_j$ be the actual values of the cash demand levels, as recorded for the actual withdrawals made.
Let $S_j$ be the predicted values of the cash demand levels produced by the ANN.
Let $E_j$ be the difference between the two, $E_j = A_j - S_j$.

$$ \text{RMSPE} = \sqrt{\frac{1}{V}\sum_{j=1}^{V}\left(\frac{E_j}{A_j}\right)^{2}} \times 100\% $$
Equation 2. Root Mean Square Percentage Error

$$ \text{MAPE} = \frac{1}{V}\sum_{j=1}^{V}\left|\frac{E_j}{A_j}\right| \times 100\% $$
Equation 3. Mean Absolute Percentage Error
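These two scores translate directly into a few MatLab lines (a sketch; A and S are the vectors of actual and predicted values for one evaluation set):

```matlab
% Sketch: scoring one set of predictions. A = actual outflows, S = ANN predictions.
E     = A - S;                              % per-observation errors
mape  = mean(abs(E ./ A)) * 100;            % Mean Absolute Percentage Error
rmspe = sqrt(mean((E ./ A).^2)) * 100;      % Root Mean Square Percentage Error
```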
4.2.5. Model Results
The initial ANN implemented with MatLab had several input parameters, a maximum of 7000 epochs and a number of hidden neurons equal to 2/3 of the input parameters, as shown in Figure 8. The results produced after training the ANN showed substantial improvement when the socioeconomic parameters were assigned weights and contributed to the training (Figure 9 vs Figure 10). In the results that follow, one year's data from one ATM are presented in order to facilitate visualisation.
Because the same training parameters may produce different results, owing to the random initialisation of the ANN, a safeguard loop was devised to minimise erroneous executions by re-training whenever the sse was greater than 100% or the number of training epochs fewer than 20. To avoid an endless loop, the training executions are limited to 20 repetitions.
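A sketch of that safeguard loop; trainOnce is a hypothetical wrapper around the training call shown earlier, and the way the sse threshold is expressed is an assumption:

```matlab
% Sketch: re-train when a run looks erroneous (sse above the threshold or too
% few epochs), capped at 20 repetitions. trainOnce is a hypothetical wrapper
% around the training call; sseThreshold stands for the "sse > 100%" criterion.
for attempt = 1:20
    [net, tr] = trainOnce(X, T);            % returns the trained net and training record
    acceptable = (tr.best_perf <= sseThreshold) && (tr.num_epochs >= 20);
    if acceptable
        break;                              % keep this network
    end
end
```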
Without taking the socioeconomic effects into account in the model, MAPE ≈ 50.2849% and RMSPE ≈ 1661.0667% (consecutive executions with the same parameters and data might produce slightly different results), see Figure 9.
When the socioeconomic effects are integrated into the model, MAPE ≈ 164.2288% and RMSPE ≈ 167.6755%, see Figure 10.
Further analysis was performed with multiple training and test executions of the ANN to identify the best mix of ANN parameters, as defined in section "Model Development." The 2280 executions performed showed high variation in the result set, as shown in Table 2.
The result set was filtered to identify "runs" that had MAPE and RMSPE, in both the training and the testing cycles, lower than 40%. The 312 filtered results (88 for 1500 epochs, 87 for 3000 epochs, 73 for 5000 epochs and 64 for 7000 epochs) are represented in the following 3D graphs: 1500 epochs filtered for 0%-40% MAPE & RMSPE (Figure 11), 3000 epochs (Figure 12), 5000 epochs (Figure 13) and 7000 epochs (Figure 14).
Figure 8. Sample ANN
Figure 9. Actual Vs Training without Socioeconomic
Figure 10. Actual Vs Training with Socioeconomic
Epochs | Train MAPE (min-max) | Train RMSPE (min-max) | Test MAPE (min-max) | Test RMSPE (min-max)
1500 | 1.00-311.82 | 3.66-446.15 | 14.58-336.83 | 17.49-357.43
3000 | 1.00-345.66 | 3.66-522.84 | 13.00-336.83 | 15.76-357.43
5000 | 1.00-356.41 | 3.66-495.99 | 10.57-336.83 | 15.07-357.43
7000 | 1.00-383.62 | 3.66-474.98 | 12.38-336.83 | 16.42-357.43
Table 2. Result Set Variations
The "best" MAPE and RMSPE executions can be viewed in Table 3. For each number of epochs, the following sets are identified:
Mtrn, is the best training set based on MAPE ranking.
Rtrn, is the best training set based on RMSPE ranking.
Mtst, is the best test set based on MAPE ranking.
Rtst, is the best test set based on RMSPE ranking.
Mboth, is the best combination of training and test sets on
MAPE ranking.
Rboth, is the best combination of training and test sets on
RMSPE ranking.
PAll, is the best combination of training and test sets on
MAPE and RMSPE percentages.
Figure 11. 1500 Epochs filtered for 0%-40% MAPE & RMSPE
Figure 12. 3000 Epochs filtered for 0%-40% MAPE & RMSPE
Figure 13. 5000 Epochs filtered for 0%-40% MAPE & RMSPE
Figure 14. 7000 Epochs filtered for 0%-40% MAPE & RMSPE
Epochs | Set | Train Func. | Tr/Te/V | Hid. Neurons | Train MAPE | Train RMSPE | Test MAPE | Test RMSPE
1500 | Mtrn, Rtrn | TrainLM | 100/0/0 | 30 | 1.00 | 3.66 | 298.56 | 325.82
1500 | Rtst | TrainBR | 70/0/30 | 29 | 20.76 | 34.80 | 14.63 | 17.49
1500 | Mtst | TrainBR | 70/15/15 | 31 | 20.26 | 33.51 | 14.58 | 17.70
1500 | Mboth, Rboth, PAll | TrainLM | 90/5/5 | 24 | 13.89 | 20.68 | 18.99 | 22.18
3000 | Mtrn, Rtrn | TrainLM | 100/0/0 | 26 | 1.00 | 3.66 | 336.72 | 357.34
3000 | Mtst, Rtst | TrainLM | 90/5/5 | 13 | 24.86 | 46.48 | 12.99 | 15.76
3000 | Mboth, Rboth | TrainBR | 100/0/0 | 28 | 12.33 | 19.10 | 19.18 | 24.00
3000 | PAll | TrainBFG | 100/0/0 | 32 | 15.88 | 25.21 | 15.27 | 17.60
5000 | Mtrn, Rtrn | TrainLM | 100/0/0 | 22 | 1.00 | 3.66 | 72.53 | 75.85
5000 | Mtst, Rtst | TrainBFG | 70/15/15 | 22 | 25.28 | 44.65 | 10.57 | 15.07
5000 | Mboth | TrainLM | 90/0/10 | 9 | 17.78 | 30.37 | 14.72 | 20.93
5000 | Rboth | TrainBFG | 100/0/0 | 29 | 14.02 | 20.94 | 24.80 | 29.47
5000 | PAll | TrainBR | 70/15/15 | 18 | 17.76 | 30.67 | 13.40 | 19.35
7000 | Mtrn, Rtrn | TrainLM | 100/0/0 | 29 | 1.00 | 3.66 | 82.08 | 82.22
7000 | Mtst | TrainBFG | 60/20/20 | 7 | 32.95 | 53.52 | 12.38 | 19.05
7000 | Rtst | TrainBFG | 60/20/20 | 19 | 31.32 | 46.91 | 13.33 | 16.42
7000 | Mboth | TrainLM | 100/0/0 | 9 | 9.46 | 14.70 | 22.28 | 31.36
7000 | Rboth | TrainLM | 90/5/5 | 28 | 13.52 | 20.65 | 23.32 | 26.86
Table 3. "Best Results"
Based on the "Best Results" for the identification of the mix of parameters to be used in the ANN, the data of 1000 ATMs were processed using the respective 18 parameter sets. Seven months of historical data were fed to the training process and testing was performed on the eighth month. The mean, median and minimum error rates for each execution set are shown in Table 4 and Figure 15.
The most prominent set of ANN parameters for projecting withdrawals, shown in Table 5, would yield the error levels shown in Table 6.
5. CONCLUSION
5.1. Variety
The manifested Variety challenges, as identified, can be
classified as not very important since their resolution was
relatively easy. Nonetheless the time factor, which is of
essence in a POC
17
, was greatly burdened. A review of the
project’s execution timeline identified that almost half of the
time allocated was consumed in normalizing and “massaging”
the data lake. In case an EDD
18
existed, the terminology,
business concepts and respective calculations would have been
documented in it. In this way the EDD would have minimised
the effort of interpreting the data sets.
If metadata complemented the datasets, the issues imposed by the variations in representation (format and structure) would have been easily identified and corrected in minimal time. The metadata could have been generated in either of the following processes:
Upon generation of the datasets through the ETL process.
Upon ingestion of the datasets into the staging repository, where a mechanism would be responsible for automatically identifying the structure and format of each dataset.
Ep. | Tr/Te/V | Train Func | Hid. Neur. | Train MAPE (mean / median / min) | Train RMSPE (mean / median / min) | Test MAPE (mean / median / min) | Test RMSPE (mean / median / min)
1500 | 100/0/0 | trainLM | 30 | 43.74 / 2.08 / 0.60 | 131.93 / 9.33 / 2.18 | 281.57 / 158.93 / 65.77 | 478.26 / 216.99 / 70.44
1500 | 70/0/30 | trainBR | 29 | 62.05 / 39.13 / 14.49 | 287.74 / 81.94 / 22.31 | 87.64 / 49.20 / 17.93 | 175.30 / 63.99 / 22.67
1500 | 70/15/15 | trainBR | 31 | 62.50 / 39.33 / 14.21 | 292.62 / 80.81 / 23.99 | 95.14 / 48.60 / 16.08 | 216.25 / 62.95 / 20.03
1500 | 90/5/5 | trainLM | 24 | 111.59 / 44.27 / 5.39 | 486.08 / 96.60 / 14.58 | 173.93 / 96.25 / 15.28 | 308.87 / 114.48 / 19.26
3000 | 100/0/0 | trainLM | 26 | 44.18 / 2.11 / 0.48 | 121.89 / 9.25 / 1.72 | 314.66 / 169.83 / 45.14 | 551.51 / 228.32 / 54.44
3000 | 90/5/5 | trainLM | 13 | 86.58 / 43.73 / 10.45 | 189.00 / 92.17 / 18.82 | 187.45 / 90.24 / 18.58 | 400.90 / 99.69 / 23.93
3000 | 100/0/0 | trainBR | 28 | 38.44 / 23.88 / 10.58 | 189.00 / 49.50 / 16.41 | 110.91 / 60.32 / 18.17 | 257.07 / 81.59 / 22.23
3000 | 100/0/0 | trainBFG | 32 | 42.53 / 26.50 / 14.26 | 189.0 / 54.33 / 21.55 | 165.58 / 87.11 / 18.42 | 294.30 / 116.73 / 20.78
5000 | 100/0/0 | trainLM | 22 | 56.08 / 2.84 / 0.64 | 225.72 / 10.88 / 2.19 | 292.84 / 166.03 / 51.07 | 523.96 / 224.49 / 55.99
5000 | 70/15/15 | trainBFG | 22 | 69.25 / 41.93 / 15.99 | 303.39 / 81.79 / 26.27 | 133.44 / 68.66 / 15.95 | 275.31 / 86.49 / 21.33
5000 | 90/0/10 | trainBR | 9 | 63.74 / 38.76 / 14.28 | 304.59 / 80.71 / 21.66 | 74.71 / 42.51 / 16.26 | 156.23 / 55.26 / 20.48
5000 | 100/0/0 | trainBFG | 29 | 40.51 / 24.28 / 12.79 | 176.32 / 49.36 / 20.62 | 170.93 / 95.01 / 23.49 | 297.07 / 127.35 / 29.94
5000 | 70/15/15 | trainBR | 18 | 61.37 / 39.35 / 17.48 | 287.34 / 81.77 / 25.13 | 89.53 / 48.91 / 16.78 | 185.71 / 61.87 / 20.31
7000 | 100/0/0 | trainLM | 29 | 36.77 / 1.98 / 0.48 | 121.89 / 8.80 / 1.72 | 323.47 / 176.11 / 46.67 | 618.44 / 239.61 / 54.81
7000 | 60/20/20 | trainBFG | 7 | 69.83 / 48.11 / 22.08 | 254.11 / 85.99 / 32.92 | 131.75 / 76.89 / 16.52 | 239.86 / 89.19 / 20.31
7000 | 60/20/20 | trainBFG | 19 | 70.26 / 43.17 / 17.17 | 298.35 / 86.81 / 23.41 | 121.66 / 72.22 / 14.38 | 213.71 / 91.11 / 19.32
7000 | 100/0/0 | trainLM | 9 | 55.19 / 24.54 / 10.81 | 211.79 / 53.28 / 16.43 | 209.52 / 99.53 / 20.73 | 412.84 / 131.63 / 26.89
7000 | 90/5/5 | trainLM | 28 | 100.04 / 43.54 / 7.61 | 422.24 / 97.00 / 14.61 | 190.15 / 95.88 / 17.22 | 362.36 / 114.54 / 24.64
Table 4. 1000 ATMs' ANN execution (values given as mean / median / min)
Figure 15. 1000 ATMs' ANN execution, stacked bars graph
Parameter | Value
Epochs | 3000
Training Data Set | 100%
Test Data Set | 0%
Validation Data Set | 0%
Training Function | trainBR
Hidden Layer Neurons | 28
Table 5. Best Execution ANN Parameters
 | Train MAPE | Train RMSPE | Test MAPE | Test RMSPE
mean | 38.44 | 189.00 | 110.91 | 257.07
median | 23.88 | 49.50 | 60.32 | 81.59
min | 10.58 | 16.41 | 18.17 | 22.23
Table 6. Best Execution Error Levels
The relations and correlations of the datasets, instead of being inferred, could have been documented in an ERD (Entity Relationship Diagram), thus facilitating the "actual analysis" of the data.
Variety was identified and tackled in the project, but at a high cost in man-days and effort. Methodologies like an EDD, an ERD and metadata, generated via the ETL or a specialised "acquiring process," could have minimised the time factor and enabled the Data Scientist to focus on the actual task at hand, which was the development and evaluation of the ANN.
5.2. Artificial Neural Network
In developing the ANN to predict ATM cash requirements it
was identified that by incorporating socioeconomic factors the
model would be substantially enhanced. The training process
was repeatedly executed aiming to identify the best set of
variables and ANN parameters.
In depicting historical data (training) the ANN achieved satisfactory error factors: in its best run (irrespective of testing) it produced a) MAPE in the area of 0.5% and b) RMSPE in the area of 2%.
In predicting the "following month's" cash requirements (testing) the ANN produced low-confidence results: in its best run (irrespective of training) it produced a) MAPE in the area of 15% and b) RMSPE in the area of 19%.
Once training and test phase results are combined in evaluating the ANN's performance on the average of 1000 ATMs, the prediction of the "following month's" cash requirements produced extremely low-confidence results: a) MAPE in the area of 111% and b) RMSPE in the area of 257%. The mean errors are very high because some ATMs exhibit results with extremely high error levels. In an attempt to smooth out the effect of such extreme values on the mean, the median can be utilised as a more coherent metric, which shows relatively better results: a) MAPE in the area of 60% and b) RMSPE in the area of 82%.
The error percentages are high compared to the prediction method currently employed, which yields a) MAPE in the area of 40% and b) RMSPE in the area of 70% (see 3.2. ATM Cash Management). Nonetheless, with further analysis conducted to identify specific ATM parameter variations in order to equalise outliers (for instance ATMs "00026", "00232", "00355", "00788", etc.; see Figure 16), substantial improvement could be achieved. These adjustments could be attributed to local socioeconomic events (e.g. local fairs, agricultural subsidies, etc.) or seasonality effects (e.g. summer vs winter tourism (monthly seasonality), local business salary day preferences (intra-month seasonality), etc.).
Further evidence towards this per-ATM adjustment is the fact that "spikes" in performance can be identified in all ANN executions for the specific ATM, irrespective of the parameters. For instance:
ATM "00355" has the lowest performance in all (18/18) ANN executions compared to all other ATMs. This ATM is located in a northern Greek town near the border with Turkey. The town's prime characteristic is its proximity to army camps, and it thus exhibits seasonality based on the transfer of recruits. This "army seasonality" was not initially identified nor encoded into the model's input variables.
ATM "00788" has low performance in most (15/18) of the ANN executions. This ATM is located on the island of Lesvos (Mytilene), near the harbour. The University of the Aegean is located in Lesvos with ≈6000 students [29], so the ATM exhibits "academic seasonality," which was not initially identified nor encoded into the model's input variables. In Lesvos there is also a strong army presence; in this case the encoding is more complex, since "academic seasonality" should be combined with tourism seasonality and "army seasonality."
Once the performance of the ANN for all ATMs is equalised, the expected performance should tend towards, and even surpass, the minima identified through the 1000 ATMs' tests. The minima identified in the "best set" are 18% and 22% for MAPE and RMSPE respectively, which would represent an improvement over the currently employed methodology of 22% and 48% in MAPE and RMSPE respectively.
Another challenge identified was the time needed for training the ANN. For each ATM's ANN model, the time required to calculate the model ranged from 30 seconds to 9 minutes. This interval, if multiplied by the thousands of ATMs under management by the institution, would yield execution times measured in days. The frequency of execution / recalculation is closely related to the "reaction time" required for the replenishments, which can be identified as 2-3 days based on the process presented. Since this is a CPU- and RAM-intensive process, the utilisation of distributed computing offered by a Big Data environment could point towards a possible solution.
Figure 16. 1000 ATMs' "Best set" Outliers
In conclusion, unless a specific benchmark is performed, which was beyond the scope of the POC, it is questionable whether the ANN for all ATMs could be fully recalculated (incremental / adaptive learning can be applied) daily or every other day so as to incorporate new actual data and socioeconomic events.
REFERENCES
[1] P. Kumar and E. Walia, “Cash Forecasting : An Application of
Artificial Neural Networks in Finance,” Int. J. Comput. Sci. Appl.,
vol. III, no. I, pp. 61–77, 2006.
[2] P. Baker, “Variety, not volume, biggest big data challenge in 2015 -
FierceBigData,” FireceBigData, 2015. [Online]. Available:
http://www.fiercebigdata.com/story/variety-not-volume-biggest-big-
data-challenge-2015/2015-01-14. [Accessed: 02-Sep-2015].
[3] D. Laney, “3D Data Management: Controlling Data Volume,
Velocity, and Variety,” Appl. Deliv. Strateg., vol. 949, p. 4, 2001.
[4] M. A. U. D. Khan, M. F. Uddin, and N. Gupta, “Seven V’s of Big
Data understanding Big Data to extract value,” Proc. 2014 Zo. 1
Conf. Am. Soc. Eng. Educ. - “Engineering Educ. Ind. Involv.
Interdiscip. Trends”, ASEE Zo. 1 2014, 2014.
[5] S. Manegold, M. L. Kersten, and P. Boncz, “Database architecture
evolution,” Proc. VLDB Endow., vol. 2, no. 2, pp. 1648–1653, 2009.
[6] G. Kouretas and P. Vlamis, “The Greek crisis: Causes and
implications,” Panoeconomicus, vol. 57, no. 4, pp. 391–404, 2010.
[7] M. Ashoka and D. Sandri, “Eurozone Crisis,” no. May 2010, 2012.
[8] S. Vassiliadis, D. Baboukardos, and P. Kotsovolos, “Is Basel III a
Panacea? Lessons from the Greek Sovereign Fiscal Crisis,” South
East Eur. J. Econ. Bus., vol. 7, no. 1, pp. 73–80, 2012.
[9] Bank Of Greece, “Monetary Policy,” no. November, 2011.
[10] I. Antoniadis, A. Alexandridis, and N. Sariannidis, “Mergers and
Acquisitions in the Greek Banking Sector: An Event Study of a
Proposal,” Procedia Econ. Financ., vol. 14, no. 14, pp. 13–22, 2014.
[11] G. Vemuganti, “Metadata Management in Big Data,” Big Data
Countering Tomorrow’s Challenges, vol. 11, no. 1, pp. 3–9, 2013.
[12] A. Elragal, “ERP and Big Data: The Inept Couple,” Procedia
Technol., vol. 16, pp. 242–249, 2014.
[13] G. Dallal, “The Behavior of the Sample Mean,” 2004. [Online].
Available: http://www.jerrydallal.com/lhsp/meandist.htm.
[Accessed: 02-Dec-2015].
[14] MathWorks, “MatLab Online Help.” 2015.
[15] J. G. Carney and P. Cunningham, “The Epoch Interpretation of
Learning,” p. 5, 1998.
[16] Philadelphia University, “Multi-Layer Feedforward Neural Networks
using matlab Part 1,” Philadelphia University, 2015. .
[17] C. Hamzaçebi, “Improving artificial neural networks’ performance in
seasonal time series forecasting,” Inf. Sci. (Ny)., vol. 178, no. 23, pp.
4550–4559, 2008.
[18] SAS Institute Inc., “Statistics of Fit,” SAS OnlineDoc®, 1999.
[Online]. Available:
http://www.okstate.edu/sas/v8/sashtml/ets/chap30/sect19.htm.
[Accessed: 07-Jan-2016].
[19] F. Wilczek, “Edge.org,” Edge.org, 2015. [Online]. Available:
https://edge.org/response-detail/10351. [Accessed: 04-Dec-2015].
[20] D. Klerfors, “Artificial Neural Networks,” Cornell University, 1998.
[Online]. Available:
http://www.cs.cornell.edu/courses/cs4700/2011fa/lectures/13_Ann.p
df.
[21] D. Y’barbo, “machine learning - multi-layer perceptron (MLP)
architecture: criteria for choosing number of hidden layers and size
of the hidden layer?,” Stack Overflow, 2012. [Online]. Available:
http://stackoverflow.com/questions/10565868/multi-layer-
perceptron-mlp-architecture-criteria-for-choosing-number-of-hidde.
[Accessed: 05-Jan-2016].
[22] F. Günther and S. Fritsch, “neuralnet: Training of neural networks,”
R J., vol. 2, no. 1, pp. 30–38, 2010.
[23] J. McCaffrey, “Understanding Neural Network Batch Training,”
Visual Studio Magazine, 2014. [Online]. Available:
https://visualstudiomagazine.com/articles/2014/08/01/batch-
training.aspx. [Accessed: 04-Dec-2015].
[24] Ihar, “Root Mean Squared Error,” Kaggle, 2015. [Online]. Available:
https://www.kaggle.com/wiki/RootMeanSquaredError. [Accessed:
03-Dec-2015].
[25] Agri4castWiki, “Methodology of forecast errors evaluation
methods,” MarsWiki, 2012. [Online]. Available:
http://marswiki.jrc.ec.europa.eu/agri4castwiki/index.php/Methodolog
y_of_forecast_errors_evaluation_methods. [Accessed: 03-Dec-
2015].
[26] R. J. Hyndman and G. Athanasopoulos, Forecasting: Principles and
Practice. 2014.
[27] H. Song, S. Witt, and G. Li, The Advanced Econometrics of Tourism
Demand. Routledge, 2009.
[28] T. Fomby, “Scoring measures for prediction problems,” … Econ.
South. Methodist Univ. Dallas, …, pp. 1–3, 2008.
[29] University of the Aegean, “University of the Aegean Statistics,”
www.aegean.gr, 2015. [Online]. Available:
http://www.aegean.gr/aegean/en/statistics.htm. [Accessed: 15-Jan-
2016].
[30] D. Roberts, “Statistics 2 - Correlation Coefficient and Coefficient of
Determination,” MathBits.com, 2013. [Online]. Available:
http://mathbits.com/MathBits/TISection/Statistics2/correlation.htm.
[Accessed: 03-Dec-2015].
[31] R. J. Mooney, “CS 391L: Machine Learning - Neural Networks,” lecture notes, pp. 1–6, 2006.