Programming Neural Networks with Encog3 in Java

Abstract

Encog is an advanced Machine Learning Framework for Java, C# and Silverlight. This book focuses on using the neural network capabilities of Encog with the Java programming language. This book begins with an introduction to the kinds of tasks neural networks are suited towards. The reader is shown how to use classification, regression and clustering to gain new insights into data. Neural network architectures such as feedforward, self organizing maps, NEAT, and recurrent neural networks are introduced. This book also covers advanced neural network training techniques such as back propagation, quick propagation, resilient propagation, Levenberg Marquardt, genetic training and simulated annealing. Real world problems such as financial prediction, classification and image processing are introduced.
Programming Neural Networks
with Encog3 in Java
Jeff Heaton
Heaton Research, Inc.
St. Louis, MO, USA
Publisher: Heaton Research, Inc
Programming Neural Networks with Encog 3 in Java
First Printing
October, 2011
Author: Jeff Heaton
Editor: WordsRU.com
Cover Art: Carrie Spear
ISBNs for all Editions:
978-1-60439-021-6, Softcover
978-1-60439-022-3, PDF
978-1-60439-023-0, LIT
978-1-60439-024-7, Nook
978-1-60439-025-4, Kindle
Copyright ©2011 by Heaton Research Inc., 1734 Clarkson Rd. #107, Chester-
field, MO 63017-4976. World rights reserved. The author(s) created reusable
code in this publication expressly for reuse by readers. Heaton Research, Inc.
grants readers permission to reuse the code found in this publication or down-
loaded from our website so long as the author(s) are attributed in any application
containing the reusable code and the source code itself is never redistributed,
posted online by electronic transmission, sold or commercially exploited as a
stand-alone product. Aside from this specific exception concerning reusable
code, no part of this publication may be stored in a retrieval system, trans-
mitted, or reproduced in any way, including, but not limited to, photocopy,
photograph, magnetic, or other record, without prior agreement and written
permission of the publisher.
Heaton Research, Encog, the Encog Logo and the Heaton Research logo
are all trademarks of Heaton Research, Inc., in the United States and/or other
countries.
TRADEMARKS: Heaton Research has attempted throughout this book
to distinguish proprietary trademarks from descriptive terms by following the
capitalization style used by the manufacturer.
The author and publisher have made their best efforts to prepare this
book, so the content is based upon the final release of software whenever
possible. Portions of the manuscript may be based upon pre-release versions
supplied by software manufacturer(s). The author and the publisher make no
representation or warranties of any kind with regard to the completeness or
accuracy of the contents herein and accept no liability of any kind including
but not limited to performance, merchantability, fitness for any particular
purpose, or any losses or damages of any kind caused or alleged to be caused
directly or indirectly from this book.
SOFTWARE LICENSE AGREEMENT: TERMS AND
CONDITIONS
The media and/or any online materials accompanying this book that are
available now or in the future contain programs and/or text files (the “Soft-
ware”) to be used in connection with the book. Heaton Research, Inc. hereby
grants to you a license to use and distribute software programs that make use
of the compiled binary form of this book’s source code. You may not redis-
tribute the source code contained in this book, without the written permission
of Heaton Research, Inc. Your purchase, acceptance, or use of the Software
will constitute your acceptance of such terms.
The Software compilation is the property of Heaton Research, Inc. unless
otherwise indicated and is protected by copyright to Heaton Research, Inc.
or other copyright owner(s) as indicated in the media files (the “Owner(s)”).
You are hereby granted a license to use and distribute the Software for your
personal, noncommercial use only. You may not reproduce, sell, distribute,
publish, circulate, or commercially exploit the Software, or any portion thereof,
without the written consent of Heaton Research, Inc. and the specific copyright
owner(s) of any component software included on this media.
In the event that the Software or components include specific license re-
quirements or end-user agreements, statements of condition, disclaimers, lim-
itations or warranties (“End-User License”), those End-User Licenses super-
sede the terms and conditions herein as to that particular Software component.
Your purchase, acceptance, or use of the Software will constitute your accep-
tance of such End-User Licenses.
By purchase, use or acceptance of the Software you further agree to comply
with all export laws and regulations of the United States as such laws and
regulations may exist from time to time.
SOFTWARE SUPPORT
Components of the supplemental Software and any offers associated with
them may be supported by the specific Owner(s) of that material but they are
not supported by Heaton Research, Inc. Information regarding any available
support may be obtained from the Owner(s) using the information provided
in the appropriate README files or listed elsewhere on the media.
Should the manufacturer(s) or other Owner(s) cease to offer support or
decline to honor any offer, Heaton Research, Inc. bears no responsibility. This
notice concerning support for the Software is provided for your information
only. Heaton Research, Inc. is not the agent or principal of the Owner(s), and
Heaton Research, Inc. is in no way responsible for providing any support for
the Software, nor is it liable or responsible for any support provided, or not
provided, by the Owner(s).
WARRANTY
Heaton Research, Inc. warrants the enclosed media to be free of physical
defects for a period of ninety (90) days after purchase. The Software is not
available from Heaton Research, Inc. in any other form or media than that
enclosed herein or posted to www.heatonresearch.com. If you discover a defect
in the media during this warranty period, you may obtain a replacement of
identical format at no charge by sending the defective media, postage prepaid,
with proof of purchase to:
Heaton Research, Inc.
Customer Support Department
1734 Clarkson Rd #107
Chesterfield, MO 63017-4976
Web: www.heatonresearch.com
E-Mail: support@heatonresearch.com
DISCLAIMER
Heaton Research, Inc. makes no warranty or representation, either ex-
pressed or implied, with respect to the Software or its contents, quality, per-
formance, merchantability, or fitness for a particular purpose. In no event will
Heaton Research, Inc., its distributors, or dealers be liable to you or any other
party for direct, indirect, special, incidental, consequential, or other damages
arising out of the use of or inability to use the Software or its contents even
if advised of the possibility of such damage. In the event that the Software
includes an online update feature, Heaton Research, Inc. further disclaims
any obligation to provide this feature for any specific duration other than the
initial posting.
The exclusion of implied warranties is not permitted by some states. There-
fore, the above exclusion may not apply to you. This warranty provides you
with specific legal rights; there may be other rights that you may have that
vary from state to state. The pricing of the book with the Software by Heaton
Research, Inc. reflects the allocation of risk and limitations on liability con-
tained in this agreement of Terms and Conditions.
SHAREWARE DISTRIBUTION
This Software may use various programs and libraries that are distributed
as shareware. Copyright laws apply to both shareware and ordinary com-
mercial software, and the copyright Owner(s) retains all rights. If you try a
shareware program and continue using it, you are expected to register it. In-
dividual programs differ on details of trial periods, registration, and payment.
Please observe the requirements stated in appropriate files.
This book is dedicated to my wonderful
wife, Tracy, and our two cockatiels,
Cricket and Wynton.
Contents

Introduction
0.1 The History of Encog
0.2 Introduction to Neural Networks
0.2.1 Neural Network Structure
0.2.2 A Simple Example
0.3 When to use Neural Networks
0.3.1 Problems Not Suited to a Neural Network Solution
0.3.2 Problems Suited to a Neural Network
0.4 Structure of the Book

1 Regression, Classification & Clustering
1.1 Data Classification
1.2 Regression Analysis
1.3 Clustering
1.4 Structuring a Neural Network
1.4.1 Understanding the Input Layer
1.4.2 Understanding the Output Layer
1.4.3 Hidden Layers
1.5 Using a Neural Network
1.5.1 The XOR Operator and Neural Networks
1.5.2 Structuring a Neural Network for XOR
1.5.3 Training a Neural Network
1.5.4 Executing a Neural Network
1.6 Chapter Summary

2 Obtaining Data for Encog
2.1 Where to Get Data for Neural Networks
2.2 Normalizing Data
2.2.1 Normalizing Numeric Values
2.2.2 Normalizing Nominal Values
2.2.3 Understanding One-of-n Normalization
2.2.4 Understanding Equilateral Normalization
2.3 Programmatic Normalization
2.3.1 Normalizing Individual Numbers
2.3.2 Normalizing Memory Arrays
2.4 Normalizing CSV Files
2.4.1 Implementing Basic File Normalization
2.4.2 Saving the Normalization Script
2.4.3 Customizing File Normalization
2.5 Summary

3 The Encog Workbench
3.1 Structure of the Encog Workbench
3.1.1 Workbench CSV Files
3.1.2 Workbench EG Files
3.1.3 Workbench EGA Files
3.1.4 Workbench EGB Files
3.1.5 Workbench Image Files
3.1.6 Workbench Text Files
3.2 A Simple XOR Example
3.2.1 Creating a New Project
3.2.2 Generate Training Data
3.2.3 Create a Neural Network
3.2.4 Train the Neural Network
3.2.5 Evaluate the Neural Network
3.3 Using the Encog Analyst
3.4 Encog Analyst Reports
3.4.1 Range Report
3.4.2 Scatter Plot
3.5 Summary

4 Constructing Neural Networks in Java
4.1 Constructing a Neural Network
4.2 The Role of Activation Functions
4.3 Encog Activation Functions
4.3.1 ActivationBiPolar
4.3.2 ActivationCompetitive
4.3.3 ActivationLinear
4.3.4 ActivationLOG
4.3.5 ActivationSigmoid
4.3.6 ActivationSoftMax
4.3.7 ActivationTANH
4.4 Encog Persistence
4.5 Using Encog EG Persistence
4.5.1 Using Encog EG Persistence
4.6 Using Java Serialization
4.7 Summary

5 Propagation Training
5.1 Understanding Propagation Training
5.1.1 Understanding Backpropagation
5.1.2 Understanding the Manhattan Update Rule
5.1.3 Understanding Quick Propagation Training
5.1.4 Understanding Resilient Propagation Training
5.1.5 Understanding SCG Training
5.1.6 Understanding LMA Training
5.2 Encog Method & Training Factories
5.2.1 Creating Neural Networks with Factories
5.2.2 Creating Training Methods with Factories
5.3 How Multithreaded Training Works
5.4 Using Multithreaded Training
5.5 Summary

6 More Supervised Training
6.1 Running the Lunar Lander Example
6.2 Examining the Lunar Lander Simulator
6.2.1 Simulating the Lander
6.2.2 Calculating the Score
6.2.3 Flying the Spacecraft
6.3 Training the Neural Pilot
6.3.1 What is a Genetic Algorithm
6.3.2 Using a Genetic Algorithm
6.3.3 What is Simulated Annealing
6.3.4 Using Simulated Annealing
6.4 Using the Training Set Score Class
6.5 Summary

7 Other Neural Network Types
7.1 The Elman Neural Network
7.1.1 Creating an Elman Neural Network
7.1.2 Training an Elman Neural Network
7.2 The Jordan Neural Network
7.3 The ART1 Neural Network
7.3.1 Using the ART1 Neural Network
7.4 The NEAT Neural Network
7.4.1 Creating an Encog NEAT Population
7.4.2 Training an Encog NEAT Neural Network
7.5 Summary

8 Using Temporal Data
8.1 How a Predictive Neural Network Works
8.2 Using the Encog Temporal Dataset
8.3 Application to Sunspots
8.4 Using the Encog Market Dataset
8.5 Application to the Stock Market
8.5.1 Generating Training Data
8.5.2 Training the Neural Network
8.5.3 Incremental Pruning
8.5.4 Evaluating the Neural Network
8.6 Summary

9 Using Image Data
9.1 Finding the Bounds
9.2 Downsampling an Image
9.2.1 What to Do With the Output Neurons
9.3 Using the Encog Image Dataset
9.4 Image Recognition Example
9.4.1 Creating the Training Set
9.4.2 Inputting an Image
9.4.3 Creating the Network
9.4.4 Training the Network
9.4.5 Recognizing Images
9.5 Summary

10 Using a Self-Organizing Map
10.1 The Structure and Training of a SOM
10.1.1 Structuring a SOM
10.1.2 Training a SOM
10.1.3 Understanding Neighborhood Functions
10.1.4 Forcing a Winner
10.1.5 Calculating Error
10.2 Implementing the Colors SOM in Encog
10.2.1 Displaying the Weight Matrix
10.2.2 Training the Color Matching SOM
10.3 Summary

A Installing and Using Encog
A.1 Installing Encog
A.2 Compiling the Encog Core
A.3 Compiling and Executing Encog Examples
A.3.1 Running an Example from the Command Line

Glossary
Introduction
Encog is a machine learning framework for Java and .NET. Initially, Encog was
created to support only neural networks. Later versions of Encog expanded
more into general machine learning. However, this book will focus primarily on
neural networks. Many of the techniques learned in this book can be applied
to other machine learning techniques. Subsequent books will focus on some of
these areas of Encog programming.
This book is published in conjunction with the Encog 3.0 release and should
stay very compatible with later versions of Encog 3. Future versions in the 3.x
series will attempt to add functionality with minimal disruption to existing
code.
0.1 The History of Encog
The first version of Encog, version 0.5, was released on July 10, 2008. Encog’s
original foundations include some code used in the first edition of “Introduction
to Neural Networks with Java,” published in 2005. Its second edition featured
a completely redesigned neural network engine, which became Encog version
0.5. Encog versions 1.0 through 2.0 greatly enhanced the neural network code
well beyond what can be covered in an introductory book. Encog version 3.0
added more formal support for machine learning methods beyond just neural
networks.
This book will provide comprehensive instruction on how to use neural
networks with Encog. For the intricacies of actually implementing neural
networks, reference “Introduction to Neural Networks with Java” and
“Introduction to Neural Networks with C#.” These books explore how to implement
basic neural networks and how to create the internals of a neural network.
These two books can be read in sequence, as new concepts are introduced
with very little repetition. Neither book is a prerequisite to the other.
This book will equip you to start with Encog if you have a basic understanding
of the Java programming language. In particular, you should be familiar with
the following:
Java Generics
Collections
Object Oriented Programming
Before we begin examining how to use Encog, let’s first identify the problems
Encog is adept at solving. Neural networks are a programming technique. They
are not a silver bullet solution for every programming problem, yet offer vi-
able solutions to certain programming problems. Of course, there are other
problems for which neural networks are a poor fit.
0.2 Introduction to Neural Networks
This book will define what a neural network is and how it is used. Most people, even
non-programmers, have heard of neural networks. There are many science
fiction overtones associated with neural networks. And, like many things, sci-
fi writers have created a vast, but somewhat inaccurate, public idea of what a
neural network is.
Most laypeople think of neural networks as a sort of “artificial brain” that
powers robots or carries on intelligent conversations with human beings. This
notion is a closer definition of Artificial Intelligence (AI) than neural networks.
AI seeks to create truly intelligent machines. I am not going to waste several
paragraphs explaining what true, human intelligence is, compared to the cur-
rent state of computers. Anyone who has spent any time with both human
beings and computers knows the difference. Current computers are not intel-
ligent.
Neural networks are one small part of AI. Neural networks, at least as they
currently exist, carry out very small, specific tasks. Computer-based neural
networks are not general purpose computation devices like the human brain. It
is possible that the perception of neural networks is skewed, as the brain itself
is a network of neurons, or a neural network. This brings up an important
distinction.
The human brain is more accurately described as a biological neural net-
work (BNN). This book is not about biological neural networks. This book
is about artificial neural networks (ANN). Most texts do not make the dis-
tinction between the two. Throughout this text, references to neural networks
imply artificial neural networks.
There are some basic similarities between biological neural networks and
artificial neural networks. Artificial neural networks are largely mathematical
constructs that were inspired by biological neural networks. An important
term that is often used to describe various artificial neural network algorithms
is “biological plausibility.” This term defines how close an artificial neural
network algorithm is to a biological neural network.
As stated earlier, neural networks are designed to accomplish one small
task. A full application likely uses neural networks to accomplish certain
parts of its objectives. The entire application will not be implemented as a
neural network. The application may be made of several neural networks,
each designed for a specific task.
Neural networks accomplish pattern recognition very well. When presented
with a pattern, a neural network returns a pattern back. At the highest level,
this is all that a typical neural network does. Some
network architectures will vary this, but the vast majority of neural networks
work this way. Figure 1 illustrates a neural network at this level.
Figure 1: A Typical Neural Network
As you can see, the neural network above is accepting a pattern and return-
ing a pattern. Neural networks operate completely synchronously. A neural
network will only output when presented with input. It is not like a human
brain, which does not operate exactly synchronously. The human brain re-
sponds to input, but it will produce output anytime it desires!
0.2.1 Neural Network Structure
Neural networks are made of layers of similar neurons. At minimum, most
neural networks consist of an input layer and output layer. The input pattern
is presented to the input layer. Then the output pattern is returned from
the output layer. What happens between the input and output layers is a
black box. At this point in the book, the neural network’s internal structure
is not yet a concern. There are many architectures that define interactions
between the input and output layer. Later in this book, these architectures
are examined.
The input and output patterns are both arrays of floating point numbers.
An example of these patterns follows.
Neural Network Input:  [0.245, 0.283, 0.0]
Neural Network Output: [0.782, 0.543]
The neural network above has three neurons in the input layer and two neurons
in the output layer. The number of neurons in the input and output layers
do not change. As a result, the number of elements in the input and output
patterns, for a particular neural network, can never change.
To make use of the neural network, problem input must be expressed as
an array of floating point numbers. Likewise, the problem’s solution must be
an array of floating point numbers. This is the essential and only true value
of a neural network. Neural networks take one array and transform it into
a second. Neural networks do not loop, call subroutines, or perform any of
the other tasks associated with traditional programming. Neural networks
recognize patterns.
A neural network is much like a hash table in traditional programming.
A hash table is used to map keys to values, somewhat like a dictionary. The
following could be thought of as a hash table:
“hear” -> “to perceive or apprehend by the ear”
“run” -> “to go faster than a walk”
“write” -> “to form (as characters or symbols) on a surface with an
instrument (as a pen)”
This is a mapping between words and the definition of each word, just as a
hash table in any programming language. It maps a string key to a string
value. The input is the key and the output is the value.
This is how most neural networks function. One neural network called a
Bidirectional Associative Memory (BAM) actually allows a user to also pass
in the value and receive the key.
Hash tables use keys and values. Think of the pattern sent to the neural
network’s input layer as the key to the hash table. Likewise, think of the
value returned from the hash table as the pattern returned from the neural
network’s output layer. The comparison between a hash table and a neural
network works well; however, the neural network is much more than a hash
table.
What would happen with the above hash table if a word was passed that
was not a map key? For example, pass in the key “wrote.” A hash table
would return null or indicate in some way that it could not find the specified
key. Neural networks do not return null, but rather find the closest match.
Not only do they find the closest match, neural networks modify the output
to estimate the missing value. So if “wrote” is passed to the neural network
above, the output would likely be “write.” There is not enough data for the
neural network to have modified the response, as there are only three samples.
So you would likely get the output from one of the other keys.
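In Java, the hash table above might be written with a standard java.util.HashMap. The following is a simple sketch, not part of Encog, that also shows the null behavior just described:

import java.util.HashMap;
import java.util.Map;

Map<String, String> dictionary = new HashMap<String, String>();
dictionary.put("hear", "to perceive or apprehend by the ear");
dictionary.put("run", "to go faster than a walk");
dictionary.put("write",
    "to form (as characters or symbols) on a surface with an instrument (as a pen)");

// A hash table returns null for an unknown key...
String definition = dictionary.get("wrote"); // null
// ...while a trained neural network returns the closest matching pattern.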
The above mapping brings up one very important point about neural net-
works. Recall that neural networks accept an array of floating point numbers
and return another array. How would strings be put into the neural network
as seen above? While there are ways to do this, it is much easier to deal with
numeric data than strings.
With a neural network problem, inputs must be arrays of floating point
numbers. This is one of the most difficult aspects of neural network pro-
gramming. How are problems translated into a fixed-length array of floating
point numbers? The best way is by demonstration. Examples are explored
throughout the remainder of this introduction.
0.2.2 A Simple Example
Most basic literature concerning neural networks provide examples with the
XOR operator. The XOR operator is essentially the “Hello World” of neural
network programming. While this book will describe scenarios much more
complex than XOR, the XOR operator is a great introduction.
To begin, view the XOR operator as though it were a hash table. XOR
operators work similarly to the AND and OR operators. For an AND to be
true, both sides must be true. For an OR to be true, either side must be
true. For an XOR to be true, both of the sides must be different from each
other. The truth table for an XOR is as follows.
False XOR False = False
True XOR False = True
False XOR True = True
True XOR True = False
To continue the hash table example, the above truth table would be repre-
sented as follows.
[0.0, 0.0] -> [0.0]
[1.0, 0.0] -> [1.0]
[0.0, 1.0] -> [1.0]
[1.0, 1.0] -> [0.0]
These mappings show the input and the ideal expected output for the neural
network.
0.3 When to use Neural Networks
With neural networks defined, it must be determined when or when not to use
them. Knowing when not to use something is just as important as knowing
how to use it. To understand these objectives, we will identify what sort of
problems Encog is adept at solving.
A significant goal of this book is to explain how to construct Encog neural
networks and when to use them. Neural network programmers must under-
stand which problems are well-suited for neural network solutions and which
are not. An effective neural network programmer also knows which neural
network structure, if any, is most applicable to a given problem. This sec-
tion begins by identifying the problems that are not conducive to a neural
network solution.
0.3.1 Problems Not Suited to a Neural Network Solution
Programs that are easily written as flowcharts are not ideal applications for
neural networks. If your program consists of well-defined steps, normal pro-
gramming techniques will suffice.
Another criterion to consider is whether program logic is likely to change.
One of the primary features of neural networks is the ability to learn. If the
algorithm used to solve your problem is an unchanging business rule, there is no
reason to use a neural network. In fact, a neural network might be detrimental
to your application if it attempts to find a better solution and begins to diverge
from the desired process. Unexpected results will likely occur.
Finally, neural networks are often not suitable for problems that require
a clearly traceable path to solution. A neural network can be very useful for
solving the problem for which it was trained, but cannot explain its reasoning.
The neural network knows something because it was trained to know it. How-
ever, a neural network cannot explain the series of steps followed to derive the
answer.
0.3.2 Problems Suited to a Neural Network
Although there are many problems for which neural networks are not well
suited, there are also many problems for which a neural network solution is
quite useful. In addition, neural networks can often solve problems with fewer
lines of code than traditional programming algorithms. It is important to
understand which problems call for a neural network approach.
Neural networks are particularly useful for solving problems that cannot
be expressed as a series of steps. This may include recognizing patterns, clas-
sification, series prediction and data mining.
Pattern recognition is perhaps the most common use for neural networks.
For this type of problem, the neural network is presented a pattern in the
form of an image, a sound or other data. The neural network then attempts
to determine if the input data matches a pattern that it has been trained to
recognize. The remainder of this textbook will examine many examples of how
to use neural networks to recognize patterns.
Classification is a process that is closely related to pattern recognition. A
neural network trained for classification is designed to classify input samples
into groups. These groups may be fuzzy and lack clearly-defined boundaries.
Alternatively, these groups may have quite rigid boundaries.
0.4 Structure of the Book
This book begins with Chapter 1, “Regression, Classification & Clustering.”
This chapter introduces the major tasks performed with neural networks.
These tasks are performed not just by neural networks, but by many
other machine learning methods as well.
One of the primary tasks for neural networks is to recognize and provide
insight into data. Chapter 2, “Obtaining Data & Normalization,” shows how
to process this data before using a neural network. This chapter will examine
some data that might be used with a neural network and how to normalize
and use this data with a neural network.
Encog includes a GUI neural network editor called the Encog Workbench.
Chapter 3, “Using the Encog Workbench,” details the best methods and uses
for this application. The Encog Workbench provides a GUI tool that can edit
the .EG data files used by the Encog Framework. The powerful Encog Analyst
can also be used to automate many tasks.
The next step is to construct and save neural networks. Chapter 4, “Con-
structing Neural Networks in Java,” shows how to create neural networks using
layers and activation functions. It will also illustrate how to save neural net-
works to either platform-independent .EG files or standard Java serialization.
Neural networks must be trained for effective utilization and there are sev-
eral ways to perform this training. Chapter 5, “Propagation Training,” shows
how to use the propagation methods built into Encog to train neural networks.
Encog supports backpropagation, resilient propagation, the Manhattan update
rule, Quick Propagation and SCG.
Chapter 6, “Other Supervised Training Methods,” shows other supervised
training algorithms supported by Encog. This chapter introduces simulated
annealing and genetic algorithms as training techniques for Encog networks.
Chapter 6 also details how to create hybrid training algorithms.
Feedforward neural networks are not the only type supported by Encog.
Chapter 7, “Other Neural Network Types,” provides a brief introduction to
several other neural network types that Encog supports well. Chapter 7 de-
scribes how to set up NEAT, ART1 and Elman/Jordan neural networks.
Neural networks are commonly used to predict future data changes. One
common use for this is to predict stock market trends. Chapter 8, “Using
Temporal Data,” will show how to use Encog to predict trends.
Images are frequently used as an input for neural networks. Encog contains
classes that make it easy to use image data to feed and train neural networks.
Chapter 9, “Using Image Data,” shows how to use image data with Encog.
Finally, Chapter 10, “Using Self Organizing Maps,” expands beyond su-
pervised training to explain how to use unsupervised training with Encog. A
Self Organizing Map (SOM) can be used to cluster data.
As you read through this book you will undoubtedly have questions about
the Encog Framework. Your best resources are the Encog forums at Heaton
Research, found at the following URL.
http://www.heatonresearch.com/forum
Additionally, the Encog Wiki is located at the following URL.
http://www.heatonresearch.com/wiki/Main_Page
Chapter 1
Regression, Classification & Clustering
Classifying Data
Regression Analysis of Data
Clustering Data
How Machine Learning Problems are Structured
While there are other models, regression, classification and clustering are the
three primary ways that data is evaluated for machine learning problems.
These three models are the most common and the focus of this book. The
next sections will introduce you to classification, regression and clustering.
1.1 Data Classification
Classification attempts to determine what class the input data falls into. Clas-
sification is usually a supervised training operation, meaning the user provides
data and expected results to the neural network. For data classification, the
expected result is identification of the data class.
Supervised neural networks are always trained with known data. During
training, the networks are evaluated on how well they classify known data.
The hope is that the neural network, once trained, will be able to classify
unknown data as well.
Fisher’s Iris Dataset is an example of classification. This is a dataset that
contains measurements of Iris flowers. This is one of the most famous datasets
and is often used to evaluate machine learning methods. The full dataset is
available at the following URL.
http://www.heatonresearch.com/wiki/Iris_Data_Set
Below is a small sampling from the Iris dataset.
"Sepal Length","Sepal Width","Petal Length","Petal Width","Species"
5.1,3.5,1.4,0.2,"setosa"
4.9,3.0,1.4,0.2,"setosa"
4.7,3.2,1.3,0.2,"setosa"
...
7.0,3.2,4.7,1.4,"versicolor"
6.4,3.2,4.5,1.5,"versicolor"
6.9,3.1,4.9,1.5,"versicolor"
...
6.3,3.3,6.0,2.5,"virginica"
5.8,2.7,5.1,1.9,"virginica"
7.1,3.0,5.9,2.1,"virginica"
The above data is shown as a CSV file. CSV is a very common input format
for a neural network. The first row is typically a definition for each of the
columns in the file. As you can see, for each of the flowers five pieces of
information are provided.
Sepal Length
Sepal Width
Petal Length
Petal Width
Species
For classification, the neural network is instructed that, given the sepal length-
/width and the petal length/width, the species of the flower can be determined.
The species is the class.
A class is usually a non-numeric data attribute and as such, membership in
the class must be well-defined. For the Iris data set, there are three different
types of Iris. If a neural network is trained on three types of Iris, it cannot be
expected to identify a rose. All members of the class must be known at the
time of training.
1.2 Regression Analysis
In the last section, we learned how to classify data. Often the
desired output is not simply a class, but a number. Consider the calculation
of an automobile’s miles per gallon (MPG). Provided data such as the engine
size and car weight, the MPG for the specified car may be calculated.
Consider the following sample data for five cars:
"mpg","cylinders","displacement","horsepower","weight",
"acceleration","model year","origin","car name"
18.0,8,307.0,130.0,3504.,12.0,70,1,"chevrolet chevelle malibu"
15.0,8,350.0,165.0,3693.,11.5,70,1,"buick skylark 320"
18.0,8,318.0,150.0,3436.,11.0,70,1,"plymouth satellite"
16.0,8,304.0,150.0,3433.,12.0,70,1,"amc rebel sst"
17.0,8,302.0,140.0,3449.,10.5,70,1,"ford torino"
...
. . .
For more information, the entirety of this dataset may be found at:
http://www.heatonresearch.com/wiki/MPG_Data_Set
The idea of regression is to train the neural network with input data about
the car. However, using regression, the network will not produce a class. The
neural network is expected to provide the miles per gallon that the specified
car would likely get.
It is also important to note that not every piece of data in the above file will
be used. The columns “car name” and “origin” are not used. The name of a
car has nothing to do with its fuel efficiency and is therefore excluded. Likewise
the origin does not contribute to this equation. The origin is a numeric value
that specifies what geographic region the car was produced in. While some
regions do focus on fuel efficiency, this piece of data is far too broad to be
useful.
1.3 Clustering
Another common type of analysis is clustering. Unlike the previous two anal-
ysis types, clustering is typically unsupervised. Either of the datasets from
the previous two sections could be used for clustering. The difference is that
clustering analysis would not require the user to provide the species in the case
of the Iris dataset, or the MPG number for the MPG dataset. The clustering
algorithm is expected to place the data elements into clusters that correspond
to the species or MPG.
For clustering, the machine learning method simply looks at the data and
attempts to place that data into a number of clusters. The number of clusters
expected must be defined ahead of time. If the number of clusters changes,
the clustering machine learning method will need to be retrained.
Clustering is very similar to classification, with its output being a cluster,
which is similar to a class. However, clustering differs from regression as it does
not provide a number. So if clustering were used with the MPG dataset, the
output would need to be a cluster that the car falls into. Perhaps each cluster
would specify the varying level of fuel efficiency for the vehicle. Perhaps the
clusters would group the cars into clusters that demonstrated some relationship
that had not yet been noticed.
1.4 Structuring a Neural Network
Now that the three major problem models for neural networks are identified, it is
time to examine how data is actually presented to the neural network. This
section focuses mainly on how the neural network is structured to accept data
items and provide output. The following chapter will detail how to normalize
the data prior to being presented to the neural network.
Neural networks are typically layered with an input and output layer at
minimum. There may also be hidden layers. Some neural network types
are not broken up into any formal layers beyond the input and output layer.
However, the input layer and output layer will always be present and may be
incorporated in the same layer. We will now examine the input layer, output
layer and hidden layers.
1.4.1 Understanding the Input Layer
The input layer is the first layer in a neural network. This layer, like all layers,
contains a specific number of neurons. The neurons in a layer all contain similar
properties. Typically, the input layer will have one neuron for each attribute
that the neural network will use for classification, regression or clustering.
Consider the previous examples. The Iris dataset has four input neurons.
These neurons represent the petal width/length and the sepal width/length.
The MPG dataset has more input neurons. The number of input neurons
does not always directly correspond to the number of attributes and some
attributes will take more than one neuron to encode. This encoding process,
called normalization, will be covered in the next chapter.
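As a brief preview, the most common approach linearly rescales each numeric value into a small fixed range. The following sketch assumes simple range normalization; Encog supplies its own normalization classes, which are covered in the next chapter:

// Map x from the range [min, max] into the range [-1, 1].
static double normalize(double x, double min, double max) {
    return ((x - min) / (max - min)) * 2.0 - 1.0;
}

For example, normalize(5.1, 4.3, 7.9) would rescale an Iris sepal length into this range, given assumed minimum and maximum values of 4.3 and 7.9.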
The number of neurons determines how a layer’s input is structured. For
each input neuron, one double value is stored. For example, the following
array could be used as input to a layer that contained five neurons.
double[] input = new double[5];
The input to a neural network is always an array of the type double. The size
of this array directly corresponds to the number of neurons on the input layer.
Encog uses the MLData interface to define classes that hold these arrays.
The array above can be easily converted into an MLData object with the
following line of code.
MLData data = new BasicMLData(input);
The MLData interface defines any “array like” data that may be presented
to Encog. Input must always be presented to the neural network inside of a
MLData object. The BasicMLData class implements the MLData inter-
face. However, the BasicMLData class is not the only way to provide Encog
with data. Other implementations of MLData are used for more specialized
types of data.
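For example, the four measurements of a single Iris flower from earlier in this chapter could be packaged as follows. This is a minimal sketch; in practice, the raw values would first be normalized, as described in the next chapter:

// One array element per input neuron.
double[] input = new double[4];
input[0] = 5.1; // sepal length
input[1] = 3.5; // sepal width
input[2] = 1.4; // petal length
input[3] = 0.2; // petal width
MLData data = new BasicMLData(input);
// Individual elements can be read back by index.
double sepalLength = data.getData(0);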
The BasicMLData class simply provides a memory-based data holder
for the neural network data. Once the neural network processes the input, a
MLData-based class will be returned from the neural network’s output layer.
The output layer is discussed in the next section.
1.4.2 Understanding the Output Layer
The output layer is the final layer in a neural network. This layer provides the
output after all previous layers have processed the input. The output from
the output layer is formatted very similarly to the data that was provided to
the input layer. The neural network outputs an array of doubles.
The neural network wraps the output in a class based on the MLData
interface. Most of the built-in neural network types return a BasicMLData
class as the output. However, future and third party neural network classes
may return different classes based on other implementations of the MLData
interface.
Neural networks are designed to accept input (an array of doubles) and then
produce output (also an array of doubles). Determining how to structure the
input data and attaching meaning to the output are the two main challenges
of adapting a problem to a neural network. The real power of a neural network
comes from its pattern recognition capabilities. The neural network should be
able to produce the desired output even if the input has been slightly distorted.
Regression neural networks typically have a single output neuron that
provides the numeric value produced by the neural network. Multiple output
neurons may exist if the same neural network is supposed to predict two or
more numbers for the given inputs.
Classification networks have one or more output neurons, depending on how the
output class was encoded. There are several different ways to encode classes.
This will be discussed in greater detail in the next chapter.
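For example, with one-of-n encoding, the predicted class is usually taken to be the output neuron with the largest value. The following is a minimal sketch, assuming output is the MLData object returned by the network:

// Find the index of the output neuron with the highest value.
int best = 0;
for (int i = 1; i < output.size(); i++) {
    if (output.getData(i) > output.getData(best)) {
        best = i;
    }
}
// 'best' is the index of the predicted class.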
Clustering is set up similarly, as the output neurons identify which data
belongs to which cluster.
1.4.3 Hidden Layers
As previously discussed, neural networks contain an input layer and an output
layer. Sometimes the input layer and output layer are the same, but are most
often two separate layers. Additionally, other layers may exist between the
input and output layers and are called hidden layers. These hidden layers are
simply inserted between the input and output layers. The hidden layers can
also take on more complex structures.
The only purpose of the hidden layers is to allow the neural network to
better produce the expected output for the given input. Neural network pro-
gramming involves first defining the input and output layer neuron counts.
Once it is determined how to translate the programming problem into the
input and output neuron counts, it is time to define the hidden layers.
The hidden layers are very much a “black box.” The problem is defined in
terms of the neuron counts for the hidden and output layers. How the neural
network produces the correct output is performed in part by hidden layers.
Once the structure of the input and output layers is defined, the hidden layer
structure that optimally learns the problem must also be defined.
The challenge is to avoid creating a hidden structure that is either too
complex or too simple. Too complex of a hidden structure will take too long
to train. Too simple of a hidden structure will not learn the problem. A good
starting point is a single hidden layer with a number of neurons equal to twice
the input layer. Depending on this network’s performance, the hidden layer’s
number of neurons is either increased or decreased.
Developers often wonder how many hidden layers to use. Some research
has indicated that a second hidden layer is rarely of any value. Encog is an
excellent way to perform a trial and error search for the most optimal hidden
layer configuration. For more information see the following URL:
http://www.heatonresearch.com/wiki/Hidden_Layers
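Using the BasicNetwork class introduced later in this chapter, the rule of thumb above can be expressed directly. The following is a sketch only; the input and output counts are placeholders for your own problem:

int inputCount = 4;   // e.g., four Iris measurements
int outputCount = 1;  // e.g., one regression value
BasicNetwork network = new BasicNetwork();
network.addLayer(new BasicLayer(null, true, inputCount));
// Starting point: one hidden layer with twice the input neuron count.
network.addLayer(new BasicLayer(new ActivationSigmoid(), true, inputCount * 2));
network.addLayer(new BasicLayer(new ActivationSigmoid(), false, outputCount));
network.getStructure().finalizeStructure();
network.reset();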
Some neural networks have no hidden layers, with the input layer directly
connected to the output layer. Further, some neural networks have only a
single layer in which the single layer is self-connected. These connections
permit the network to learn. Contained in these connections, called synapses,
are individual weight matrixes. These values are changed as the neural network
learns. The next chapter delves more into weight matrixes.
1.5 Using a Neural Network
This section will detail how to structure a neural network for a very simple
problem: to design a neural network that can function as an XOR operator.
Learning the XOR operator is a frequent “first example” when demonstrating
the architecture of a new neural network. Just as most new programming
languages are first demonstrated with a program that simply displays “Hello
World,” neural networks are frequently demonstrated with the XOR operator:
it is, in effect, the “Hello World” application for neural networks.
1.5.1 The XOR Operator and Neural Networks
The XOR operator is one of the common Boolean logical operators. The other two
are the AND and OR operators. For each of these logical operators, there are
four different combinations. All possible combinations for the AND operator
are shown below.
0 AND 0 = 0
1 AND 0 = 0
0 AND 1 = 0
1 AND 1 = 1
This should be consistent with how you learned the AND operator for com-
puter programming. As its name implies, the AND operator will only return
true, or one, when both inputs are true.
The OR operator behaves as follows:
0 OR 0 = 0
1 OR 0 = 1
0 OR 1 = 1
1 OR 1 = 1
This also should be consistent with how you learned the OR operator for
computer programming. For the OR operator to be true, either of the inputs
must be true.
The “exclusive or” (XOR) operator is less frequently used in computer
programming. XOR has the same output as the OR operator, except for the
case where both inputs are true. The possible combinations for the XOR
operator are shown here.
0 XOR 0 = 0
1 XOR 0 = 1
0 XOR 1 = 1
1 XOR 1 = 0
As you can see, the XOR operator only returns true when both inputs differ.
The next section explains how to structure the input, output and hidden layers
for the XOR operator.
1.5.2 Structuring a Neural Network for XOR
There are two inputs to the XOR operator and one output. The input and
output layers will be structured accordingly. The input neurons are fed the
following double values:
0.0, 0.0
1.0, 0.0
0.0, 1.0
1.0, 1.0
These values correspond to the inputs to the XOR operator, shown above.
The one output neuron is expected to produce the following double values:
0.0
1.0
1.0
0.0
This is one way that the neural network can be structured. This method
allows a simple feedforward neural network to learn the XOR operator. The
feedforward neural network, also called a perceptron, is one of the first neural
network architectures that we will learn.
There are other ways that the XOR data could be presented to the neural
network. Later in this book, two examples of recurrent neural networks will be
explored including Elman and Jordan styles of neural networks. These meth-
ods would treat the XOR data as one long sequence, basically concatenating
the truth table for XOR together, resulting in one long XOR sequence, such
as:
0.0, 0.0, 0.0,
0.0, 1.0, 1.0,
1.0, 0.0, 1.0,
1.0, 1.0, 0.0
The line breaks are only for readability; the neural network treats XOR as a
long sequence. By using the data above, the network has a single input neuron
and a single output neuron. The input neuron is fed one value from the list
above and the output neuron is expected to return the next value.
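For illustration, this sequence could be broken into input/ideal pairs in which each value is used to predict the next one. This layout is hypothetical and is not taken from the Encog examples:

// Each input row holds one value; the matching ideal row holds the
// next value in the XOR sequence.
double SEQ_INPUT[][] = {
    {0.0}, {0.0}, {0.0}, {0.0}, {1.0}, {1.0},
    {1.0}, {0.0}, {1.0}, {1.0}, {1.0} };
double SEQ_IDEAL[][] = {
    {0.0}, {0.0}, {0.0}, {1.0}, {1.0}, {1.0},
    {0.0}, {1.0}, {1.0}, {1.0}, {0.0} };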
This shows that there are often multiple ways to model the data for a
neural network. How the data is modeled will greatly influence the success of
a neural network. If one particular model is not working, another should be
considered. The next step is to format the XOR data for a feedforward neural
network.
Because the XOR operator has two inputs and one output, the neural
network follows suit. Additionally, the neural network has a single hidden
layer with two neurons to help process the data. The choice of two neurons
for the hidden layer is arbitrary and is often a matter of trial and error. The XOR
problem is simple and two hidden neurons are sufficient to solve it. A diagram
for this network is shown in Figure 1.1.
Figure 1.1: Neuron Diagram for the XOR Network
There are four different types of neurons in the above network. These are
summarized below:
Input Neurons: I1, I2
Output Neuron: O1
Hidden Neurons: H1, H2
Bias Neurons: B1, B2
The input, output and hidden neurons were discussed previously. The new
neuron type seen in this diagram is the bias neuron. A bias neuron always
outputs a value of 1 and never receives input from the previous layer.
In a nutshell, bias neurons allow the neural network to learn patterns more
effectively. They serve a similar function to the hidden neurons. Without
bias neurons, it is very hard for the neural network to output a value of one
when the input is zero. This is not so much a problem for XOR data, but it
can be for other data sets. To read more about their exact function, visit the
following URL:
http://www.heatonresearch.com/wiki/Bias
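To make the bias neuron’s role concrete, consider a sketch of a single neuron’s computation. This is illustrative code, not Encog’s internal API:

// The bias neuron always outputs 1.0, so its weight acts as a constant
// term that shifts the activation function. Without it, an all-zero
// input would always produce a weighted sum of zero.
static double sigmoid(double x) {
    return 1.0 / (1.0 + Math.exp(-x));
}

static double neuronOutput(double i1, double i2,
        double w1, double w2, double biasWeight) {
    return sigmoid(i1 * w1 + i2 * w2 + 1.0 * biasWeight);
}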
Now look at the code used to produce a neural network that solves the
XOR operator. The complete code is included with the Encog examples and
can be found at the following location.
org.encog.examples.neural.xor.XORHelloWorld
The example begins by creating the neural network seen in Figure 1.1. The
code needed to create this network is relatively simple:
BasicNetwork network = new BasicNetwork();
network.addLayer(new BasicLayer(null, true, 2));
network.addLayer(new BasicLayer(new ActivationSigmoid(), true, 2));
network.addLayer(new BasicLayer(new ActivationSigmoid(), false, 1));
network.getStructure().finalizeStructure();
network.reset();
In the above code, a BasicNetwork is being created. Three layers are added
to this network. The first layer, which becomes the input layer, has two neu-
rons. The hidden layer is added second and has two neurons also. Lastly, the
output layer is added and has a single neuron. Finally, the finalizeStructure
method is called to inform the network that no more layers are to be added.
The call to reset randomizes the weights in the connections between these
layers.
Neural networks always begin with random weight values. A process called
training refines these weights to values that will provide the desired output.
Because neural networks always start with random values, very different results
occur from two runs of the same program. Some random weights provide
a better starting point than others. Sometimes random weights will be far
enough off that the network will fail to learn. In this case, the weights should
be randomized again and the process restarted.
You will also notice the ActivationSigmoid class in the above code. This
specifies the neural network to use the sigmoid activation function. Activation
functions will be covered in Chapter 4. The activation functions are only placed
on the hidden and output layer; the input layer does not have an activation
function. If an activation function were specified for the input layer, it would
have no effect.
Each layer also specifies a boolean value. This boolean value specifies
if bias neurons are present on a layer or not. The output layer, as shown in
Figure 1.1, does not have a bias neuron, while the input and hidden layers do. This is
because a bias neuron is only connected to the next layer. The output layer
is the final layer, so there is no need for a bias neuron. If a bias neuron was
specified on the output layer, it would have no effect.
These weights make up the long-term memory of the neural network. Some
neural networks also contain context layers which give the neural network a
short-term memory as well. The neural network learns by modifying these
weight values. This is also true of the Elman and Jordan neural networks.
Now that the neural network has been created, it must be trained. Training
is the process where the random weights are refined to produce output closer
to the desired output. Training is discussed in the next section.
1.5.3 Training a Neural Network
To train the neural network, a MLDataSet object is constructed. This object
contains the inputs and the expected outputs. To construct this object, two
arrays are created. The first array will hold the input values for the XOR
operator. The second array will hold the ideal outputs for each of four corre-
sponding input values. These will correspond to the possible values for XOR.
To review, the four possible values are as follows:
0 XOR 0 = 0
1 XOR 0 = 1
0 XOR 1 = 1
1 XOR 1 = 0
First, construct an array to hold the four input values to the XOR operator
using a two dimensional double array. This array is as follows:
public static double XOR_INPUT[][] = {
    { 0.0, 0.0 },
    { 1.0, 0.0 },
    { 0.0, 1.0 },
    { 1.0, 1.0 } };
Likewise, an array must be created for the expected outputs for each of the
input values. This array is as follows:
public static double XOR_IDEAL[][] = {
    { 0.0 },
    { 1.0 },
    { 1.0 },
    { 0.0 } };
Even though there is only one output value, a two-dimensional array must
still be used to represent the output. If there is more than one output neuron,
additional columns are added to the array.
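For example, a hypothetical ideal array for a network with two output neurons
would simply gain a second column:

// Hypothetical ideal array for a network with two output neurons.
public static double IDEAL_TWO_OUTPUTS[][] = {
    { 0.0, 1.0 },
    { 1.0, 0.0 },
    { 1.0, 0.0 },
    { 0.0, 1.0 } };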
Now that the two input arrays are constructed, a MLDataSet object must
be created to hold the training set. This object is created as follows:
MLDataSet trainingSet = new BasicMLDataSet(XOR_INPUT, XOR_IDEAL);
Now that the training set has been created, the neural network can be trained.
Training is the process where the neural network’s weights are adjusted to
better produce the expected output. Training will continue for many iterations
until the error rate of the network is below an acceptable level. First, a training
object must be created. Encog supports many different types of training.
For this example Resilient Propagation (RPROP) training is used. RPROP
is perhaps the best general-purpose training algorithm supported by Encog.
Other training techniques are provided as well, since certain problems are solved
better by certain training techniques. The following code constructs a
RPROP trainer:
MLTrain train = new ResilientPropagation(network, trainingSet);
All training classes implement the MLTrain interface. The RPROP algorithm
is implemented by the ResilientPropagation class, which is constructed
above.
Once the trainer is constructed, the neural network should be trained.
Training the neural network involves calling the iteration method on the
MLTrain class until the error is below a specific value. The error is the
degree to which the neural network output matches the desired output.
int epoch = 1;
do {
    train.iteration();
    System.out.println("Epoch #" + epoch + " Error:"
        + train.getError());
    epoch++;
} while (train.getError() > 0.01);
The above code loops through as many iterations, or epochs, as it takes to get
the error rate for the neural network to be below 1%. Once the neural network
has been trained, it is ready for use. The next section will explain how to use
a neural network.
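As noted earlier, some random starting weights are poor enough that the
network may fail to converge. A minimal sketch of a training loop that
restarts in that situation is shown here; the 5,000-iteration stall threshold
is an arbitrary assumption, not an Encog default.

// A sketch of restarting training when poor random starting weights
// cause the error to stall. The 5000-iteration limit is an assumption.
MLTrain train = new ResilientPropagation(network, trainingSet);
int iteration = 0;
do {
    train.iteration();
    iteration++;
    if (iteration > 5000) {
        network.reset(); // re-randomize the weights and start over
        train = new ResilientPropagation(network, trainingSet);
        iteration = 0;
    }
} while (train.getError() > 0.01);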
1.5.4 Executing a Neural Network
Making use of the neural network involves calling the compute method on
the BasicNetwork class. Here we loop through every training set value and
display the output from the neural network:
System.out.println("Neural Network Results:");
for (MLDataPair pair : trainingSet) {
    final MLData output =
        network.compute(pair.getInput());
    System.out.println(pair.getInput().getData(0)
        + "," + pair.getInput().getData(1)
        + ", actual=" + output.getData(0) + ", ideal="
        + pair.getIdeal().getData(0));
}
The compute method accepts an MLData class and also returns another
MLData object. The returned object contains the output from the neural
network, which is displayed to the user. When the program is run, the training
results are displayed first. For each epoch, the current error rate is displayed.
Epoch #1 Error:0.5604437512295236
Epoch #2 Error:0.5056375155784316
Epoch #3 Error:0.5026960720526166
Epoch #4 Error:0.4907299498390594
...
Epoch #104 Error:0.01017278345766472
Epoch #105 Error:0.010557202078697751
Epoch #106 Error:0.011034965164672806
Epoch #107 Error:0.009682102808616387
The error starts at 56% at epoch 1. By epoch 107, the training dropped below
1% and training stops. Because the neural network was initialized with random
weights, it may take different numbers of iterations to train each time the
program is run. Additionally, though the final error rate may be different, it
should always end below 1%.
Finally, the program displays the results from each of the training items as
follows:
Neural Network Results:
0.0,0.0, actual=0.002782538818034049, ideal=0.0
1.0,0.0, actual=0.9903741937121177, ideal=1.0
0.0,1.0, actual=0.9836807956566187, ideal=1.0
1.0,1.0, actual=0.0011646072586172778, ideal=0.0
As you can see, the network has not been trained to give the exact results.
This is normal. Because the network was trained to a 1% error, each of the
results will also generally be within 1% of the expected value.
Because the neural network is initialized to random values, the final output
will be different on a second run of the program.
Neural Network Results:
0.0,0.0, actual=0.005489822214926685, ideal=0.0
1.0,0.0, actual=0.985425090860287, ideal=1.0
0.0,1.0, actual=0.9888064742994463, ideal=1.0
1.0,1.0, actual=0.005923146369557053, ideal=0.0
The second run output is slightly different. This is normal.
This is the first Encog example. All of the examples contained in this
book are also included with the examples downloaded with Encog. For more
information on how to download these examples and where this particular
example is located, refer to Appendix A, “Installing Encog.”
1.6 Chapter Summary
Encog is an advanced machine learning framework used to create neural net-
works. This chapter focused on regression, classification and clustering.
Finally, this chapter showed how to create an Encog application that could
learn the XOR operator.
Regression is when a neural network accepts input and produces a numeric
output. Classification is where a neural network accepts input and predicts
what class the input was in. Clustering does not require ideal outputs. Rather,
clustering looks at the input data and clusters the input cases as best it can.
There are several different layer types supported by Encog. However, these
layers fall into three groups depending on their placement in the neural network.
The input layer accepts input from the outside. Hidden layers accept data from
the input layer for further processing. The output layer takes data, either from
the input or final hidden layer, and presents it to the outside world.
The XOR operator was used as an example for this chapter. The XOR
operator is frequently used as a simple “Hello World” application for neural
networks. The XOR operator provides a very simple pattern that most neural
networks can easily learn. It is important to know how to structure data for a
neural network. Neural networks both accept and return an array of floating
point numbers.
Finally, this chapter detailed how to send data to a neural network. Data
for the XOR example is easily provided to a neural network. No normaliza-
tion or encoding is necessary. However, most real world data will need to be
normalized. Normalization is demonstrated in the next chapter.
Chapter 2
Obtaining Data for Encog
Finding Data for Neural Networks
Why Normalize?
Specifying Normalization Sources
Specifying Normalization Targets
Neural networks can provide profound insights into the data supplied to them.
However, you can’t just feed any sort of data directly into a neural network.
This “raw” data must usually be normalized into a form that the neural net-
work can process. This chapter will show how to normalize “raw” data for use
by Encog.
Before data can be normalized, we must first have data. Once you decide
what the neural network should do, you must find data to teach the neural
network how to perform a task. Fortunately, the Internet provides a wealth of
information that can be used with neural networks.
2.1 Where to Get Data for Neural Networks
The Internet can be a great source of data for the neural network. Data found
on the Internet can be in many different formats. One of the most convenient
formats for data is the comma-separated value (CSV) format. Other times it
may be necessary to create a spider or bot to obtain this data.
One very useful source for neural networks is the Machine Learning Repository,
which is run by the University of California at Irvine.
http://kdd.ics.uci.edu/
The Machine Learning Repository site is a repository of various datasets
that have been donated to the University of California. Several of these
datasets will be used in this book.
2.2 Normalizing Data
Data obtained from sites, such as those listed above, often cannot be directly
fed into neural networks. Neural networks can be very “intelligent,” but cannot
receive just any sort of data and produce a meaningful result. Often the data
must first be normalized. We will begin by defining normalization.
Neural networks are designed to accept floating-point numbers as their
input. Usually these input numbers should be in either the range of -1 to
+1 or 0 to +1 for maximum efficiency. The choice of which range is often
dictated by the choice of activation function, as certain activation functions
have a positive range and others have both a negative and positive range.
The sigmoid activation function, for example, has a range of only positive
numbers. Conversely, the hyperbolic tangent activation function has a range of
positive and negative numbers. The most common case is to use a hyperbolic
tangent activation function with a normalization range of -1 to +1.
Recall from Chapter 1 the iris dataset. This data set could be applied to
a classification problem. However, we did not see how the data actually needed
to be processed to make it useful to a neural network.
A sampling of the dataset is shown here:
"Sepal Length","Sepal Width","Petal Length","Petal Width","Species"
5.1,3.5,1.4,0.2,"setosa"
4.9,3.0,1.4,0.2,"setosa"
4.7,3.2,1.3,0.2,"setosa"
...
7.0,3.2,4.7,1.4,"versicolor"
6.4,3.2,4.5,1.5,"versicolor"
6.9,3.1,4.9,1.5,"versicolor"
...
6.3,3.3,6.0,2.5,"virginica"
5.8,2.7,5.1,1.9,"virginica"
7.1,3.0,5.9,2.1,"virginica"
The fields from this dataset must now be represented as an array of floating
point numbers between -1 and +1.
Sepal Length - Numeric
Sepal Width - Numeric
Petal Length - Numeric
Petal Width - Numeric
Species - Class
There are really two different attribute types to consider. First, there are four
numeric attributes. Each of these will simply map to an input neuron. The
values will need to be scaled to -1 to +1.
Class attributes, sometimes called nominal attributes, present a unique
challenge. In the example, the species of iris must be represented as either
one or more floating point numbers. The mapping will not be to a single
neuron. Because a three-member class is involved, the number of neurons
used to represent the species will not be a single neuron. The number of
neurons used to represent the species will be either two or three, depending
on the normalization type used.
The next two sections will show how to normalize numeric and class values,
beginning with numeric values.
2.2.1 Normalizing Numeric Values
Normalizing a numeric value is essentially a process of mapping the existing
numeric value to a well-defined numeric range, such as -1 to +1. Normalization
causes all of the attributes to be in the same range with no one attribute more
powerful than the others.
To normalize, the current numeric ranges must be known for all of the
attributes. The current numeric ranges for each of the iris attributes are
shown here.
Sepal Length - Max: 7.9, Min: 4.3
Sepal Width - Max: 4.4, Min: 2.0
Petal Length - Max: 6.9, Min: 1.0
Petal Width - Max: 2.5, Min: 0.1
Consider the “Petal Length” attribute. The petal length is in the range of 1.0 to 6.9.
We must convert this length to the range -1 to +1. To do this we use Equation 2.1.
f(x) = \frac{(x - d_L)(n_H - n_L)}{d_H - d_L} + n_L    (2.1)
The above equation will normalize a value x, where d_H and d_L represent
the high and low values of the data, and n_H and n_L represent the high and
low of the desired normalization range. For example, to normalize a petal length of
3 to the range -1 to +1, the above equation becomes:
f(x) = \frac{(3 - 1.0)(1.0 - (-1.0))}{6.9 - 1.0} + (-1.0)    (2.2)
This results in a value of -0.32. This is the value that will be fed to the neural
network.
For regression, the neural network will return values. These values will be
normalized. To denormalize a value, Equation 2.3 is used.
f(x) = \frac{(d_L - d_H)x - n_H d_L + d_H n_L}{n_L - n_H}    (2.3)
To denormalize the value of -0.32, Equation 2.3 becomes:
f(x) = \frac{(1.0 - 6.9) \cdot (-0.32) - (1.0 \cdot 1.0) + 6.9 \cdot (-1.0)}{(-1.0) - 1.0}    (2.4)
Once denormalized, the value of -0.32 becomes approximately 3.0 again. It is important to
note that the -0.32 value was rounded for the calculation here. Encog provides
built-in classes to provide both normalization and denormalization. These
classes will be introduced later in this chapter.
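In the meantime, Equations 2.1 and 2.3 translate directly into Java. The
following is a minimal sketch, not the Encog implementation:

// Direct translation of Equation 2.1 (normalize) and Equation 2.3
// (denormalize). A sketch only, not the Encog implementation.
public static double normalize(double x, double dL, double dH,
        double nL, double nH) {
    return ((x - dL) * (nH - nL)) / (dH - dL) + nL;
}

public static double denormalize(double x, double dL, double dH,
        double nL, double nH) {
    return ((dL - dH) * x - nH * dL + dH * nL) / (nL - nH);
}

// normalize(3.0, 1.0, 6.9, -1.0, 1.0) yields roughly -0.32, and
// denormalize(-0.32, 1.0, 6.9, -1.0, 1.0) yields roughly 3.0 again.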
2.2.2 Normalizing Nominal Values
Nominal values are used to name things. One very common example of a
simple nominal value is gender. Something is either male or female. Another
is any sort of Boolean question. Nominal values also include values that are
either “yes/true” or “no/false.” However, not all nominal values have only two
values.
Nominal values can also be used to describe an attribute of something,
such as color. Neural networks deal best with nominal values where the set is
fixed. For the iris dataset, the nominal value to be normalized is the species.
There are three different species to consider for the iris dataset and this value
cannot change. If the neural network is trained with three species, it cannot
be expected to recognize five species.
Encog supports two different ways to encode nominal values. The simplest
means of representing nominal values is called “one-of-n” encoding. One-of-n
encoding can often be hard to train, especially if there are more than a few
nominal types to encode. Equilateral encoding is usually a better choice than
the simpler one-of-n encoding. Both encoding types will be explored in the
next two sections.
2.2.3 Understanding One-of-n Normalization
One-of-n is a very simple form of normalization. For an example, consider
the iris dataset again. The input to the neural network is statistics about an
individual iris. The output signifies which species of iris is being evaluated. The three
iris species are listed as follows:
Setosa
Versicolor
Virginica
If using the one-of-n normalization, the neural network would have three out-
put neurons. Each of these three neurons would represent one iris species. The
iris species predicted by the neural network would correspond to the output
neuron with the highest activation.
Generating training data for one-of-n is relatively easy. Simply assign a +1
to the neuron that corresponds to the chosen iris and a -1 to the remaining
neurons. For example, the Setosa iris species would be encoded as follows:
1,-1,-1
Likewise, the Versicolor would be encoded as follows:
-1,1,-1
Finally, Virginica would be encoded as follows.
-1,-1,1
Encog provides built-in classes to provide this normalization. These classes
will be introduced later in this chapter.
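In the meantime, the encoding itself is simple enough to sketch by hand; the
helper below is hypothetical, not part of Encog:

// A hand-rolled one-of-n encoder sketch: +1 for the target class,
// -1 for every other class. Hypothetical helper, not part of Encog.
public static double[] oneOfN(int classNumber, int classCount) {
    double[] result = new double[classCount];
    for (int i = 0; i < classCount; i++) {
        result[i] = (i == classNumber) ? 1.0 : -1.0;
    }
    return result;
}
// oneOfN(0, 3) yields {1, -1, -1}, the Setosa encoding shown above.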
2.2.4 Understanding Equilateral Normalization
The output neurons are constantly checked against the ideal output values
provided in the training set. The error between the actual output and the
ideal output is represented by a percentage. This can cause a problem for the
one-of-n normalization method. Consider if the neural network had predicted
a Versicolor iris when it should have predicted a Virginica iris. The actual
output and ideal would be as follows:
Ideal Output: -1, -1, 1
Actual Output: -1, 1, -1
The problem is that only two of three output neurons are incorrect. We would
like to spread the “guilt” for this error over a larger percent of the neurons.
To do this, a unique set of values for each class must be determined. Each set
of values should have an equal Euclidean distance from the others. The equal
distance makes sure that incorrectly choosing iris Setosa for Versicolor has the
same error weight as choosing iris Setosa for iris Virginica.
This can be done using the Equilateral class. The following code segment
shows how to use the Equilateral class to generate these values:
Equilateral eq = new Equilateral(3, -1, 1);
for (int i = 0; i < 3; i++) {
    StringBuilder line = new StringBuilder();
    line.append(i);
    line.append(':');
    double[] d = eq.encode(i);
    for (int j = 0; j < d.length; j++) {
        if (j > 0)
            line.append(',');
        line.append(Format.formatDouble(d[j], 4));
    }
    System.out.println(line.toString());
}
The inputs to the Equilateral class are the number of classes and the nor-
malized range. In the above code, there are three classes that are normalized
to the range -1 to 1, producing the following output:
Listing 2.1: Calculated Class Equilateral Values 3 Classes
0: -0.8660, -0.5000
1: 0.8660, -0.5000
2: 0.0000, 1.0000
Notice that there are two outputs for each of the three classes. This decreases
the number of neurons needed by one from the amount needed for one-of-n
encoding. Equilateral encoding always requires one fewer output neuron than
one-of-n encoding would have. Equilateral encoding is never used for fewer
than three classes.
Look at the earlier example again, this time with equilateral normalization. Just as before,
consider if the neural network had predicted a Versicolor iris when it should
have predicted a Virginica iris. The output and ideal are as follows:
Ideal Output: 0.0000, 1.0000
Actual Output: 0.8660, -0.5000
In this case there are only two neurons, as is consistent with equilateral en-
coding. Now all neurons are producing incorrect values. Additionally, there
are only two output neurons to process, slightly decreasing the complexity of
the neural network.
Neural networks will rarely give output that exactly matches any of its
training values. To deal with this in “one-of-n” encoding, look at which out-
put neuron produced the highest output. This method does not work for
equilateral encoding. Equilateral encoding shows which calculated class equi-
lateral value (Listing 2.1) has the shortest distance to the actual output of the
neural network.
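The Equilateral class performs this nearest-distance search through its
decode method. A short sketch, using a hypothetical network output:

// Decoding an equilateral-encoded output. The decode method returns the
// index of the class value closest (by Euclidean distance) to the output.
Equilateral eq = new Equilateral(3, -1, 1);
double[] actualOutput = { 0.8660, -0.5000 }; // hypothetical network output
int predictedClass = eq.decode(actualOutput); // 1, i.e. Versicolor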
What is meant by each of the sets being equal in distance from each other?
It means that their Euclidean distance is equal. The Euclidean distance can
be calculated using Equation 2.5.
d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}    (2.5)
In the above equation the variable “q” represents the ideal output value; the
variable “p” represents the actual output value. There are “n” sets of ideal and
actual. Equilateral encoding is implemented using the Equilateral class in
Encog. Usually it is unnecessary to directly deal with the Equilateral class in
Encog. Rather, one of the higher-level normalization methods described later
in this chapter is used.
If you are interested in the precise means by which the equilateral numbers
are calculated, visit the following URL:
http://www.heatonresearch.com/wiki/Equilateral
2.3 Programmatic Normalization
Encog provides a number of different means of normalizing data. The exact
means that you use will be determined by what you are trying to
accomplish. The three methods for normalization are summarized here.
Normalizing Individual Numbers
Normalizing CSV Files
Normalizing Memory Arrays
The next three sections will look at all three, beginning with normalizing
individual numbers.
2.3.1 Normalizing Individual Numbers
Very often you will simply want to normalize or denormalize a single number.
The range of values in your data is already known. For this case, it is un-
necessary to go through the overhead of having Encog automatically discover
ranges for you.
The “Lunar Lander” program is a good example of this. You can find the
“Lunar Lander” example here.
org.encog.examples.neural.lunar.LunarLander
To perform the normalization, several NormalizedField objects are created.
Here you see the NormalizedField object that was created for the lunar
lander’s fuel.
NormalizedField fuelStats =
    new NormalizedField(
        NormalizationAction.Normalize,
        "fuel",
        200,
        0,
        0.9,
        -0.9);
For the above example the range is normalized to -0.9 to 0.9. This is very
similar to normalizing between -1 and 1, but less extreme. This can produce
better results at times. It is also known that the acceptable range for fuel is
between 0 and 200.
Now that the field object has been created, it is easy to normalize the
values. Here the value 100 is normalized into the variable n.
double n = this.fuelStats.normalize(100);
To denormalize nback to the original fuel value, use the following code:
double f = this.fuelStats.denormalize(n);
Using the NormalizedField classes directly is useful when numbers arrive as
the program runs. If large lists of numbers are already established, such as an
array or CSV file, this method will not be as effective.
2.3.2 Normalizing Memory Arrays
To quickly normalize an array, the NormalizeArray class can be useful.
This object works by normalizing one attribute at a time. An example of
the normalize array class working is shown in the sunspot prediction example.
This example can be found here:
org.encog.examples.neural.predict.sunspot.PredictSunspot
To begin, create an instance of the NormalizeArray object. Set the high
and low range for normalization.
NormalizeArray norm = new NormalizeArray();
norm.setNormalizedHigh(1);
norm.setNormalizedLow(-1);
Now the raw data array can be normalized into a normalized array.
double[] normalizedSunspots = norm.process(rawDataArray);
If you have an entire array to normalize to the same high/low, the
NormalizeArray class works well. For more fine-tuned control, use the same
techniques described in the previous section for individual values. However,
all array elements must be looped over.
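A minimal sketch of that loop, reusing the fuelStats field from the Lunar
Lander example on a hypothetical raw array:

// Fine-tuned normalization by looping a NormalizedField over an array.
// The raw values here are hypothetical.
double[] rawFuel = { 0, 50, 100, 200 };
double[] normFuel = new double[rawFuel.length];
for (int i = 0; i < rawFuel.length; i++) {
    normFuel[i] = fuelStats.normalize(rawFuel[i]);
}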
2.4 Normalizing CSV Files
If the data to be normalized is already stored in CSV files, Encog Analyst
should be used to normalize the data. Encog Analyst can be used both through
the Encog Workbench and directly from Java and C#. This section explains
how to use it through Java to normalize the Iris data set.
To normalize a file, look at the file normalization example found at the
following location:
org.encog.examples.neural.normalize.NormalizeFile
This example takes an input and output file. The input file is the iris data
set. The first lines of this file are shown here:
"sepal_l","sepal_w","petal_l","petal_w","species"
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,3.6,1.4,0.2,Iris-setosa
5.4,3.9,1.7,0.4,Iris-setosa
4.6,3.4,1.4,0.3,Iris-setosa
5.0,3.4,1.5,0.2,Iris-setosa
The output will be a normalized version of the input file, as shown below:
"sepal_l","sepal_w","petal_l","petal_w","species(p0)","species(p1)"
-0.55,0.24,-0.86,-0.91,-0.86,-0.5
-0.66,-0.16,-0.86,-0.91,-0.86,-0.5
-0.77,0,-0.89,-0.91,-0.86,-0.5
-0.83,-0.08,-0.83,-0.91,-0.86,-0.5
-0.61,0.33,-0.86,-0.91,-0.86,-0.5
-0.38,0.58,-0.76,-0.75,-0.86,-0.5
-0.83,0.16,-0.86,-0.83,-0.86,-0.5
-0.61,0.16,-0.83,-0.91,-0.86,-0.5
The above data shows that the numeric values have all been normalized to
between -1 and 1. Additionally, the species field is broken out into two parts.
This is because equilateral normalization was used on the species column.
2.4.1 Implementing Basic File Normalization
In the last section, you saw how Encog Analyst normalizes a file. In this
section, you will learn the programming code necessary to accomplish this.
Begin by accessing the source and target files:
File sourceFile = new File(args[0]);
File targetFile = new File(args[1]);
Now create instances of EncogAnalyst and AnalystWizard. The wizard
will analyze the source file and build all of the normalization stats needed to
perform the normalization.
EncogAnalyst analyst = new EncogAnalyst();
AnalystWizard wizard = new AnalystWizard(analyst);
The wizard can now be started.
wizard.wizard(sourceFile, true, AnalystFileFormat.DECPNT_COMMA);
Now that the input file has been analyzed, it is time to create a normalization
object. This object will perform the actual normalization.
final AnalystNormalizeCSV norm = new AnalystNormalizeCSV();
norm.analyze(sourceFile, true, CSVFormat.ENGLISH, analyst);
It is necessary to specify the output format for the CSV; in this case, use
ENGLISH, which specifies a decimal point. It is also important to produce
output headers to easily identify all attributes.
norm.setOutputFormat(CSVFormat.ENGLISH);
norm.setProduceOutputHeaders(true);
Finally, we normalize the file.
norm.normalize(targetFile);
Now that the data is normalized, the normalization stats may be saved for
later use. This is covered in the next section.
2.4.2 Saving the Normalization Script
Encog keeps statistics on normalized data. This data, called the normalization
stats, tells Encog the numeric ranges for each attribute that was normalized.
This data can be saved so that it does not need to be renormalized each time.
To save a stats file, use the following command:
analyst.save(new File("stats.ega"));
The file can be later reloaded with the following command:
analyst.load(new File("stats.ega"));
The extension EGA is commonly used and stands for “Encog Analyst.”
2.4.3 Customizing File Normalization
The Encog Analyst contains a collection of AnalystField objects. These ob-
jects hold the type of normalization and the ranges of each attribute. This
collection can be directly accessed to change how the attributes are normal-
ized. Also, AnalystField objects can be removed and excluded from the final
output.
The following code shows how to access each of the fields determined by
the wizard.
System.out.println("Fields found in file:");
for (AnalystField field : analyst.getScript().getNormalize()
        .getNormalizedFields()) {
    StringBuilder line = new StringBuilder();
    line.append(field.getName());
    line.append(",action=");
    line.append(field.getAction());
    line.append(",min=");
    line.append(field.getActualLow());
    line.append(",max=");
    line.append(field.getActualHigh());
    System.out.println(line.toString());
}
There are several important attributes on each of the AnalystField objects.
For example, to change the normalization range to 0 to 1, execute the following
commands:
field.setNormalizedHigh(1);
field.setNormalizedLow(0);
The mode of normalization can also be changed. To use one-of-n normalization
instead of equilateral, just use the following command:
field.setAction(NormalizationAction.OneOf);
Encog Analyst can do much more than just normalize data. It can also perform
the entire normalization, training and evaluation of a neural network. This
will be covered in greater detail in Chapters 3 and 4. Chapter 3 will show how
to do this from the workbench, while Chapter 4 will show how to do this from
code.
2.5 Summary
This chapter explained how to obtain and normalize data for Encog. There are
many different sources of data. One of the best is the UCI Machine Learning
Repository, which provides many of the dataset examples in this book.
There are two broad classes of data to normalize: numeric and non-numeric
data. These two data classes each have techniques for normalization.
Numeric data is normalized by mapping values to a specific range, often
from -1 to +1. Another common range is between 0 and +1. Formulas were
provided earlier in this chapter for both normalization and denormalization.
Non-numeric data is usually an attribute that defines a class. For the case
of the iris dataset, the iris species is a non-numeric class. To normalize these
classes, they must be converted to an array of floating point values, just as
with numeric data.
Encog supports two types of nominal normalization. The first is called
“one-of-n.” One-of-n creates a number of neurons equal to the number of class
items. The class number to be encoded is given a value of 1. Others are given
zeros.
Equilateral encoding is another way to encode a class. For equilateral
encoding, the number of neurons used equals one less than the number of
class items. A code of floating point numbers is created for each class item
with uniform equilateral distance to the other class data items. This allows all
output neurons to play a part in each class item and causes an error to affect
more neurons than one-of-n encoding.
This chapter introduced the Encog Analyst and explained its use to nor-
malize data. The Encog Analyst can also be used in the Encog Workbench.
The Encog Workbench is a GUI application that allows many of the features
of neural networks to be accessed without the need to write code.
Chapter 3
The Encog Workbench
Structure of the Encog Workbench
A Simple XOR Example
Using the Encog Analyst
Encog Analyst Reports
The Encog Workbench is a GUI application that enables many different ma-
chine learning tasks without writing Java or C# code. The Encog Workbench
itself is written in Java, but generates files that can be used with any Encog
framework.
The Encog Workbench is distributed as a single self-executing JAR file.
On most operating systems, the Encog Workbench JAR file is started simply
by double-clicking. This includes Microsoft Windows, Macintosh and some
variants of Linux. To start from the command line, the following command is
used.
java -jar ./encog-workbench-3.0.0-executable.jar
Depending on the version of Encog, the above JAR file might have a differ-
ent name. No matter the version, the file will have “encog-workbench” and
“executable” somewhere in its name. No other JAR files are necessary for the
workbench, as all third-party JAR files are placed inside this JAR.
3.1 Structure of the Encog Workbench
Before studying how the Encog Workbench is actually used, we will learn
about its structure. The workbench works with a project directory that holds
all of the files needed for a project. The Encog Workbench project contains
no subdirectories. Also, if a subdirectory is added into an Encog Workbench
project, it simply becomes another independent project.
There is also no main “project file” inside an Encog Workbench project.
Often a readme.txt or readme.html file is placed inside of an Encog Workbench
project to explain what to do with the project. However, this file is included
at the discretion of the project creator.
There are several different file types that might be placed in an Encog
workbench project. These files are organized by their file extension. The
extension of a file is how the Encog Workbench knows what to do with that
file. The following extensions are recognized by the Encog Workbench:
.csv
.eg
.ega
.egb
.gif
.html
.jpg
.png
.txt
The following sections will discuss the purpose of each file type.
3.1.1 Workbench CSV Files
An acronym for “comma separated values,” CSV files hold tabular data. How-
ever, CSV files are not always “comma separated.” This is especially true in
parts of the world that use a decimal comma instead of a decimal point. The
CSV files used by Encog can be based on a decimal comma. In this case, a
semicolon (;) should be used as the field separator.
CSV files may also have headers to define what each column of the CSV
file means. Column headers are optional, but very much suggested. Column
headers name the attributes and provide consistency across both the CSV
files created by Encog and provided by the user.
A CSV file defines the data used by Encog. Each row in the CSV file defines
a training set element and each column defines an attribute. If a particular
attribute is not known for a training set element, then the “?” character
should be placed in that row/column. Encog deals with missing values in
various ways. This is discussed later in this chapter in the Encog analyst
discussion.
A CSV file cannot be used to directly train a neural network, but must
first be converted into an EGB file. To convert a CSV file to an EGB file,
right-click the CSV file and choose “Export to Training (EGB).” EGB files
nicely define what columns are input and ideal data, while CSV files do not
offer any distinction. Rather, CSV files might represent raw data provided by
the user. Additionally, some CSV files are generated by Encog as raw user
data is processed.
3.1.2 Workbench EG Files
Encog EG files store a variety of different object types, but in themselves are
simply text files. All data inside of EG files is stored with decimal points and
comma separator, regardless of the geographic region in which Encog is run-
ning. While CSV files can be formatted according to local number formatting
rules, EG files cannot. This is to keep EG files consistent across all Encog
platforms.
The following object types are stored in EG files.
Machine Learning Methods (i.e. Neural Networks)
NEAT Populations
Training Continuation Data
The Encog workbench will display the object type of any EG file that is located
in the project directory. An Encog EG file only stores one object per file. If
multiple objects are to be stored, they must be stored in separate EG files.
3.1.3 Workbench EGA Files
Encog Analyst script files, or EGA files, hold instructions for the Encog ana-
lyst. These files hold statistical information about the CSV file that is to be
analyzed. EGA files also hold script information that describes how to pro-
cess raw data. EGA files are executable by the workbench.
A full discussion of the EGA file and every possible configuration/script
item is beyond the scope of this book. However, a future book will be dedi-
cated to the Encog Analyst. Additional reference information about the Encog
Analyst script file can be found here:
http://www.heatonresearch.com/wiki/EGA_File
Later in this chapter, we will create an EGA file to analyze the iris dataset.
3.1.4 Workbench EGB Files
Encog binary files, or EGB files, hold training data. As previously discussed,
CSV files are typically converted to EGB for Encog. This data is stored in
a platform-independent binary format. Because of this, EGB files are read
much faster than a CSV file. Additionally, the EGB file internally contains
the number of input and ideal columns present in the file. CSV files must be
converted to EGB files prior to training. To convert a CSV file to an EGB file,
right-click the selected CSV file and choose “Export to Training (EGB).”
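The same conversion can also be performed from code with Encog’s
EncogUtility helper; a sketch, with hypothetical file names and column counts:

// Sketch: converting a CSV file to an EGB file from code. The file
// names and column counts here are hypothetical.
EncogUtility.convertCSV2Binary(
    new File("xor.csv"), // source CSV
    new File("xor.egb"), // target EGB
    2,     // number of input columns
    1,     // number of ideal (output) columns
    true); // the CSV file has headers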
3.1.5 Workbench Image Files
The Encog workbench does not directly work with image files at this point,
but they can be displayed by double-clicking them. The Encog workbench is capable of
displaying PNG, JPG and GIF files.
3.1.6 Workbench Text Files
Encog Workbench does not directly use text files. However, text files are a
means of storing instructions for project file users. For instance, a readme.txt
file can be added to a project and displayed inside of the workbench. The Encog
Workbench can display both text and HTML files.
3.2 A Simple XOR Example
There are many different ways that the Encog Workbench can be used. The
Encog Analyst can be used to create projects that include normalization, train-
ing and analysis. However, all of the individual neural network parts can also
be manually created and trained. If the data is already normalized, Encog Ana-
lyst may not be necessary.
In this section we will see how to use the Encog Workbench without the
Encog Analyst by creating a simple XOR neural network. The XOR dataset
does not require any normalization, as it is already in the 0 to 1 range.
We will begin by creating a new project.
3.2.1 Creating a New Project
First create a new project by launching the Encog Workbench. Once the
Encog Workbench starts up, the options of creating a new project, opening an
existing project or quitting will appear. Choose to create a new project and
name it “XOR.” This will create a new empty folder named XOR. You will
now see the Encog Workbench in Figure 3.1.
Figure 3.1: The Encog Workbench
This is the basic layout of the Encog Workbench. There are three main
areas. The tall rectangle on the left is where all project files are shown. Cur-
rently this project has no files. You can also see the log output and status
information. The rectangle just above the log output is where documents are
opened. The look of the Encog Workbench is very much like an IDE and should
be familiar to developers.
3.2.2 Generate Training Data
The next step is to obtain training data. There are several ways to do this.
First, Encog Workbench supports drag and drop. For instance, CSVs can be
dragged from the operating system and dropped into the project as a copy,
leaving the original file unchanged. These files will then appear in the project
tree.
The Encog Workbench comes with a number of built-in training sets. Ad-
ditionally, it can download external data such as stock prices and even sunspot
information. The sunspot information can be used for time-series prediction
experiments.
The Encog Workbench also has a built-in XOR training set. To access it,
choose Tools->Generate Training Data. This will open the “Create Training
Data” dialog. Choose “XOR Training Set” and name it “xor.csv.” Your new
CSV file will appear in the project tree.
If you double-click the “xor.csv” file, you will see the following training
data in Listing 3.1:
Listing 3.1: XOR Training Data
"op1","op2","result"
0,0,0
1,0,1
0,1,1
1,1,0
It is important to note that the file does have headers. This must be specified
when the EGB file is generated.
3.2.3 Create a Neural Network
Now that the training data has been created, a neural network should be
created to learn the XOR data. To create a neural network, choose “File->New
File.” Then choose “Machine Learning Method” and name the neural network
“xor.eg.” Choose “Feedforward Neural Network.” This will display the dialog
shown in Figure 3.2:
Figure 3.2: Create a Feedforward Network
Make sure to fill in the dialog exactly as above. There should be two
input neurons, one output neuron and a single hidden layer with two neurons.
Choose both activation functions to be sigmoid. Once the neural network is
created, it will appear on the project tree.
3.2.4 Train the Neural Network
It is now time to train the neural network. The neural network that you see
currently is untrained. To easily determine if the neural network is untrained,
double-click the EG file that contains the neural network. This will show
Figure 3.3.
Figure 3.3: Editing the Network
This screen shows some basic stats on the neural network. To see more
detail, select the “Visualize” button and choose “Network Structure.” This
will show Figure 3.4.
Figure 3.4: Network Structure
The input and output neurons are shown in the structure view. All
of the connections with the hidden layer and bias neurons are also
visible. The bias neurons, as well as the hidden layer, help the neural network
to learn.
With this complete, it is time to actually train the neural network. Begin
by closing the histogram visualization and the neural network. There should
be no documents open inside of the workbench.
Right-click the “xor.csv” training data. Choose “Export to Training (EGB).”
Fill in two input neurons and one output neuron on the dialog that appears.
On the next dialog, be sure to specify that there are headers. Once this is
complete, an EGB file will be added to the project tree. This will result in
three files: an EG file, an EGB file and a CSV file.
To train the neural network, choose “Tools->Train.” This will open a dialog
to choose the training set and machine learning method. Because there is only
one EG file and one EGB file, this dialog should default to the correct values.
Leave the “Load to Memory” checkbox clicked. As this is such a small training
set, there is no reason to not load to memory.
There are many different training methods to choose from. For this exam-
ple, choose “Propagation - Resilient.” Accept all default parameters for this
training type. Once this is complete, the training progress tab will appear.
Click “Start” to begin training.
Training will usually finish in under a second. However, if the training
continues for several seconds, the training may need to be reset by clicking
the drop list titled “<Select Option>.” Choose to reset the network. Because
a neural network starts with random weights, training times will vary. On a
small neural network such as XOR, the weights can potentially be bad enough
that the network never trains. If this is the case, simply reset the network as
it trains.
3.2.5 Evaluate the Neural Network
There are two ways to evaluate the neural network. The first is to simply
calculate the neural network error by choosing “Tools->Evaluate Network.”
You will be prompted for the machine learning method and training data to
use. This will show you the neural network error when evaluated against the
specified training set.
For this example, the error will be a percent. When evaluating this per-
cent, the lower the percent the better. Other machine learning methods may
generate an error as a number or other value.
For a more advanced evaluation, choose “Tools->Validation Chart.” This
will result in an output similar to Figure 3.5.
Figure 3.5: Validation Chart for XOR
This graphically depicts how close the neural network’s computation matches
the ideal value (validation). As shown in this example, they are extremely
close.
3.3 Using the Encog Analyst
In the last section we used the Workbench with a simple data set that did not
need normalization. In this section we will use the Encog Analyst to work with
a more complex data set - the iris data set that has already been demonstrated
several times. The normalization procedure has already been explored. However, this
will provide an example of how to normalize and produce a neural network for
it using the Encog Analyst.
The iris dataset is built into the Encog Workbench, so it is easy to create
a dataset for it. Create a new Encog Workbench project as described in the
previous section. Name this new project “Iris.” To obtain the iris data set,
choose “Tools->Generate Training Data.” Choose the “Iris Dataset” and name
it “iris.csv.”
Right-click the “iris.csv” file and choose “Analyst Wizard.” This will bring
up a dialog like Figure 3.6.
Figure 3.6: Encog Analyst Wizard
You can accept most default values. However, “Target Field” and “CSV
File Headers” fields should be changed. Specify “species” as the target and
indicate that there are headers. The other two tabs should remain unchanged.
Click “OK” and the wizard will generate an EGA file.
The wizard also gave the option of specifying how to deal with missing values.
While the iris dataset has no missing values, this is not the case with every
dataset. The default action is to discard them. However, you can also choose
to average them out.
Double click this EGA file to see its contents as in Figure 3.7.
Figure 3.7: Edit an EGA File
From this tab you can execute the EGA file. Click “Execute” and a status
dialog will be displayed. From here, click “Start” to begin the process. The
entire execution should take under a minute on most computers.
Step 1: Randomize - Shuffle the file into a random order.
Step 2: Segregate - Create a training data set and an evaluation data
set.
Step 3: Normalize - Normalize the data into a form usable by the selected
Machine Learning Method.
Step 4: Generate - Generate the training data into an EGB file that can
be used to train.
Step 5: Create - Generate the selected Machine Learning Method.
Step 6: Train - Train the selected Machine Learning Method.
Step 7: Evaluate - Evaluate the Machine Learning Method.
This process will also create a number of files. The complete list of files in
this project is:
iris.csv - The raw data.
iris.ega - The EGA file. This is the Encog Analyst script.
iris_eval.csv - The evaluation data.
iris_norm.csv - The normalized version of iris_train.csv.
iris_output.csv - The output from running iris_eval.csv.
iris_random.csv - The randomized output from running iris.csv.
iris_train.csv - The training data.
iris_train.eg - The Machine Learning Method that was trained.
iris_train.egb - The binary training data, created from iris_norm.csv.
If you change the EGA script file or use different options for the wizard, you
may have different steps.
To see how the network performed, open the iris_output.csv file. You will
see Listing 3.2.
Listing 3.2: Evaluation of the Iris Data
"sepal_l","sepal_w","petal_l","petal_w","species","Output:species"
6.5,3.0,5.8,2.2,Iris-virginica,Iris-virginica
6.2,3.4,5.4,2.3,Iris-virginica,Iris-virginica
7.7,3.0,6.1,2.3,Iris-virginica,Iris-virginica
6.8,3.0,5.5,2.1,Iris-virginica,Iris-virginica
6.5,3.0,5.5,1.8,Iris-virginica,Iris-virginica
6.3,3.3,4.7,1.6,Iris-versicolor,Iris-versicolor
5.6,2.9,3.6,1.3,Iris-versicolor,Iris-versicolor
...
This illustrates how the neural network attempts to predict what iris species
each row belongs to. As you can see, it is correct for all of the rows shown
here. These are data items that the neural network was not originally trained
with.
3.4 Encog Analyst Reports
This section will discuss how the Encog Workbench can also produce several
Encog Analyst reports. To produce these reports, open the EGA file as seen
in Figure 3.7. Clicking the “Visualize” button gives you several visualization
options. Choose either a “Range Report” or “Scatter Plot.” Both of these are
discussed in the next sections.
3.4.1 Range Report
The range report shows the ranges of each of the attributes that are used to
perform normalization by the Encog Analyst. Figure 3.8 shows the beginning
of the range report.
Figure 3.8: Encog Analyst Range Report
This is only the top portion. Additional information is available by scrolling
down.
3.4.2 Scatter Plot
It is also possible to display a scatter plot to view the relationship between two
or more attributes. When choosing to display a scatter plot, Encog Analyst
will prompt you to choose which attributes to relate. If you choose just two,
you are shown a regular scatter plot. If you choose all four, you will be shown
a multivariate scatter plot as seen in Figure 3.9.
Figure 3.9: Encog Analyst Multivariate Scatter Plot Report
This illustrates how four variables relate. To see how two variables relate,
choose two squares on the diagonal. Follow the row and column on each and
the square that intersects is the relationship between those two attributes. It
is also important to note that the triangle formed above the diagonal is the
mirror image (reverse) of the triangle below the diagonal.
3.5 Summary
This chapter introduced the Encog Workbench. The Encog Workbench is a
GUI application that visually works with neural networks and other machine
learning methods. The workbench is a Java application that produces data
files that work across all Encog platforms.
This chapter also demonstrated how to use Encog Workbench to directly
create and train a neural network. For cases where data is already normalized,
this is a good way to train and evaluate neural networks. The workbench
creates and trains neural networks to accomplish this.
For more complex data, Encog Analyst is a valuable tool that performs
automatic normalization. It also organizes a neural network project as a series
of tasks to be executed. The iris dataset was used to illustrate how to use the
Encog Analyst.
So far, this book has shown how to normalize and process data using the
Encog Analyst. The next chapter shows how to construct neural networks with
code using the Encog framework directly with and without Encog Analyst.
Chapter 4
Constructing Neural Networks
in Java
Constructing a Neural Network
Activation Functions
Encog Persistence
Using the Encog Analyst from Code
This chapter will show how to construct feedforward and simple recurrent
neural networks with Encog and how to save these neural networks for later
use. Both of these neural network types are created using the BasicNetwork
and BasicLayer classes. In addition to these two classes, activation functions
are also used. The role of activation functions will be discussed as well.
Neural networks can take a considerable amount of time to train. Because
of this it is important to save your neural networks. Encog neural networks
can be persisted using Java’s built-in serialization. This persistence can also
be achieved by writing the neural network to an EG file, a cross-platform text
file. This chapter will introduce both forms of persistence.
In the last chapter, the Encog Analyst was used to automatically normalize
data. The Encog Analyst can also automatically create neural networks based
on CSV data. This chapter will show how to use the Encog analyst to create
neural networks from code.
4.1 Constructing a Neural Network
A simple neural network can quickly be created using BasicLayer and
BasicNetwork objects. The following code creates several BasicLayer objects
with a default hyperbolic tangent activation function.
BasicNetwork network = new BasicNetwork();
network.addLayer(new BasicLayer(2));
network.addLayer(new BasicLayer(3));
network.addLayer(new BasicLayer(1));
network.getStructure().finalizeStructure();
network.reset();
This network will have an input layer of two neurons, a hidden layer with
three neurons and an output layer with a single neuron. To use an activation
function other than the hyperbolic tangent function, use code similar to the
following:
BasicNetwork network = new BasicNetwork();
network.addLayer(new BasicLayer(null, true, 2));
network.addLayer(new BasicLayer(new ActivationSigmoid(), true, 3));
network.addLayer(new BasicLayer(new ActivationSigmoid(), false, 1));
network.getStructure().finalizeStructure();
network.reset();
The sigmoid activation function is passed to the addLayer calls for the hidden
and output layer. The true value that was also introduced specifies that the
BasicLayer should have a bias neuron. The output layer does not have bias
neurons, and the input layer does not have an activation function. This is
because the bias neuron affects the next layer, and the activation function
affects data coming from the previous layer.
Unless Encog is being used for something very experimental, always use a
bias neuron. Bias neurons allow the activation function to shift off the origin
of zero. This allows the neural network to produce a zero value even when
the inputs are not zero. The following URL provides a more mathematical
justification for the importance of bias neurons:
http://www.heatonresearch.com/wiki/Bias
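To illustrate the point with plain Java (not Encog code), a sigmoid neuron
without a bias always outputs 0.5 for a zero input, no matter what the weight
is; a bias term shifts it:

// Illustration only, not Encog code: the effect of a bias on a sigmoid
// neuron with a zero input. The weight and bias values are arbitrary.
double w = 2.0, b = -3.0, x = 0.0;
double withoutBias = 1.0 / (1.0 + Math.exp(-(w * x)));  // always 0.5
double withBias = 1.0 / (1.0 + Math.exp(-(w * x + b))); // about 0.047
System.out.println(withoutBias + " versus " + withBias);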
Activation functions are attached to layers and used to scale data output
from a layer. Encog applies a layer’s activation function to the data that
the layer is about to output. If an activation function is not specified for
BasicLayer, the hyperbolic tangent activation will be used by default.
It is also possible to create context layers. A context layer can be used to
create an Elman or Jordan style neural network. The following code could
be used to create an Elman neural network.
BasicLayer input, hidden;
BasicNetwork network = new BasicNetwork();
network.addLayer(input = new BasicLayer(1));
network.addLayer(hidden = new BasicLayer(2));
network.addLayer(new BasicLayer(1));
input.setContextFedBy(hidden);
network.getStructure().finalizeStructure();
network.reset();
Notice the input.setContextFedBy line? This creates a context link from
the hidden layer back into the network. The network will always be fed the
hidden layer's output from the last iteration. This creates an Elman style neural network.
Elman and Jordan networks will be introduced in Chapter 7.
4.2 The Role of Activation Functions
The last section illustrated how to assign activation functions to layers. Ac-
tivation functions are used by many neural network architectures to scale the
output from layers. Encog provides many different activation functions that
can be used to construct neural networks. The next sections will introduce
these activation functions.
Activation functions are attached to layers and are used to scale data out-
put from a layer. Encog applies a layer’s activation function to the data that
the layer is about to output. If an activation function is not specified for Ba-
sicLayer, the hyperbolic tangent activation will be used by default. All classes
that serve as activation functions must implement the ActivationFunction
interface.
Activation functions play a very important role in training neural networks.
Propagation training, which will be covered in the next chapter, requires that
an activation function have a valid derivative. Not all activation functions
have valid derivatives. Determining if an activation function has a derivative
may be an important factor in choosing an activation function.
4.3 Encog Activation Functions
The next sections will explain each of the activation functions supported by
Encog. There are several factors to consider when choosing an activation func-
tion. Firstly, it is important to consider how the type of neural network being
used dictates the activation function required. Secondly, consider the neces-
sity of training the neural network using propagation. Propagation training
requires an activation function that provides a derivative. Finally, consider
the range of numbers to be used. Some activation functions deal with only
positive numbers or numbers in a particular range.
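For example, before selecting a propagation trainer, an activation function
can be asked whether it provides a derivative; a minimal sketch:

// Checking whether an activation function can be used with
// propagation training by asking for a derivative.
ActivationFunction fn = new ActivationSigmoid();
if (fn.hasDerivative()) {
    System.out.println("Safe to use with propagation training.");
}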
4.3.1 ActivationBiPolar
The ActivationBiPolar activation function is used with neural networks that
require bipolar values. Bipolar values are either true or false. A true value
is represented by a bipolar value of 1; a false value is represented by a bipolar
value of -1. The bipolar activation function ensures that any numbers passed
to it are either -1 or 1. The ActivationBiPolar function does this with the
following code:
if (d[i] >