All content in this area was uploaded by Amir H. Alavi on May 22, 2017
Expression Programming Techniques for Formulation of Structural Engineering Systems
Amir H. Gandomi
Department of Civil Engineering, University of Akron, Akron, OH 44325-3905, USA
Amir H. Alavi
School of Civil Engineering, Iran University of Science and Technology, Tehran, Iran
Abstract
Modelling the real behavior of the structural systems is very difficult because of the multivariable
dependencies of materials and structural responses. To deal with this complex behavior, simplifying
assumptions are commonly incorporated into the development of the conventional methods. This may
lead to very large errors. The present study investigates the simulation capabilities of expression
programming (EP) techniques by applying them to complex structural engineering problems. Gene
expression programming (GEP) and multi expression programming (MEP) are the employed EP
systems. Compared with traditional genetic programming (GP), the EP techniques are more
compatible with computer architectures. This results in a significant speedup in their execution. GEP
and MEP are substantially useful in deriving empirical models for characterizing the behavior of
structural engineering systems by directly extracting the knowledge contained in the experimental
data. The problems analyzed herein include the following: (i) prediction of shear strength of
reinforced concrete columns, and (ii) prediction of hysteretic energy demand in steel moment resisting
frames. The results obtained by GEP and MEP are compared with those provided by other equations
presented in the literature and found to be more accurate. The new approaches of GEP and MEP
overcome the shortcomings of different methods previously presented in the literature for the analysis
of structural engineering systems. Contrary to artificial neural networks and many other soft
computing tools, GEP and MEP provide reasonably simplified prediction equations. The derived
equations can be used for routine design practice. Unlike the conventional methods, GEP and MEP do
not require any simplifying assumptions in developing the models.
Keywords Data mining, Structural engineering, Expression programming, Prediction.
1. Introduction
In contrast to other civil engineering problems, many structural engineering systems lack a precise
analytical theory or model for their solution. To cope with the complexity of structural
engineering problems and the spatial variability of the materials involved, traditional forms of
engineering design solutions have been widely developed. The information has usually been
collected, synthesized and presented in the form of design charts, tables or empirical formulae (Shahin
et al., 2001; Alavi and Gandomi, 2011a).
Recent technological progress has generated extremely accurate and reliable computer-aided pattern
recognition and data classification methods. Pattern recognition systems, as an example, learn
adaptively from experience and extract various discriminators. Artificial neural networks (ANNs) are
the most widely used pattern recognition procedures. ANNs have been successfully employed to
capture nonlinear interactions between various parameters in complex structural engineering systems
(Juang et al., 2001; Javadi, 2006; Cabalar and Cevik, 2009; Alavi et al., 2009; Alavi and Gandomi,
2011b; Gandomi and Alavi, 2011b). Despite the acceptable performance of ANNs in most cases, they
have some fundamental disadvantages that limit their use. The ANN methodology provides no
explicit function for calculating the output. A trained network only offers its final
synaptic weights, from which the outcome is obtained in a parallel manner. Owing to the nonlinearity and
complexity of many structural engineering problems, more robust tools are required to assess their behavior.
Genetic programming (GP) (Koza, 1992) is a new alternative approach to overcome the limitations of
ANNs. GP is a specialization of genetic algorithms (GA) where solutions are computer programs
rather than binary strings. One of the main features of GP over other soft computing tools (e.g. ANNs)
is its ability to generate simplified prediction equations without assuming prior form of the existing
relationship (Alavi et al., 2011a). For the last decade, GP and its variants have been pronounced as
powerful methods for simulating the behavior of civil engineering problems (Narendra et al., 2006;
Javadi et al., 2006; Gandomi and Alavi, 2011, 2012a,b; Alavi et al. 2011c,2012; Gandomi et al.
2010a,b,c,2011d). Gene expression programming (GEP) (Ferreira, 2001) is another recent extension
to GP that evolves computer programs of different sizes and shapes encoded in linear chromosomes of
fixed length. The GEP chromosomes are composed of multiple genes, each gene encoding a smaller
subprogram. Multi expression programming (MEP) (Oltean and Dumitrescu, 2002) is another linear
variant of GP that uses a linear representation of chromosomes. MEP has a special ability to encode
multiple computer programs of a problem in a single chromosome. Based on numerical experiments,
the GEP and MEP approaches are able to significantly outperform similar techniques (Oltean and
Groşan, 2003). In contrast with classical GP and ANNs, application of GEP and MEP in the field of
structural engineering is new and original (Cevik 2007; Gesoglu et al., 2009; Cevik et al., 2009;
Baykasoglu et al., 2009; Cevik et al., 2010; Gandomi et al., 2011c,d).
This study investigates the potential of GEP and MEP in simulating the nonlinear complex behavior
of structural engineering systems. In order to demonstrate the formulation capabilities of these EP
techniques, they are applied to two practical examples of structural engineering. A comparative study
of the results of the proposed formulations and the existing models in the literature is conducted.
The GEP and MEP models are developed based on reliable experimental results collected through an
extensive literature review.
2. Genetic programming
GP is a symbolic optimization technique that creates computer programs to solve a problem through
simulating the biological evolution of living organisms (Koza, 1992). The main difference between
GP and GA is the representation of the solution. GA creates a string of numbers that represent the
solution. The classical GP solutions are computer programs represented as tree structures and
expressed in a functional programming language (such as LISP) (Koza, 1992; Alavi et al., 2011a).
The fitness of each program generated by GP is evaluated using a fitness function. Thus, the fitness
function is the objective function that GP aims to optimize (Gandomi et al., 2011b). In addition to
classical tree-based GP, there are other types of GP where programs are represented in different ways.
These are linear and graph-based GP (Banzhaf et al., 1998; Alavi and Gandomi, 2011). The emphasis
of the present study is placed on the linear-based GP techniques.
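As a minimal sketch of the ideas in this section, the snippet below stores a classical tree-based GP individual as nested tuples in prefix (LISP-like) form and scores it with a fitness function. The function set, the example program, the sample data and the use of the mean absolute error as fitness are illustrative assumptions, not the settings used in this study.

```python
import operator

# Illustrative function set: symbol -> (arity, implementation).
FUNCTIONS = {
    "+": (2, operator.add),
    "-": (2, operator.sub),
    "*": (2, operator.mul),
}


def evaluate(tree, variables):
    """Recursively evaluate a program stored as nested tuples (prefix form)."""
    if isinstance(tree, tuple):
        symbol, *children = tree
        arity, func = FUNCTIONS[symbol]
        if len(children) != arity:
            raise ValueError(f"{symbol!r} expects {arity} arguments")
        return func(*(evaluate(child, variables) for child in children))
    if isinstance(tree, str):  # terminal: input variable
        return variables[tree]
    return float(tree)  # terminal: numerical constant


def fitness(tree, samples):
    """Mean absolute error over (inputs, target) pairs; lower is better."""
    errors = [abs(evaluate(tree, inputs) - target) for inputs, target in samples]
    return sum(errors) / len(errors)


# Hypothetical individual encoding a*b + 2, i.e. (+ (* a b) 2) in LISP notation.
program = ("+", ("*", "a", "b"), 2)
samples = [({"a": 1.0, "b": 2.0}, 4.0), ({"a": 3.0, "b": 1.0}, 5.0)]
print(fitness(program, samples))
```

An evolutionary run would generate many such trees, rank them by this fitness value and recombine the better ones; only the representation and the evaluation step are shown here.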
2.1. Expression programming
Computers do not naturally run tree-shaped programs. Therefore, slow interpreters have to be used as
a part of classical tree-based GP. Conversely, by evolving the binary bit patterns, the use of an
expensive interpreter is avoided. Consequently, a linear GP system can run several orders of
magnitude faster than comparable interpreting systems. The enhanced speed of the linear variants of
GP (e.g., LGP and MEP) permits conducting many runs in realistic timeframes. This leads to deriving
consistent and high-precision models with little customization (Francone and Deschaine, 2004;
Gandomi et al., 2011b).
Several linear variants of GP have recently been proposed such as gene expression programming
(GEP) (Ferreira, 2001) and multi-expression programming (MEP) (Oltean and Dumitrescu, 2002). EP
techniques are the most common linear-based GP methods. These variants make a clear distinction
between the genotype and the phenotype of an individual (Oltean and Groşan, 2003a).
GEP is a natural development of GP first proposed by Ferreira (2001). GEP consists of five main
components (Alavi and Gandomi, 2011a): function set, terminal set, fitness function, control
parameters, and termination condition. GEP uses a fixed length of character strings to represent
solutions to the problems, which are afterwards expressed as parse trees of different sizes and shapes.
These trees are called GEP expression trees (ETs) (Alavi and Gandomi, 2011a). One advantage of the
GEP technique is that the creation of genetic diversity is extremely simplified as genetic operators
work at the chromosome level. The multi-genic nature of GEP allows the evolution of more complex
programs composed of several subprograms. Each GEP gene contains a list of symbols with a fixed
length that can be any element from a function set such as {+, -, ×, /, Log} and a terminal set such as {a, b,
c, 1}. The function set and terminal set must have the closure property: each function must be able to
take any value of a data type that can be returned by a function or assumed by a terminal (Alavi and
Gandomi, 2011a).
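The mapping from a fixed-length GEP gene to an expression tree can be sketched as follows. The sketch assumes the breadth-first (Karva) reading order and the head/tail gene layout described by Ferreira (2001), in which a head of size h and a maximum function arity n give a tail of h(n - 1) + 1 terminals. The two example genes, the symbol "Q" for the square root and the protected division and square root used to enforce closure are illustrative assumptions.

```python
import math
from collections import deque

# Arity of each function symbol; "Q" denotes square root. Terminals have arity 0.
ARITY = {"+": 2, "-": 2, "*": 2, "/": 2, "Q": 1}


def apply_function(symbol, args):
    """Protected operators so that every function accepts any real input (closure)."""
    if symbol == "+":
        return args[0] + args[1]
    if symbol == "-":
        return args[0] - args[1]
    if symbol == "*":
        return args[0] * args[1]
    if symbol == "/":  # protected division (assumed convention)
        return args[0] / args[1] if abs(args[1]) > 1e-12 else 1.0
    if symbol == "Q":  # protected square root (assumed convention)
        return math.sqrt(abs(args[0]))
    raise KeyError(symbol)


def decode(gene):
    """Read a gene left to right and fill the expression tree level by level."""
    symbols = iter(gene)
    root = [next(symbols), []]
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for _ in range(ARITY.get(node[0], 0)):
            child = [next(symbols), []]
            node[1].append(child)
            queue.append(child)
    return root  # symbols left unread form the non-coding part of the gene


def evaluate(node, terminals):
    """Evaluate an expression tree given the terminal values."""
    symbol, children = node
    if symbol in ARITY:
        return apply_function(symbol, [evaluate(c, terminals) for c in children])
    return terminals[symbol]


def evaluate_chromosome(genes, terminals):
    """Link the sub-expressions of all genes with addition."""
    return sum(evaluate(decode(gene), terminals) for gene in genes)


# Head size 3 and maximum arity 2 give a tail of 3 * (2 - 1) + 1 = 4 terminals.
genes = ["+*abcab", "Qabbaba"]
values = {"a": 4.0, "b": 2.0, "c": 3.0}
print(evaluate_chromosome(genes, values))  # (b*c + a) + sqrt(a)
```

Because the tail holds only terminals, any string of this layout decodes to a syntactically valid tree, which is why genetic operators can act directly on the chromosome without repair steps.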
MEP is another subarea of GP. It was first introduced by Oltean and Dumitrescu (2002). Linear
chromosomes are used by MEP for solution encoding. This technique encodes multiple computer
programs in a single chromosome. A program with the best fitness represents the chromosome. The
MEP decoding process is not more complicated than other GP variants storing a single program in a
chromosome (Alavi et al., 2010a). The steady-state algorithm of MEP starts with the creation of a
random population of computer programs (Oltean and Groşan, 2003a,b; Alavi and Gandomi, 2012).
The representation of the MEP solutions is similar to the procedure followed by C and Pascal to
convert expressions into machine code. Functions and terminals are a part of a population member
created by MEP. The terminal and function symbols are elements in the terminal and function sets,
respectively. A function set can contain the basic arithmetic operations or any other mathematical
functions. The terminal set can contain numerical constants, logical constants and variables. Each
gene encodes a terminal or a function symbol. The first symbol in a chromosome is a terminal
symbol. The best expression is selected after evaluating the fitness of all expressions in an MEP
chromosome using a fitness function (Oltean and Groşan, 2003a).
Comprehensive details about GEP and MEP can be found in (Alavi and Gandomi, 2011a).
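The MEP encoding described above can be sketched in a few lines. The sketch assumes that a function gene stores its symbol together with the positions of its arguments, and that those positions always refer to genes earlier in the chromosome, as in the description of Oltean and Dumitrescu (2002). The example chromosome, the sample data and the use of the mean absolute error as fitness are hypothetical.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate_genes(chromosome, variables):
    """Return the value of the expression encoded by every gene."""
    values = []
    for gene in chromosome:
        if isinstance(gene, str):  # terminal gene
            values.append(variables[gene])
        else:  # function gene: (symbol, index of argument 1, index of argument 2)
            symbol, i, j = gene
            values.append(OPS[symbol](values[i], values[j]))
    return values


def best_expression(chromosome, samples):
    """Return (gene index, MAE) of the best expression in the chromosome."""
    totals = [0.0] * len(chromosome)
    for variables, target in samples:
        for k, value in enumerate(evaluate_genes(chromosome, variables)):
            totals[k] += abs(value - target)
    errors = [total / len(samples) for total in totals]
    best = min(range(len(errors)), key=errors.__getitem__)
    return best, errors[best]


# Hypothetical chromosome: genes 0, 1 and 3 are terminals; the others reuse earlier genes.
chromosome = ["a", "b", ("+", 0, 1), "c", ("*", 2, 3), ("-", 4, 0)]
samples = [
    ({"a": 1.0, "b": 2.0, "c": 3.0}, 9.0),
    ({"a": 2.0, "b": 0.0, "c": 5.0}, 10.0),
]
print(best_expression(chromosome, samples))
```

A single pass over the genes evaluates every encoded expression, since each gene only needs the values already computed for earlier genes. This is what keeps the decoding cost comparable to that of a single-program representation.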
3. Application to structural engineering problems
3.1. Review of state of art
Application of the GEP and MEP techniques to problems in structural engineering is quite new.
Cevik (2007) and Pala (2008) used GEP for characteristic modeling of cold-formed steel. GEP was
utilized by Gesoglu et al. (2009) for mechanical modeling of concrete. Baykasoglu et al. (2009)
proposed prediction models for high-strength concrete parameters via GEP. Cevik et al. (2010)
proposed a GEP-based formula for predicting torsional strength of RC beams. Gandomi et al. (2011d)
utilized GEP to develop a new prediction model for the load capacity of castellated steel beams.
Gandomi et al. (2012) recently used GEP to relate the concrete triaxial strength to mix design
parameters. Although Oltean and Groşan (2003a) have shown that the MEP algorithm can outperform
LGP and GEP, it has rarely been used for engineering modeling (e.g. Alavi et al., 2010a). Based on an
extensive literature review, the only application of MEP to structural engineering problems is that of
Gandomi et al. (2011c) for modeling the uplift capacity of suction caissons. The uplift capacity of
suction caissons is the only structural system that has been modeled using different metaheuristic
algorithms. Therefore, it can provide a sound basis for conducting a brief comparative study. The
uplift capacity problem has been solved using the finite element method (FEM) (Deng and Carter, 1999), ANN
(Rahman et al., 2001), neuro genetic network (NGN) (Pai, 2005), evolutionary polynomial regression
(EPR) (Rezania et al., 2008), hybrid GP and simulated annealing (GP/SA), least squares regression
(LSR) (Alavi et al., 2010b), classical tree-based GP, LGP and GEP (Alavi et al. 2011b). All of these
studies have used the same database. Table 1 summarizes the results obtained by different methods
for the prediction of the uplift capacity. Correlation coefficient (R), root mean squared error (RMSE),
mean absolute error (MAE), and the mean, standard deviation (SD) and coefficient of variation (Cov, i.e.
SD divided by the mean) of the experimental to predicted ratio are used to evaluate the performance of the models.
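The performance measures listed above can be computed as in the following sketch. It assumes the usual definitions of R, MAE and RMSE, takes the last column of Table 1 as SD divided by the mean of the experimental to predicted ratio (an assumption that is consistent with the tabulated values) and uses the sample standard deviation. The function name and the dictionary keys are hypothetical.

```python
import math


def performance(measured, predicted):
    """Return R, MAE, RMSE and the statistics of the measured/predicted ratio."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cross = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    ss_m = sum((m - mean_m) ** 2 for m in measured)
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    r = cross / math.sqrt(ss_m * ss_p)
    mae = sum(abs(m - p) for m, p in zip(measured, predicted)) / n
    rmse = math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)
    # Ratio statistics; predicted values are assumed to be nonzero.
    ratios = [m / p for m, p in zip(measured, predicted)]
    mean_ratio = sum(ratios) / n
    sd_ratio = math.sqrt(sum((q - mean_ratio) ** 2 for q in ratios) / (n - 1))
    return {
        "R": r,
        "MAE": mae,
        "RMSE": rmse,
        "Mean": mean_ratio,
        "SD": sd_ratio,
        "Cov": sd_ratio / mean_ratio,
    }


# Hypothetical data: predictions shifted by a constant keep R at 1 but not MAE or RMSE.
measured = [10.0, 20.0, 30.0, 40.0]
shifted = [m + 5.0 for m in measured]
print(performance(measured, shifted))
```

The shifted example illustrates why R is reported together with the error measures: a constant offset leaves R unchanged while MAE and RMSE both grow to the size of the offset.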
As can be observed from this table, the FEM and LSR have the best and
worst performance, respectively. Among the eight different metaheuristic algorithms, MEP has the best
performance. The performance of MEP is comparable with FEM. Further, the GEP algorithm has a
very good performance and outperforms LSR, NGN and tree-based GP.
Table 1. Performance statistics of different methods for the prediction of the uplift capacity of suction
caisson. R, MAE and RMSE refer to experimental vs predicted values; Mean, SD and Cov. refer to the experimental / predicted ratio.

| Method | R | MAE | RMSE | Mean | SD | Cov. |
|---|---|---|---|---|---|---|
| FEM | 0.996 | 8.49 | 11.88 | 0.910 | 0.206 | 0.226 |
| LSR | 0.871 | 33.55 | 51.36 | 0.826 | 0.299 | 0.363 |
| Metaheuristic algorithms | | | | | | |
| ANN | 0.986 | 11.01 | 17.77 | 0.980 | 0.261 | 0.266 |
| NGN | 0.922 | 34.66 | 48.76 | 0.844 | 0.492 | 0.583 |
| EPR | 0.995 | 10.81 | 15.95 | 0.970 | 0.207 | 0.213 |
| GP/SA | 0.998 | 8.723 | 11.122 | 0.931 | 0.130 | 0.140 |
| Tree-based GP | 0.988 | 18.19 | 13.90 | 1.369 | 1.804 | 1.317 |
| LGP | 0.997 | 11.60 | 14.92 | 0.927 | 0.199 | 0.215 |
| GEP | 0.991 | 17.23 | 24.87 | 0.922 | 0.256 | 0.278 |
| MEP | 0.996 | 9.78 | 16.19 | 0.892 | 0.166 | 0.187 |
3.2. Numerical experiments
This chapter investigates the applicability of the GEP and MEP approaches to the formulation of
structural engineering problems. The investigated problems are:
i. Prediction of shear strength of reinforced concrete (RC) columns;
ii. Prediction of hysteretic energy demand in steel moment resisting frames.
The GEP and MEP models were developed based on experimental results obtained from the literature.
Various parameters involved in the GEP and MEP algorithms are presented in Table 2. The major
task is to define the hidden function connecting the input variables and output variables. There are
thirteen parameters for GEP and eight parameters for MEP. The first five parameters are common to
both algorithms. The parameter selection will affect the generalization capability of the GEP and MEP
models. Several runs are conducted to come up with a parameterization of GEP and MEP that
provided enough robustness and generalization to solve the problems. The effective training time
specifies the number of generations in GEP and MEP. For all the cases, three levels are set for the
number of generations. A fairly large number of generations are tested on each run to find models
with minimum error. For each case, the program is run until there is no longer significant
improvement in the performance of the models or the run is terminated automatically. Three levels
are set for the population size. Large populations are used with the runs to guarantee sufficient
diversity. Note that a run will take longer with a larger population size. Two levels are considered for
the crossover and mutation rates. The success of the algorithms usually increases with increasing the
head size and number of genes in GEP and chromosome length in MEP. In this case, the complexity
of the evolved functions increases and the speeds of the algorithms decrease. Different optimal levels
are considered for these parameters as tradeoffs between the running time and the complexity of the
evolved solutions. Basic arithmetic operators and mathematical functions are utilized to get the
optimum models. The values considered for the other parameters are based on previously
suggested values (e.g., Baykasoglu et al., 2008; Alavi and Gandomi, 2011) and on several
preliminary runs in which the performance behavior was observed. All of the combinations of the parameters
are tested and 10 replications are carried out for each combination.
Table 2. Parameter settings for the GEP and MEP algorithms.

| Algorithms | Parameter | Parameter setting |
|---|---|---|
| Common parameters (GEP, MEP) | Number of generations | 100, 250, 500 |
| | Population size | 500, 2500, 5000 |
| | Function set | +, -, ×, /, √, power, exp, log, ln |
| | Mutation rate (%) | 10, 90 |
| | Fitness function | Linear error function |
| Algorithm-specific parameters: MEP | Crossover rate (%) | 50, 95 |
| | Crossover type | Uniform |
| | Chromosome length | 50-80 genes |
| Algorithm-specific parameters: GEP | Number of genes | 1-3 |
| | Head size | 3, 5, 8 |
| | Linking function | + |
| | One-point recombination rate (%) | 30, 50 |
| | Two-point recombination rate (%) | 30 |
| | Gene recombination rate | 10 |
| | Gene transposition rate (%) | 10 |
| | Numerical constants | Integer, Floating-point |
The GEP algorithm is implemented by GeneXproTools (GEPSOFT, 2006) software. The source code
of MEP (Oltean, 2004) in C++ is modified by the authors to be utilizable for the available problems.
Overfitting is one of the essential problems in machine learning generalization. An efficient approach
to prevent overfitting is to test the derived models on a validation set to find a better generalization
(Banzhaf et al. 1998). This strategy is considered in this study for improving the generalization of the
models. For this aim, the available data sets are randomly divided into learning, validation and testing
subsets. The learning data are taken for training (genetic evolution). The validation data are used to
specify the generalization capability of the models on data they did not train on (model selection).
Thus, both the learning and validation data are involved in the modeling process and are
categorized into one group referred to as “training data”. The model with the best performance on
both the learning and validation data sets is finally selected as the outcome of the runs. The testing
data sets are further employed to measure the performance of the optimal models obtained by GEP
and MEP on data that played no role in building the models. To obtain a consistent data division, several
combinations of the training and testing sets are considered. The selection is made such that the
maximum, minimum, mean and standard deviation of parameters are consistent in the training and
testing data sets. Out of the available data for each problem, approximately 85% are used for the
training purpose (70% for learning and 15% for validation) and 15% for the testing of the
generalization capability of the GEP and MEP models.
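The data division described above can be sketched as follows. The sketch shuffles the sample indices, cuts them into learning, validation and testing subsets of roughly 70%, 15% and 15%, and keeps the trial whose training (learning plus validation) and testing subsets have the closest minimum, maximum, mean and standard deviation. The mismatch score, the number of trials and the function names are assumptions made for illustration; the source does not state how the consistency of the statistics was judged.

```python
import random
import statistics


def split_indices(n, learn=0.70, valid=0.15, seed=0):
    """Randomly divide range(n) into learning, validation and testing indices."""
    rng = random.Random(seed)
    indices = list(range(n))
    rng.shuffle(indices)
    n_learn = round(learn * n)
    n_valid = round(valid * n)
    return (
        indices[:n_learn],
        indices[n_learn:n_learn + n_valid],
        indices[n_learn + n_valid:],
    )


def summary(values):
    """Minimum, maximum, mean and sample standard deviation of a subset."""
    return (min(values), max(values), statistics.mean(values), statistics.stdev(values))


def mismatch(values, train_idx, test_idx):
    """Hypothetical score: summed gap between training and testing statistics."""
    train = summary([values[i] for i in train_idx])
    test = summary([values[i] for i in test_idx])
    scale = (max(values) - min(values)) or 1.0
    return sum(abs(a - b) for a, b in zip(train, test)) / scale


def consistent_split(values, n_trials=200):
    """Try several random divisions and keep the most consistent one."""
    best = None
    for seed in range(n_trials):
        learn, valid, test = split_indices(len(values), seed=seed)
        score = mismatch(values, learn + valid, test)
        if best is None or score < best[0]:
            best = (score, learn, valid, test)
    return best[1:]


# Hypothetical single-variable data set with 47 records.
data = [float(v) for v in range(47)]
learning, validation, testing = consistent_split(data, n_trials=20)
print(len(learning), len(validation), len(testing))
```

With several input variables, the same score could be summed over all columns so that no single variable dominates the choice of division.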
3.3. Model selection
To obtain the best prediction models, the best GEP and MEP-based formulas are chosen on the basis
of a multi-objective strategy as below:
i. The simplicity of the model, although this is not a predominant factor.
ii. Providing the best fitness value on the learning set of data.
iii. Providing the best fitness value on a validation set of data.
The first objective must be controlled by the user. For the other objectives, the following objective
function (OBJ) is used as a measure of how well the model predicted output agrees with the
experimentally measured output. The selections of the best models are deduced by the minimization
of the following function:
\[
OBJ = \left(\frac{No._{Learning} - No._{Validation}}{No._{Training}}\right)\frac{RMSE_{Learning} + MAE_{Learning}}{R_{Learning}^{2}} + \frac{2\,No._{Validation}}{No._{Training}}\,\frac{RMSE_{Validation} + MAE_{Validation}}{R_{Validation}^{2}} \qquad (1)
\]
where No.Training, No.Learning and No.Validation are the numbers of training, learning and validation data,
respectively. It is well known that R alone is not a good indicator of the prediction accuracy of a model.
This is because shifting the output values of a model equally does not change the R value. The
constructed objective function takes into account the changes of R, RMSE and MAE together. Higher
R values and lower RMSE and MAE values result in lowering OBJ and, consequently, indicate a
more precise model. In addition, the above function considers the effects of different data divisions
for the learning and validation data.
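A minimal sketch of the objective function is given below. It assumes that OBJ is the sum of a learning term weighted by (No.Learning - No.Validation)/No.Training and a validation term weighted by 2 No.Validation/No.Training, each term being (RMSE + MAE)/R², and that the training set is the union of the learning and validation sets. The function name, argument layout and example statistics are hypothetical.

```python
def objective(n_learning, n_validation, learning, validation):
    """OBJ for one candidate model.

    `learning` and `validation` are dicts with keys "R", "RMSE" and "MAE"
    holding the statistics of the model on the respective subsets.
    """
    n_training = n_learning + n_validation  # training = learning + validation

    def term(stats):
        return (stats["RMSE"] + stats["MAE"]) / stats["R"] ** 2

    learning_weight = (n_learning - n_validation) / n_training
    validation_weight = 2 * n_validation / n_training
    return learning_weight * term(learning) + validation_weight * term(validation)


# Hypothetical statistics for two candidate models on the same data division.
model_a = objective(
    30, 10,
    {"R": 0.90, "RMSE": 10.0, "MAE": 8.0},
    {"R": 0.95, "RMSE": 9.0, "MAE": 7.0},
)
model_b = objective(
    30, 10,
    {"R": 0.99, "RMSE": 4.0, "MAE": 3.0},
    {"R": 0.70, "RMSE": 30.0, "MAE": 25.0},
)
print(model_a, model_b)
```

Under these assumptions the two weights always sum to one, so OBJ is a weighted average of the learning and validation terms. In the example, model_b fits the learning data more closely but receives the larger OBJ because of its weaker validation statistics, which is the behaviour the selection strategy relies on.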
3.4. Prediction problems
3.4.1. A case study of concrete structures: Shear strength of RC columns
Columns are the most important elements of RC structures, as their failure leads the building to
collapse. The significant effect of shear forces on RC columns subjected to lateral load is well
understood. Shear failure of RC columns reduces the lateral strength of the building and may lead to
rapid strength degradation. Hence, providing precise estimations of the shear strength of RC columns
is an essential consideration in structural design (Choe, 2006). Several analytical solutions are
presented by design codes to determine the shear strength of circular columns. The contributions of
concrete and transverse reinforcement to the shear strength of circular columns are two key
components included in these design codes (Caglar, 2009). There is some published information on
the behavioral modeling of RC columns. In this context, ANNs have been applied to the prediction of
the shear strength of circular RC columns (Caglar, 2009). As mentioned above, ANNs have some
fundamental disadvantages that limit their usage in practical calculations. To overcome these
limitations and provide a more robust method, the GEP and MEP approaches are used as alternative
solutions to simulate the characteristics of RC columns. These methods have an advantage that once
the model is trained, it can be used as a quick and accurate tool for evaluating the shear strength of
circular RC columns without the need for any manual testing.
3.4.1.1. Model construction and analysis
The GEP and MEP-based correlations for predicting the shear strength of circular RC columns were
developed using 47 data sets gathered by Choe (2006). The data are from 12 different experimental
studies of RC columns. The database includes the measurements of several mechanical and
geometrical variables. Herein, four influencing parameters are used as the predictor variables based on
a literature review (Caglar, 2009). The statistics of different input and output parameters involved in
the model development are given in Table 3. ρl fyl, ρw fyh, Ae√fc'/1000 and P/Ag are the considered input
variables. ρl and ρw are, respectively, the longitudinal and transverse reinforcement ratios in %. fyl, fyh and
fc' are, respectively, the yield stress of the longitudinal reinforcement, the yield stress of the hoops and the
concrete compressive strength in MPa. Ae and Ag are the effective and gross cross-sectional areas, respectively. P is the axial
load in kN and the output is the shear force (V) in kN. Of the available data, 38 data sets are used as
the training data and the rest are used for the testing of the proposed models. Further, the proposed
correlations are compared with different codes such as ACI-318 (2005), ATC-32 (1996), ASCE-ACI
(1973) and CALTRANS (1996).
Table 3. The variables used in model development.

| Statistic | ρl fyl (MPa) | ρw fyh (MPa) | Ae√fc'/1000 | P/Ag (MPa) | V (kN) |
|---|---|---|---|---|---|
| Mean | 1056.1 | 197.48 | 756.86 | 2.8173 | 321.08 |
| Standard Error | 60.818 | 20.414 | 50.240 | 0.6521 | 17.334 |
| Median | 1155 | 166.77 | 650.47 | 1.4236 | 321.50 |
| Mode | 1395.2 | 280.5 | 654.02 | 0 | 230 |
| Standard Deviation | 374.91 | 125.84 | 309.70 | 4.0200 | 106.86 |
| Sample Variance | 140557 | 15836 | 95915 | 16.160 | 11418 |
| Kurtosis | -1.5969 | 0.4517 | 0.4562 | 2.1352 | -0.1795 |
| Skewness | -0.2959 | 0.7849 | 1.2164 | 1.6602 | 0.1572 |
| Range | 1080.8 | 531.26 | 1198.1 | 14.427 | 486 |
| Minimum | 439.2 | 26 | 267.55 | 0 | 93 |
| Maximum | 1520 | 557.26 | 1465.6 | 14.427 | 579 |
| Confidence Level (95.0%) | 123.23 | 41.362 | 101.80 | 1.3213 | 35.123 |
The formulations of the shear strength of circular RC columns, V (kN), for the best result by the GEP
and MEP algorithms are as given below:
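\[
V_{GEP} = f\left(\rho_l f_{yl},\ \rho_w f_{yh},\ \frac{A_e\sqrt{f'_c}}{1000},\ \frac{P}{A_g}\right) \qquad (2)
\]
\[
V_{MEP} = g\left(\rho_l f_{yl},\ \rho_w f_{yh},\ \frac{A_e\sqrt{f'_c}}{1000},\ \frac{P}{A_g}\right) \qquad (3)
\]
[The explicit right-hand sides of Eqs. (2) and (3) could not be recovered from the extracted text; only the variables involved are shown.]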
Figure 1 shows a comparison between the experimental values and the values predicted by GEP and
MEP. Table 4 presents the performance of the proposed models on the training and testing data sets.
Performance indices of different models on the entire database are summarized in Table 5. Based on
the values of the performance measures, it can be observed that the GEP and MEP models are able to
predict the target values with a high degree of accuracy. Comparing the performance of the proposed
models, it can be seen that the GEP model has produced better results than the MEP-based model. The
results clearly demonstrate that the ACI-318, ATC-32, ASCE-ACI and CALTRANS methods are not
efficient in estimating the shear strength of circular RC columns.
Figure 1. Experimental versus predicted shear strength of circular RC columns by: (a) GEP and (b)
MEP.
[Figure 1 panels: (a) Experiment, GEP and Residual series; (b) Experiment, MEP and Residual series. Horizontal axis: Sample No.; vertical axis: V (kN).]
Table 4. Performance statistics of the GEP and MEP models for the assessment of the shear strength
of RC columns. R, MAE and RMSE refer to experimental vs predicted values; Mean, SD and Cov. refer to the experimental / predicted ratio.

| Data sets | Method | R | MAE | RMSE | Mean | SD | Cov. |
|---|---|---|---|---|---|---|---|
| All | GEP | 0.9346 | 30.3284 | 40.2143 | 0.9765 | 0.1373 | 0.1406 |
| All | MEP | 0.9391 | 31.6158 | 40.7709 | 0.9550 | 0.1509 | 0.1580 |
| Training | GEP | 0.9330 | 29.5420 | 40.0555 | 0.9786 | 0.1391 | 0.1421 |
| Training | MEP | 0.9432 | 29.5311 | 38.7068 | 0.9606 | 0.1406 | 0.1464 |
| Testing | GEP | 0.9397 | 35.3094 | 41.2053 | 0.9630 | 0.1367 | 0.1419 |
| Testing | MEP | 0.9143 | 44.8189 | 51.9736 | 0.9196 | 0.2185 | 0.2377 |
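Table 5. Overall performance of different models for the assessment of the shear strength of RC
columns. R, MAE and RMSE refer to experimental vs predicted values; Mean, SD and Cov. refer to the experimental / predicted ratio.

| Group | Method | R | MAE | RMSE | Mean | SD | Cov. |
|---|---|---|---|---|---|---|---|
| Standard codes | ACI-318 | 0.6950 | 70.0659 | 100.3677 | 0.9477 | 0.2274 | 0.2399 |
| Standard codes | ATC-32 | 0.6314 | 74.1205 | 109.3941 | 1.0152 | 0.2811 | 0.2769 |
| Standard codes | ASCE-ACI | 0.7320 | 79.9432 | 114.0106 | 0.9334 | 0.2529 | 0.2709 |
| Standard codes | CALTRANS | 0.6186 | 75.5386 | 105.1456 | 1.1725 | 0.6032 | 0.5144 |
| Expression programming | GEP | 0.9346 | 30.3284 | 40.2143 | 0.9765 | 0.1373 | 0.1406 |
| Expression programming | MEP | 0.9391 | 31.6158 | 40.7709 | 0.9550 | 0.1509 | 0.1580 |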
3.4.2. A case study of steel structures: Hysteretic energy demand in steel moment frames
Energy-based parameters have widely been used as seismic design parameters (Housner, 1956). An
efficient way to analyze damage of a structure is to evaluate the amount of energy imparted to the
structure during an earthquake. This energy is called the total energy input (EI). Hysteretic energy or
hysteretic energy demand is a part of the EI which is dissipated through the hysteretic behavior. The
hysteretic energy demand is the source of damage to the structural component. Structural failure
happens when the earthquake-induced hysteretic energy demand for a structure is larger than the
hysteretic energy dissipation capacity of the structure. The hysteretic energy and its distribution
throughout the structure are dependent on both the structural system and the ground motion.
Therefore, the hysteretic energy parameter can be regarded as a design parameter, in particular, when
the damage is expected not to exceed some specified limits (Bertero and Teran-Gilmore, 1994). One
of the major concerns in assessing the response of structures for low-performance levels in
performance-based earthquake-resistant design is to determine the hysteretic energy demand,
dissipation capacity and level of damage of the structure for a predefined earthquake ground motion
(Vision, 2000; Riddell and Garcia, 2001).
3.4.2.1. Model construction and analysis
In the proposed GEP and MEP models, earthquake intensity (I), number of stories (Ns), soil type (Z),
fundamental period (T), strength index (η), and the ratio of hysteretic energy to the energy imparted to the
structure (EH/EI) are the variables used to predict the hysteretic energy demand (EH/m). The descriptive
statistics of different input and output parameters involved in the model development are given in
Table 6. The GEP and MEP-based correlations are developed based on 27 data sets presented by
Akbas (2006). Out of the available data, 22 data sets are used as the training data and the rest are used
for the testing of the proposed models. No rational prediction model for the hysteretic energy demand
has yet been developed that would encompass the influencing variables considered in this study.
Therefore, it is not possible to conduct a comparative study between the proposed models and the
existing solutions.
Table 6. The variables used in model development.

| Statistic | I | Ns | Z | T (sec) | η | EH/EI | EH/m (cm/sec)² |
|---|---|---|---|---|---|---|---|
| Mean | 2 | 10.667 | 2 | 2.3611 | 0.1293 | 0.8330 | 6728.8 |
| Standard Error | 0.1601 | 1.3806 | 0.1601 | 0.2225 | 0.0132 | 0.0114 | 1258.5 |
| Median | 2 | 9 | 2 | 2.2862 | 0.11 | 0.85 | 4387 |
| Standard Deviation | 0.8321 | 7.1737 | 0.8321 | 1.1559 | 0.0688 | 0.0593 | 6539.2 |
| Sample Variance | 0.6923 | 51.462 | 0.6923 | 1.3361 | 0.0047 | 0.0035 | 4.3×10^7 |
| Kurtosis | -1.56 | -1.56 | -1.56 | -1.56 | -1.56 | 0.7545 | 1.7077 |
| Skewness | 0 | 0.3623 | 0 | 0.1047 | 0.4302 | -0.9968 | 1.5391 |
| Range | 2 | 17 | 2 | 2.7754 | 0.162 | 0.23 | 24198 |
| Minimum | 1 | 3 | 1 | 1.0109 | 0.058 | 0.68 | 415 |
| Maximum | 3 | 20 | 3 | 3.7863 | 0.22 | 0.91 | 24613 |
| Confidence Level (95.0%) | 0.3291 | 2.8378 | 0.3291 | 0.4573 | 0.0272 | 0.0234 | 2586.8 |
3.4.2.2. GEP and MEP-based formulations of hysteretic energy
The optimal hysteretic energy prediction equations derived by means of the GEP and MEP algorithms
are as follows:
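\[
\left(\frac{E_H}{m}\right)_{GEP} = f\left(I,\ N_s,\ Z,\ T,\ \eta,\ \frac{E_H}{E_I}\right) \qquad (4)
\]
\[
\left(\frac{E_H}{m}\right)_{MEP} = g\left(I,\ N_s,\ Z,\ T,\ \eta,\ \frac{E_H}{E_I}\right) \qquad (5)
\]
[The explicit right-hand sides of Eqs. (4) and (5) could not be recovered from the extracted text; only the candidate input variables are shown.]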
A comparison of the experimental versus predicted EH/m by GEP and MEP is illustrated in Figure 2.
Performance statistics of the GEP and MEP models are summarized in Table 7. The results indicate
that the GEP and MEP models are able to predict hysteretic energy with a high degree of accuracy. As
can be seen, GEP has produced generally better outcomes than MEP on the testing and entire data.
Figure 2. Measured versus predicted hysteretic energy by: (a) GEP and (b) MEP.
[Figure 2 panels: (a) Experiment, GEP and Residual series; (b) Experiment, MEP and Residual series. Horizontal axis: Sample No.; vertical axis: EH/m (cm/sec)².]
Table 7. Overall performance of EP models for hysteretic energy assessment. R, MAE and RMSE refer to experimental vs predicted values; Mean, SD and Cov. refer to the experimental / predicted ratio.

| Data sets | Method | R | MAE | RMSE | Mean | SD | Cov. |
|---|---|---|---|---|---|---|---|
| All | GEP | 0.9698 | 1602.6 | 2224.3 | 0.8902 | 0.3404 | 0.3824 |
| All | MEP | 0.9650 | 1.4021 | 1.7616 | 0.8842 | 0.2816 | 0.3184 |
| Training | GEP | 0.9601 | 1564.5 | 2213.9 | 0.8482 | 0.3497 | 0.4123 |
| Training | MEP | 0.9619 | 1381.0 | 1772.8 | 0.8500 | 0.2799 | 0.3293 |
| Testing | GEP | 0.9978 | 1770.5 | 2269.3 | 1.0752 | 0.2421 | 0.2252 |
| Testing | MEP | 0.9725 | 1494.6 | 1711.3 | 1.0346 | 0.2630 | 0.2542 |
4. Model validity
Smith (1986) suggested the following criterion for evaluating the performance of a model:
• if a model gives correlation coefficient (R) > 0.8, a strong correlation exists between the
predicted and measured values.
In all cases, the error values (e.g., RMSE and MAE) should be at a minimum. A model that meets both
requirements can be judged as very good. It can be observed from Tables 4, 5 and 7 that the GEP and MEP
models with high R and low RMSE and MAE values are able to predict the target values to an
acceptable degree of accuracy. Furthermore, new criteria recommended by Golbraikh and Tropsha
(2002) are checked for the external validation of the models on the test data sets. It is suggested that at
least one slope of the regression lines (k or k') through the origin should be close to 1. Recently, Roy and
Roy (2008) introduced a confirmation indicator (Rm) of the external predictability of models. For Rm > 0.5,
the condition is satisfied. Either the squared correlation coefficient (through the origin) between
predicted and experimental values (Ro²), or the coefficient between experimental and predicted values
(Ro'²) should be close to R², and to 1. The considered validation criteria and the relevant results
obtained by the models are presented in Table 8. The EP models are considered valid if they satisfy
the required conditions. The validation phase ensures that the derived models are strongly valid and that
their performance is not established by chance.
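The external validation checks described above can be sketched as follows. The sketch assumes that h denotes the measured outputs and t the predicted outputs, that k and k' are the slopes of the regression lines through the origin, that Ro² is computed with the fitted values k t (and Ro'² with k' h), and that Rm = R²(1 - √|R² - Ro²|). The function names and the example data are hypothetical; the thresholds are those quoted in the text and in Table 8.

```python
import math


def external_validation(h, t):
    """Validation statistics for measured values h and predicted values t."""
    n = len(h)
    mean_h = sum(h) / n
    mean_t = sum(t) / n
    sum_ht = sum(a * b for a, b in zip(h, t))
    k = sum_ht / sum(a * a for a in h)
    k_prime = sum_ht / sum(b * b for b in t)
    ss_h = sum((a - mean_h) ** 2 for a in h)
    ss_t = sum((b - mean_t) ** 2 for b in t)
    r = sum((a - mean_h) * (b - mean_t) for a, b in zip(h, t)) / math.sqrt(ss_h * ss_t)
    # Squared correlation coefficients through the origin (assumed definitions).
    ro2 = 1.0 - sum((b - k * b) ** 2 for b in t) / ss_t
    ro2_prime = 1.0 - sum((a - k_prime * a) ** 2 for a in h) / ss_h
    rm = r ** 2 * (1.0 - math.sqrt(abs(r ** 2 - ro2)))
    return {"R": r, "k": k, "k_prime": k_prime, "Ro2": ro2, "Ro2_prime": ro2_prime, "Rm": rm}


def is_valid(stats):
    """Apply the thresholds quoted in the text: R, at least one slope, and Rm."""
    slope_ok = 0.85 < stats["k"] < 1.15 or 0.85 < stats["k_prime"] < 1.15
    return stats["R"] > 0.8 and slope_ok and stats["Rm"] > 0.5


# Hypothetical test-set values.
measured = [93.0, 230.0, 321.0, 410.0, 579.0]
predicted = [101.0, 221.0, 335.0, 396.0, 560.0]
stats = external_validation(measured, predicted)
print(stats, is_valid(stats))
```

Because the slope criterion only requires one of k and k' to lie within the band, a model can pass even when the other slope falls slightly outside it.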
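Table 8. Statistical parameters of the GEP and MEP models.

| Item | Formula | Condition | Case I: RC columns, GEP | Case I: RC columns, MEP | Case II: Hysteretic energy, GEP | Case II: Hysteretic energy, MEP |
|---|---|---|---|---|---|---|
| 1 | Eq. (5) | 0.8 < R | 0.9397 | 0.9143 | 0.9978 | 0.9598 |
| 2 | $k = \sum_{i=1}^{n} (h_i t_i) / \sum_{i=1}^{n} h_i^2$ | 0.85 < k < 1.15 | 0.9982 | 0.9776 | 1.2101 | 1.0494 |
| 3 | $k' = \sum_{i=1}^{n} (h_i t_i) / \sum_{i=1}^{n} t_i^2$ | 0.85 < k' < 1.15 | 0.9845 | 0.9953 | 0.8230 | 0.9365 |
| 4 | $R_m = R^2 \left(1 - \sqrt{\left\lvert R^2 - R_o^2 \right\rvert}\right)$ | 0.5 < Rm | 0.9397 | 0.9143 | 0.6296 | 0.7415 |
| | where $R_o^2 = 1 - \sum_{i=1}^{n} (t_i - h_i^o)^2 / \sum_{i=1}^{n} (t_i - \bar{t})^2$, $h_i^o = k\, t_i$ | | 1.0000 | 0.9962 | 0.8605 | 0.9923 |
| | $R_o'^2$ | | 0.9973 | 0.9997 | 0.8829 | 0.9870 |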
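The external validation checks of Table 8 can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: h and t are taken to be the measured and predicted outputs, R is taken to be the Pearson correlation coefficient, the formulas are read as written in the code comments, and the function names and sample values are hypothetical.

```python
import math


def external_validation(h, t):
    """Return R, k, k', Ro^2, Ro'^2 and Rm for two equal-length series.

    h, t: assumed here to be the measured and predicted outputs.
    """
    n = len(h)
    h_mean = sum(h) / n
    t_mean = sum(t) / n

    # Assumed Pearson form of the correlation coefficient R.
    cov = sum((hi - h_mean) * (ti - t_mean) for hi, ti in zip(h, t))
    ss_h = sum((hi - h_mean) ** 2 for hi in h)
    ss_t = sum((ti - t_mean) ** 2 for ti in t)
    r = cov / math.sqrt(ss_h * ss_t)

    # Slopes of the regression lines through the origin:
    # k = sum(h_i t_i) / sum(h_i^2), k' = sum(h_i t_i) / sum(t_i^2).
    sum_ht = sum(hi * ti for hi, ti in zip(h, t))
    k = sum_ht / sum(hi ** 2 for hi in h)
    k_prime = sum_ht / sum(ti ** 2 for ti in t)

    # Ro^2 uses h_i^o = k * t_i; Ro'^2 uses t_i^o = k' * h_i (assumed reading).
    ro2 = 1.0 - sum((ti - k * ti) ** 2 for ti in t) / ss_t
    ro2_prime = 1.0 - sum((hi - k_prime * hi) ** 2 for hi in h) / ss_h

    # Rm = R^2 * (1 - sqrt(|R^2 - Ro^2|)).
    rm = r ** 2 * (1.0 - math.sqrt(abs(r ** 2 - ro2)))

    return {"R": r, "k": k, "k_prime": k_prime,
            "Ro2": ro2, "Ro2_prime": ro2_prime, "Rm": rm}


def satisfies_criteria(stats):
    """Check R > 0.8, at least one slope in (0.85, 1.15), and Rm > 0.5."""
    slope_ok = 0.85 < stats["k"] < 1.15 or 0.85 < stats["k_prime"] < 1.15
    return stats["R"] > 0.8 and slope_ok and stats["Rm"] > 0.5


# Hypothetical measured and predicted values, for illustration only.
stats = external_validation([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
print(stats)
print(satisfies_criteria(stats))
```

The closeness of $R_o^2$ and $R_o'^2$ to 1 is returned but not thresholded, since the text gives no numerical bound for that condition.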
5. Conclusions
In this paper, two promising expression programming techniques, namely GEP and MEP, are
employed for the analysis of complex structural engineering systems. The capabilities of the GEP and
MEP methodologies are illustrated by application to two practical structural engineering problems: (1)
shear strength of RC columns, and (2) hysteretic energy demand in steel moment resisting frames.
Reliable databases from the previously published experimental results are used to develop the models.
The following conclusions can be derived from the results presented in this research:
i. Despite high nonlinearity in the behavior of the investigated structural systems, the proposed
GEP and MEP models give reasonable estimations of the target values. Furthermore, the
proposed models efficiently satisfy the conditions of different criteria considered for their
external validation.
ii. The proposed GEP and MEP models efficiently take into consideration the effects of several
parameters representing the engineering behavior of the structural problems.
iii. In general, the prediction capabilities of the GEP and MEP models are found to be very close
to each other. The results demonstrate that the GEP and MEP techniques can reliably be applied
to formulate structural engineering problems.
iv. GEP and MEP provide prediction equations that are relatively short and simple, and that can
be used for routine design practice via hand calculations.
v. Unlike the traditional methods, GEP and MEP do not require any simplifying assumptions in
developing the models.
vi. The GEP and MEP methods are particularly practical in situations where good
experimental data are available, the behavior is too complex, or the conventional constitutive
models are unable to effectively describe various aspects of the behavior.
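In Table 8, the coefficient $R_o'^2$ is given by:
$$R_o'^2 = 1 - \frac{\sum_{i=1}^{n} (h_i - t_i^o)^2}{\sum_{i=1}^{n} (h_i - \bar{h})^2}, \qquad t_i^o = k'\, h_i$$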
References
Alavi A.H., Gandomi A.H., 2012. “Energy-Based Models for Assessment of Soil Liquefaction.”
Geoscience Frontiers, Elsevier, 3(4): 541–555.
Akbas, B. (2006). “A neural network model to assess the hysteretic energy demand in steel moment
resisting frames”, Struct. Eng. Mech., 23(2).
Alavi A.H., Aminian P., Gandomi A.H., Arab Esmaeili M., “Genetic-Based Modeling of Uplift
Capacity of Suction Caissons.” Expert Systems With Applications, Elsevier, 38(10): 12608-
12618, 2011b.
Alavi A.H., Gandomi A.H., Mousavi M., Mollahasani A., “High-Precision Modeling of Uplift
Capacity of Suction Caissons Using a Hybrid Computational Method.” Geomechanics and
Engineering, Techno Press, 2(4): 253-280, 2010b.
Alavi A.H., Gandomi A.H., Sahab M.G., Gandomi M., “Multi Expression Programming: A New
Approach to Formulation of Soil Classification.” Engineering with Computers, 26 (2): 111-118,
2010.
Alavi, A. H. and Gandomi, A. H. 2011a. A robust data mining approach for formulation of
geotechnical engineering systems. Engineering Computations 28(3), 242-274.
Alavi A.H., Gandomi A.H., 2011b “Prediction of Principal Ground-Motion Parameters Using a
Hybrid Method Coupling Artificial Neural Networks and Simulated Annealing.” Computers and
Structures, Elsevier, 89 (23-24): 2176-2194.
Alavi, A. H., Ameri, M., Gandomi, A. H., Mirzahosseini, M. R., 2011a. Formulation of flow number
of asphalt mixes using a hybrid computational method. Construction and Building Materials
25(3), 1338-1355.
Alavi, A.H., Gandomi, A.H., Gandomi, M., Sadat Hosseini, S.S. (2009), “Prediction of Maximum
Dry Density and Optimum Moisture Content of Stabilized Soil Using RBF Neural Networks”, The
IES Journal Part A: Civil & Structural Engineering; Vol. 2 No. 2, pp. 98-106.
Alavi A.H., Gandomi A.H., Mollahasani A., 2011c “A Genetic Programming-Based Approach for
Performance Characteristics Assessment of Stabilized Soil.” Variants of Evolutionary Algorithms
for Real-World Applications, Springer-Verlag, Berlin, Chapter 9, 343-375.
Alavi A.H., Gandomi A.H., Bolury J., Mollahasani A., 2012 “Linear and Tree-Based Genetic
Programming for Solving Geotechnical Engineering Problems” Metaheuristics in Water
Resources, Geotechnical and Transportation Engineering, XS Yang et al. (Eds.), Elsevier, In
Press.
American Concrete Institute (ACI). Building code requirements for structural concrete. ACI
Committee 318, Farmington Hills, Mich.; 2005.
Applied Technology Council. Improved seismic design criteria for California bridges: provisional
recommendation. Report No. ATC-32, Redwood City, Calif.; 1996.
ASCE-ACI. Shear strength of reinforced concrete members ASCE-ACI joint task committee 426. J
Struct Eng 1973;99:1091–187.
Banzhaf, W., Nordin, P., Keller, R. and Francone, F. 1998. Genetic Programming - An Introduction.
On the Automatic Evolution of Computer Programs and its Application. dpunkt/Morgan
Kaufmann.
Baykasoglu A., A. Oztas and E. Ozbay, Prediction and multi-objective optimization of high-strength
concrete parameters via soft computing approaches, Expert Systems with Applications 36 (3)
(2009), pp. 6145–6155.
Baykasoglu, A., Gullu, H., Canakcı, H., Ozbakır, L., 2008. Prediction of compressive and tensile
strength of limestone via genetic programming. Expert Systems with Applications 35(1-2), 111–
23.
Bertero, V.V. and Teran-Gilmore, A. (1994). “Use of energy concepts in earthquake-resistant analysis
and design: Issues and future directions”, Advances in Earthquake Engineering Practice, Short
Course in Structural Engineering, Architectural and Economic Issues, University of California,
Berkeley.
Cabalar, A.F. and Cevik, A. (2009). “Modelling damping ratio and shear modulus of sand-mica
mixtures using neural Networks”, Engineering Geology, Vol. 104, pp. 31- 40.
Caglar N, Neural network based approach for determining the shear strength of circular reinforced
concrete columns. Construction and Building Materials 23 (2009) 3225–3232.
CALTRANS Memo to Designers 20-4. Attachment B, earthquake retrofit analysis for single column
bents; 1996.
Cevik A, Arslan MH, Köroğlu MH, 2010. Genetic-programming-based modeling of RC beam
torsional strength. KSCE Journal of Civil Engineering, 14(3), 371-384.
Cevik A. A new formulation for web crippling strength of cold-formed steel sheeting using genetic
programming. Journal of Constructional Steel Research 63 (2007) 867–883
Choe LY. Shear strength of circular reinforced concrete columns. MSc Thesis, The Ohio State
University; 2006.
Deng, W. and Carter, J.P., Analysis of suction caissons subjected to inclined uplift loading. 1999a,
Centre for Geotechnical Research, The University of Sydney.
Ferreira, C. (2001), “Gene expression programming: a new adaptive algorithm for solving problems”,
Complex Systems Vol. 13 No. 2, pp. 87–129.
Ferreira, C. (2006), Gene Expression Programming: Mathematical Modeling by an Artificial
Intelligence (2nd edn). Springer-Verlag: Germany.
Francone, F.D., Deschaine, L.M., 2004. Extending the boundaries of design optimization by
integrating fast optimization techniques with machine–code–based, linear genetic programming.
Information Sciences 161, 99–120.
Gandomi A.H., Alavi A.H., 2011a. Multi-Stage Genetic Programming: A New Strategy to Nonlinear
System Modeling. Information Sciences, 181(23): 5227-5239.
Gandomi A.H., Alavi A.H., 2011b “Applications of Computational Intelligence in Behavior
Simulation of Concrete Materials.” Chapter 9 in Computational Optimization and Applications
in Engineering and Industry, XS Yang & S Koziel (Eds.), Springer SCI, 359, 221-243.
Gandomi A.H., Tabatabaie S.M., Moradian M.H., Radfar A., Alavi A.H., 2011d “A New Prediction
Model for Load Capacity of Castellated Steel Beams.” Journal of Constructional Steel Research,
67(7): 1096-1105.
Gandomi, A. H., Alavi, A. H., & Yun, G. J. (2011c). Formulation of uplift capacity of suction
caissons using multi expression programming. KSCE Journal of Civil Engineering, 15(2), 363–
373.
Gandomi A.H., Alavi A.H., Mousavi M., Tabatabaei S.M. 2011d “A Hybrid Computational Approach
to Derive New Ground-Motion Attenuation Models.” Engineering Applications of Artificial
Intelligence, 24(4): 717–732.
Gandomi A.H., Alavi A.H., Sahab M.G., 2010a “New Formulation for Compressive Strength of
CFRP Confined Concrete Cylinders Using Linear Genetic Programming.” Materials and
Structures, 43(7): 963-983.
Gandomi A.H., Alavi A.H., Sahab M.G., Arjmandi P., 2010b “Formulation of Elastic Modulus of
Concrete Using Linear Genetic Programming.” Journal of Mechanical Science and Technology,
24(6): 1011-1017.
Gandomi A.H., Alavi A.H., Arjmandi P., Aghaeifar A., Seyednoor, M., 2010c “Genetic Programming
and Orthogonal Least Squares: A Hybrid Approach to Modeling the Compressive Strength of
CFRP-Confined Concrete Cylinders.” Journal of Mechanics of Materials and Structures,
Mathematical Sciences, 5(5), 735–753.
Gandomi, A.H., Alavi, A.H., Mirzahosseini, R., Moghdas Nejad, F., 2011a. Nonlinear genetic-based
models for prediction of flow number of asphalt mixtures. Journal of Materials in Civil
Engineering ASCE 23(3), 1-18.
Gandomi, A.H., Alavi, A.H., Yun, G.J., 2011b. Nonlinear modeling of shear strength of SFRC beams
using linear genetic programming. Structural Engineering and Mechanics 38(1), 1-25.
Gandomi A.H., Babanajad S.K., Alavi A.H., Farnam Y., 2012. “A Novel Approach to Strength
Modeling of Concrete under Triaxial Compression.” Journal of Materials in Civil Engineering,
in press. [DOI: 10.1061/(ASCE)MT.1943-5533.0000494]
Gandomi A.H., Alavi A.H., 2012a “A New Multi-Gene Genetic Programming Approach to Nonlinear
System Modeling. Part I: Materials and Structural Engineering Problems.” Neural Computing
and Applications, Springer, 21(1): 171-187.
Gandomi A.H., Alavi A.H., 2012b “A New Multi-Gene Genetic Programming Approach to Nonlinear
System Modeling. Part II: Geotechnical and Earthquake Engineering Problems.” Neural
Computing and Applications, Springer, 21(1): 189-201.
GEPSOFT. (2006), GeneXproTools Owner’s Manual. Version 4.0. Available online:
http://www.gepsoft.com/
Gesoglu M, Güneyisi E, Özturan T and Özbay E, 2009. Modeling the mechanical properties of
rubberized concretes by neural network and genetic programming. Materials and Structures, 43(1-
2), 31-45.
Golbraikh, A., Tropsha, A., 2002. Beware of q2!. Journal of Molecular Graphics and Modelling 20
(4), 269–76.
Housner, G.W. (1956). “Limit design of structures to resist earthquakes”, Proc. of the First World
Conf. on Earthq. Eng., Berkeley, California, 5-1-5-13.
Javadi, A.A. (2006), “Estimation of air losses in compressed air tunneling using neural network”,
Journal of Tunnelling and Underground Space Technology Vol. 21 No. 1, pp. 9-20.
Javadi, A.A., Rezania, M., Mousavi Nezhad, M., 2006. Evaluation of liquefaction induced lateral
displacements using genetic programming. Computers and Geotechnics 33(4-5), 222-233.
Juang, C.H., Jiang, T., Christopher, R.A. (2001), “Three-dimensional site characterisation: neural
network approach”, Geotechnique Vol. 51 No. 9, pp. 799-809.
Koza, J., 1992. Genetic programming, on the programming of computers by means of natural
selection, MIT Press, Cambridge (MA).
Narendra, B.S., Sivapullaiah, P.V., Suresh, S., Omkar, S.N. 2006. Prediction of unconfined
compressive strength of soft grounds using computational intelligence techniques: A
comparative study. Computers and Geotechnics 33(3), 196-208.
Oltean, M. (2004), Multi Expression Programming source code. Available at:
http://www.mep.cs.ubbcluj.ro/
Oltean, M., Dumitrescu, D. (2002), Multi expression programming. Technical report, UBB-01-2002,
Babeş-Bolyai University, Cluj-Napoca, Romania.
Oltean, M., Groşan, C. (2003a), “A comparison of several linear genetic programming techniques”,
Advances in Complex Systems Vol. 14 No. 4, pp. 1–29.
Oltean, M., Groşan, C. (2003b), Solving Classification Problems using Infix Form Genetic
Programming. In Intelligent Data Analysis, Berthold M (ed). LNCS 2810, 242–252, Springer-
Verlag, Berlin.
Pai, G.A.V. (2005). "Prediction of uplift capacity of suction caissons using a neuro-genetic network."
Engineering with Computers, Vol. 21, No. 2, pp. 129-139.
Pala M. Genetic programming-based formulation for distortional buckling stress of cold-formed steel
members. Journal of Constructional Steel Research, Volume 64, Issue 12, December 2008, Pages
1495-1504.
Rahman, M.S., Wang, J., Deng, W., and Carter, J.P. (2001). "A neural network model for the uplift
capacity of suction caissons." Computers and Geotechnics, Vol. 28, No. 4, pp. 269-287.
Rezania M, Javadi AA, Giustolisi O. (2008) An evolutionary-based data mining technique for
assessment of civil engineering systems, Engineering Computations, volume 25, no. 5-6, pages
500-517.
Riddell, R. and Garcia, J.E. (2001). “Hysteretic energy spectrum and damage control”, Earthq. Eng.
Struct. Dyn., 30.
Roy, P.P. and Roy, K. 2008. “On some aspects of variable selection for partial least squares regression
models”, QSAR & Combinatorial Science 27, 302-313.
Shahin, M.A., Maier, H.R., Jaksa, M.B. (2001), “Artificial neural network applications in
geotechnical engineering”, Australian Geomechanics Vol. 36 No. 1, pp. 49–62.
Smith, G.N., 1986. Probability and statistics in civil engineering. Collins, London.
Vision 2000 Committee. (2000). Structural Engineering Association of California (SEAOC).