
Journal of Physics: Conference Series

PAPER • OPEN ACCESS

Improve “2SLS” Method by Genetic algorithm with application

To cite this article: Alaa H Sabri and Sabah M Ridha 2019 J. Phys.: Conf. Ser. 1294 032023


Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution

of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

Published under licence by IOP Publishing Ltd

2nd International Science Conference

IOP Conf. Series: Journal of Physics: Conf. Series 1294 (2019) 032023

IOP Publishing

doi:10.1088/1742-6596/1294/3/032023


Improve "2SLS" Method by Genetic Algorithm with Application

Alaa H Sabri 1 and Sabah M Ridha 2

1 Al-Muthanna University, College of Science, Department of Mathematics and Computer Applications, Iraq

2 Baghdad University, College of Administration & Economics, Department of Statistics, Iraq

Email: alaa.sabri@gmail.com

Abstract. This paper explores the potential of the genetic algorithm for optimization using MATLAB. Most robust methods are based on the idea of sacrificing on one side in exchange for improvement on another; artificial-intelligence mechanisms try to balance sacrifice and improvement so as to reach the best solutions through a random search technique. In this paper, a new idea is introduced to improve the parameter estimators of linear simultaneous equation models obtained from the 2SLS method, using a class of genetic algorithm called the binary genetic algorithm (GA); better estimates were obtained according to two different robust criteria.

Keywords: 2SLS, LSEM, binary genetic algorithm, GA

1. Introduction

Although Simultaneous Equation Models (SEM) have traditionally been used in economics, each equation in an SEM should represent some underlying conditional expectation that has a causal structure. The relationships between the variables are used to create the model, but these will depend on the criteria chosen (10). For example, two structural equations may fall out of an individual's optimization problem: one has work as a function of the exogenous factors, demographics, and unobservables; the other has crime as a function of these same factors (7). Completeness of the system requires that the number of equations equal the number of endogenous variables (21). The leading method for estimating simultaneous equation models is the method of instrumental variables (IV); the solution to the simultaneity problem is therefore essentially the same as the IV solutions to the omitted-variables and measurement-error problems. The mechanics of Two-Stage Least Squares (2SLS) are similar: because we specify a structural equation for each endogenous variable, we can immediately see whether sufficient IVs are available to estimate either equation (8). If the disturbances appearing in the various structural equations are not independently distributed, lagged endogenous variables are not independent of the current operation of the equation system, which means these variables are not really predetermined. If these variables are nevertheless treated as predetermined in the 2SLS procedure, the resulting estimators are not consistent (2). Moreover, even when 2SLS is used, bias remains in finite samples under certain situations, because an estimate of the reduced form is used since the true parameters are unknown (9). The three operators, selection, crossover and mutation, make the GA an important tool for optimization. The exploitation and exploration aspects of GAs can be controlled


almost independently. This provides a lot of flexibility in designing a GA. This methodology is applicable even in cases where the form of heteroscedasticity is unknown and least-squares methodology is not applicable (13). The MATLAB package comes with sophisticated libraries for matrix operations, general numerical methods and plotting of data; MATLAB has therefore become a first choice for implementing scientific, graphical and mathematical applications, including GA implementations (14), and the GA method can be used successfully in more flexible circumstances (6).

2. Linear Simultaneous Equation Models (LSEM)

Consider two interdependent (endogenous) variables that depend on four independent (exogenous) variables. Suppose that each endogenous variable can be expressed as a linear combination of the other endogenous variable, the exogenous variables, and white noise representing stochastic interference. Thus, let us modify the income–money supply model as follows (2):

Y1t = β10 + β12 Y2t + γ11 X1t + γ12 X2t + u1t      ....(1)
Y2t = β20 + β21 Y1t + γ23 X3t + γ24 X4t + u2t      ....(2)

where
Y1 = income
Y2 = stock of money
X1 = investment expenditure
X2 = government expenditure on goods and services
The variables X1 and X2 are exogenous.

The income equation, a hybrid of quantity-theory and Keynesian approaches to income determination, states that income is determined by the money supply, investment expenditure, and government expenditure. The money supply function postulates that the stock of money is determined (by the Federal Reserve System) on the basis of the level of income. In addition to the variables already defined, let X3 = income in the previous time period and X4 = money supply in the previous period. Both X3 and X4 are predetermined. It can be readily verified that both Eqs. (1) and (2) are overidentified.

3. Two-Stage Least Squares (2SLS) Method

To apply 2SLS, we proceed as follows: In Stage 1 we regress the endogenous variables on all the

predetermined variables in the system. Thus,

Y1t = Π̂10 + Π̂11 X1t + Π̂12 X2t + Π̂13 X3t + Π̂14 X4t + û1t      ....(3)
Y2t = Π̂20 + Π̂21 X1t + Π̂22 X2t + Π̂23 X3t + Π̂24 X4t + û2t      ....(4)

A useful extension of linear regression is the case where y is a linear function of two or more independent variables (19). In Stage 2 we replace Y1 and Y2 in the original (structural) equations by their estimated values from the preceding two regressions and then run the OLS regressions as follows:

Y1t = β10 + β12 Ŷ2t + γ11 X1t + γ12 X2t + u*1t      ....(5)
Y2t = β20 + β21 Ŷ1t + γ23 X3t + γ24 X4t + u*2t      ....(6)

where


u*1t = u1t + β12 û2t
u*2t = u2t + β21 û1t

and the proxy variables Ŷ are close to the endogenous variables: the proxy variables are highly correlated with the endogenous variables they replace but uncorrelated with the error terms (11). The estimates thus obtained will be consistent.
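The two-stage mechanics above can be sketched in code. The following Python example is an illustrative sketch, not the authors' MATLAB implementation: both stages are fitted by ordinary least squares with numpy.linalg.lstsq, and the function name, the synthetic data and all coefficient values are assumptions for demonstration only.

```python
import numpy as np

def two_stage_least_squares(y, endog, exog_included, instruments):
    """Illustrative 2SLS for one structural equation.

    y             : (n,) left-hand-side endogenous variable
    endog         : (n, k1) right-hand-side endogenous regressors
    exog_included : (n, k2) exogenous regressors in this equation
    instruments   : (n, m) all predetermined variables in the system
    """
    n = len(y)
    ones = np.ones((n, 1))
    # Stage 1: regress each endogenous regressor on all predetermined variables
    Z = np.hstack([ones, instruments])
    Pi, *_ = np.linalg.lstsq(Z, endog, rcond=None)
    endog_hat = Z @ Pi                      # proxy (fitted) values Y-hat
    # Stage 2: OLS of y on the proxies and the included exogenous variables
    X = np.hstack([ones, endog_hat, exog_included])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Synthetic check: one endogenous regressor, two predetermined instruments
rng = np.random.default_rng(0)
n = 200
z = rng.normal(size=(n, 2))                          # predetermined variables
u = rng.normal(size=n)                               # structural disturbance
y2 = z @ np.array([1.0, -0.5]) + 0.8 * u + rng.normal(size=n)  # endogenous
y1 = 2.0 + 0.5 * y2 + u
beta = two_stage_least_squares(y1, y2[:, None], np.empty((n, 0)), z)
# beta[1] approximately recovers the structural coefficient 0.5, whereas a
# direct OLS of y1 on y2 would be biased upward by the y2-u correlation.
```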

4. Genetic Algorithm (GA)

Genetic algorithms (Holland, 1975) perform a search for the solution to a problem by generating candidate solutions from the space of all solutions and testing the performance of the candidates. The search method is based on ideas from genetics, and the size of the search space is determined by the representation of the domain (4).

In a genetic algorithm, each individual of a population is one possible solution to an optimization problem, encoded as a binary string called a chromosome. A group of these individuals is generated and competes for the right to reproduce or even be carried over into the next generation of the population. Competition consists of applying a fitness function to every individual in the population; the individuals with the best results are the fittest. The next generation is then constructed by carrying over a few of the best individuals, by reproduction, and by mutation. Reproduction is carried out by a "crossover" operation, similar to what happens in an animal embryo: two chromosomes exchange portions of their code, thus forming a pair of new individuals. In the simplest form of crossover, a crossover point on the two chromosomes is selected at random, and the chromosomes exchange all data after that point while keeping their own data up to that point. In order to introduce additional variation into the population, a mutation operator randomly changes a bit or bits in some chromosome(s). Usually the mutation rate is kept low so that good solutions remain stable. The two most critical elements of a genetic algorithm are the way solutions are represented and the fitness function, both of which are problem-dependent. The coding for a solution must be designed to represent a possibly complicated idea or sequence of steps (18).

The basic genetic algorithm (GA) is outlined below.
Step I [Start] Generate a random population of chromosomes, that is, suitable solutions for the problem.
Step II [Fitness] Evaluate the fitness of each chromosome in the population.
Step III [New population] Create a new population by repeating the following steps until the new population is complete.
a) [Selection] Select two parent chromosomes from the population according to their fitness: the better the fitness, the bigger the chance of being selected as a parent.
b) [Crossover] With a crossover probability, cross over the parents to form new offspring (children). If no crossover is performed, the offspring are exact copies of the parents.
c) [Mutation] With a mutation probability, mutate the new offspring at each locus.
d) [Accepting] Place the new offspring in the new population.
Step IV [Replace] Use the newly generated population for a further run of the algorithm.
Step V [Test] If the end condition is satisfied, stop and return the best solution in the current population.
Step VI [Loop] Go to Step II.

The performance of genetic algorithms is largely influenced by the crossover and mutation operators. A block-diagram representation of genetic algorithms (GAs) is shown in Fig. 1 (15).


Figure 1. Block schematic of the stages of genetic algorithm (GA) optimization.
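Steps I-VI above can be condensed into a short sketch. The Python below is a minimal illustrative binary GA, not the paper's MATLAB code; tournament selection is used here as one concrete way to realize Step III-a, and the "one-max" toy fitness is an assumption for demonstration.

```python
import random

def binary_ga(fitness, n_bits, pop_size=30, pc=0.8, pm=None, max_gen=100):
    """Minimal binary GA following Steps I-VI (maximizes `fitness`)."""
    pm = pm if pm is not None else 1.0 / n_bits      # per-locus mutation rate
    # Step I: random initial population of chromosomes
    pop = [[random.randint(0, 1) for _ in range(n_bits)]
           for _ in range(pop_size)]
    for _ in range(max_gen):
        # Step II: evaluate fitness; keep the best individual (elitism)
        new_pop = [max(pop, key=fitness)[:]]
        while len(new_pop) < pop_size:
            # Step III-a: tournament selection, fitter parents more likely
            p1 = max(random.sample(pop, 2), key=fitness)
            p2 = max(random.sample(pop, 2), key=fitness)
            # Step III-b: single-point crossover with probability pc
            if random.random() < pc:
                cut = random.randint(1, n_bits - 1)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]                        # exact copy of a parent
            # Step III-c: mutate each locus with probability pm
            child = [b ^ 1 if random.random() < pm else b for b in child]
            new_pop.append(child)                    # Step III-d: accept
        pop = new_pop                                # Step IV: replace
    return max(pop, key=fitness)                     # Step V: best solution

# Toy usage: maximize the number of 1-bits ("one-max" problem)
random.seed(0)
best = binary_ga(sum, n_bits=20, max_gen=50)
```

The stopping condition here is simply a fixed generation budget (Step V); any other end condition, such as fitness stagnation, could be substituted.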

5. Genetic Algorithm for Regressors' Selection (GARS)

A GA starts with a set of solutions taken from a population constituted by chromosomes; these solutions are then used to create a new population. The GA has an edge over traditional algorithms because of advantages such as not needing derivatives or other supporting information, and being able to find global optima without getting stuck at local optima. In a GA, the search is carried out over a set of potential solutions, and the solutions are evaluated until the best solution is found (6).

GARS uses binary encoding to identify which independent variables should be included in the model; no transformation is applied to the independent variables before including them. Each GA individual consists of a string of m binary cells: if the i-th cell (i = 1, ..., m) has value 1, then Xi is included in the model, otherwise it is not. Every candidate solution is then evaluated with respect to a fitness function; the AIC criterion is considered as a possible fitness function. After randomly initializing the population and evaluating it with respect to the chosen fitness function, the population is evolved through generations using a stochastic uniform sampling selection scheme, single-point crossover with pc = 0.8, uniform mutation with pm = 1/NBITS, and direct reinsertion of the best recorded candidate solution. The algorithm stops when the population has been evolved for MAXGEN generations; the best solution is then reported. Even for a bigger search space, GARS is still capable of selecting models with a smaller AIC value than those selected by the other methods and the


complete model. In case the expert is interested in a model with good forecasting capabilities, the model selected by GARS for AIC should be considered first (16).
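A rough illustration of GARS-style regressor selection follows. This is a sketch under simplifying assumptions: truncation selection stands in for the stochastic uniform sampling described above, and the data, the function names aic_of_subset and gars, and all settings are invented for the example.

```python
import numpy as np

def aic_of_subset(y, X, mask):
    """AIC = n*log(S_p^2) + 2p for the regressors flagged by the binary mask."""
    n = len(y)
    cols = [j for j, bit in enumerate(mask) if bit]
    Z = np.hstack([np.ones((n, 1)), X[:, cols]]) if cols else np.ones((n, 1))
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    s2 = resid @ resid / n                     # residual variance S_p^2
    return n * np.log(s2) + 2 * len(cols)

def gars(y, X, pop_size=20, pm=None, max_gen=40, seed=0):
    """Toy GARS: binary-encoded regressor selection minimizing AIC."""
    rng = np.random.default_rng(seed)
    m = X.shape[1]
    pm = pm if pm is not None else 1.0 / m     # uniform mutation, pm = 1/NBITS
    pop = rng.integers(0, 2, size=(pop_size, m))
    for _ in range(max_gen):
        fit = np.array([aic_of_subset(y, X, ind) for ind in pop])
        order = np.argsort(fit)                # lower AIC = fitter
        elite = pop[order[0]].copy()           # reinsert best candidate
        parents = pop[order[: pop_size // 2]]  # truncation selection
        children = parents[rng.integers(0, len(parents), pop_size - 1)].copy()
        mates = parents[rng.integers(0, len(parents), len(children))]
        cuts = rng.integers(1, m, size=len(children))
        for i, c in enumerate(cuts):           # single-point crossover
            children[i, c:] = mates[i, c:]
        flip = rng.random(children.shape) < pm # uniform mutation
        children[flip] ^= 1
        pop = np.vstack([elite, children])
    fit = np.array([aic_of_subset(y, X, ind) for ind in pop])
    return pop[np.argmin(fit)]

# Toy data: y truly depends only on columns 0 and 2 of five candidates
rng = np.random.default_rng(1)
X = rng.normal(size=(120, 5))
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(scale=0.5, size=120)
mask = gars(y, X)                              # strongly relevant columns survive
```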

6. The Proposed Method (2SLS-GA)

Model selection and validation play a crucial role in statistics. The selection of a statistical model usually requires a detailed a-priori analysis of the empirical framework and competence on the part of the researcher: the researcher should first specify the functional form (linear or not), the number of variables and which variables to include in the model, and the statistical distribution of the stochastic component. However, classical approaches have some shortcomings, such as strong path-dependence and difficulty in exploring the whole model space (20). In this paper we propose evolutionary approaches, based on genetic algorithms, to overcome these shortcomings. Genetic algorithms allow a better exploration of the whole solution space through the evolution of a population of candidate models for the problem under investigation. The method for regression modeling based on an improved genetic algorithm is applied in the first stage of 2SLS.
We choose one representative criterion, the Akaike information criterion (AIC), for the regression model (12):

AIC = n Log(S²p) + 2p
where
n is the sample size;
p is the number of independent variables in the regression equation;
S²p is the residual variance.

Thus the fitness function for the model selection problem is taken as the reciprocal of the criterion function (17).
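A small numeric illustration of this criterion follows; the +1000 shift that keeps the reciprocal positive is my own assumption, since AIC can be negative.

```python
import math

def aic(n, s2_p, p):
    """AIC = n*Log(S_p^2) + 2p, as in the formula above."""
    return n * math.log(s2_p) + 2 * p

def fitness(n, s2_p, p, shift=1000.0):
    """Reciprocal-of-criterion fitness (shifted so it stays positive)."""
    return 1.0 / (aic(n, s2_p, p) + shift)

# A smaller model with nearly the same fit gets a lower AIC, hence a
# higher fitness: the 2p term penalizes the two extra parameters more
# than the slightly larger residual variance costs.
a_full = aic(48, 0.25, 4)   # 4 regressors, residual variance 0.25
a_sub  = aic(48, 0.26, 2)   # 2 regressors, slightly worse fit
```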

Now consider the case where k is large. In such a case it is often desirable and necessary to select a subset of K = {1, 2, ..., k}. Let P be any subset of K having |P| = p members, and let XP be the submatrix of X containing only those columns whose indices are in P. Using the OLS method, it is then possible to estimate a new coefficient vector, bP, with the same goal of estimating the dependent variables. The question then becomes how to select P so that the resulting model is in some way good or desirable (1).

The contribution of our paper to model building is a powerful procedure for selecting regressors that permits very good model-selection performance using a simple information criterion. In building a multiple regression model, a crucial problem is the selection of the regressors to be included: if too few regressors are selected, the parameter estimates will not be consistent, and if too many are selected, their variance will increase (5).

In our work, in Stage 1 of the 2SLS method we regress the endogenous variables on all the predetermined variables in the system. The GA here is used to select predetermined variables at random and to evaluate the response models by the Akaike Information Criterion (AIC) (Akaike, 1973), generating an initial population of solutions and selecting one of them at random from the best n in order to improve the parameter estimates of the linear SEM. We apply this to the set of four independent variables and two endogenous variables above and check the models to obtain the random solutions. Each random solution that passes the criterion with an acceptable value, relative to all random solutions, is evaluated and compared using two different robust criteria: the mean absolute percentage error (MAPE) and the median absolute error (MEDAE) (3).
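The two comparison criteria can be computed as follows. This Python sketch uses the standard definitions of MAPE (in percent) and MEDAE; the numbers are invented for illustration.

```python
import numpy as np

def mape(y, y_hat):
    """Mean absolute percentage error, in percent."""
    return 100.0 * np.mean(np.abs((y - y_hat) / y))

def medae(y, y_hat):
    """Median absolute error; robust to a few outlying residuals."""
    return np.median(np.abs(y - y_hat))

y     = np.array([100.0, 200.0, 400.0, 800.0])
y_hat = np.array([110.0, 190.0, 400.0, 1000.0])
# absolute errors: 10, 10, 0, 200; percentage errors: 10%, 5%, 0%, 25%
# MAPE = (10 + 5 + 0 + 25) / 4 = 10.0
# MEDAE = median(0, 10, 10, 200) = 10.0  (the outlier 200 barely matters)
```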


Each individual is evaluated with respect to an objective function (the fitness function) that measures the optimality of each model with respect to the problem under investigation. The population is evolved, within an elitist scheme, by using the usual genetic operators (crossover, mutation, reinsertion) until a stopping criterion is satisfied (20).

7. Results

The proposed method was applied to real data (the Annual Report of the Council of Economic Advisers, 2007-2018) (22) on the modified income–money supply model, with sample size equal to 48; the results were obtained using MATLAB R2017b.

7.1 Results of the first equation

Table 1: results of 2SLS and the proposed method (2SLS-GA) for equation (1), with max iterations = 100

Method    Estimates (β̂10, β̂12, γ̂11, γ̂12)    Variables selected (AIC)   MAPE     MEDAE
2SLS      2.3637, 0.0004, 0.0018, 0.0014     All                        1.7955   133.2374
2SLS-GA   2.7293, 0.0004, 0.0011, 0.0006     X1, X3                     1.1871   102.1789

7.2 Results of the second equation

Table 2: results of 2SLS and the proposed method (2SLS-GA) for equation (2), with max iterations = 100

Method    Estimates (β̂20, β̂21, γ̂23, γ̂24)    Variables selected (AIC)   MAPE     MEDAE
2SLS      76.8828, 0.0323, 0.0614, 0.0229    All                        2.1587   51.8462
2SLS-GA   145.4157, 0.7472, 1.5893, 1.0112   X2, X3, X4                 2.0225   50.6710


8. Conclusions

It is clear that the results of the proposed method (2SLS-GA) are better than those of the traditional method (2SLS) under both criteria (MAPE and MEDAE). This means that the genetic algorithm (GA), with a mutation rate equal to 0.0625, has succeeded in improving the estimators of the linear SEM.

References

[1] Bradley C. Wallet, David J. Marchette, Jeffrey L. Solka and Edward J. Wegman (1996) "A Genetic Algorithm for Best Subset Selection in Linear Regression", Proceedings of the 28th Symposium on the Interface.
[2] Damodar N. Gujarati and Dawn C. Porter (2008) "Basic Econometrics", Fifth Edition, www.mhhe.com.
[3] David A. Swanson, Jeff Tayman and T. M. Bryan (2010) "MAPE-R: A Rescaled Measure of Accuracy for Cross-Sectional, Subnational Forecasts", Riverside, CA 92521 USA, email: David.swanson@ucr.edu.
[4] D. Michie, D. J. Spiegelhalter and C. C. Taylor (1994) "Machine Learning, Neural and Statistical Classification", MRC Biostatistics Unit, Institute of Public Health, University Forvie Site, Robinson Way, Cambridge CB2 2SR, U.K.
[5] Eduardo Acosta-González and Fernando Fernández-Rodríguez (2001) "Model Selection via Genetic Algorithms", JEL classification: C20; C61; C63.
[6] Emre Demir and Özge Akkuş (2015) "An Introductory Study on 'How the Genetic Algorithm Works in the Parameter Estimation of Binary Logit Model?'", International Journal of Sciences: Basic and Applied Research (IJSBAR), Volume 19, No 2, pp 162-180.
[7] Jeffrey M. Wooldridge "Econometric Analysis of Cross Section and Panel Data", The MIT Press, Cambridge, Massachusetts, London, England.
[8] Jeffrey M. Wooldridge "Introductory Econometrics: A Modern Approach".
[9] Jinyong Hahn and Jerry Hausman (2002) "Notes on bias in estimators for simultaneous equation models", Economics Letters 75, 237-241.
[10] Jose J. López-Espín, Antonio M. Vidal and Domingo Giménez (2012) "Two-stage least squares and indirect least squares algorithms for simultaneous equations models", Journal of Computational and Applied Mathematics 236, 3676-3684.
[11] Jose J. López-Espín and Domingo Giménez (2012) "Obtaining simultaneous equation models from a set of variables through genetic algorithms", Procedia Computer Science 1, 427-435.
[12] Kenneth P. Burnham and David R. Anderson (2004) "Understanding AIC and BIC in Model Selection", Sociological Methods & Research, Vol. 33, No. 2.
[13] M. A. Iquebal, Prajneshu and Himadri Ghosh (2012) "Genetic algorithm optimization technique for linear regression models with heteroscedastic errors", Indian Journal of Agricultural Sciences 82 (5): 422-5.
[14] Manish Saraswat and Ajay Kumar Sharma (2013) "Genetic Algorithm for optimization using MATLAB", available online at www.ijarcs.info.
[15] Rahul Malhotra, Narinder Singh and Yaduvir Singh (2011) "Genetic Algorithms: Concepts, Design for Optimization of Process Controllers", Canadian Center of Science and Education, www.ccsenet.org/cis.
[16] Sandra Paterlini and Tommaso Minerva (2007) "Regression Model Selection Using Genetic Algorithms", Rome PRIN, ISSN: 1790-5109.


[17] Shi Minghua, Xiao Qingxian, Zhou Benda and Yang Feng (2017) "Regression Modelling Based on Improved Genetic Algorithm", ISSN 1330-3651 (Print), ISSN 1848-6339 (Online), DOI: 10.17559/TV-20160525104127.
[18] Sultan H. Aljahdali and Mohammed E. El-Telbany (2008) "Genetic Algorithms for Optimizing Ensemble of Models in Software Reliability Prediction", ICGST-AIML Journal, Volume 8, Issue I.
[19] Steven C. Chapra (2012) "Applied Numerical Methods with MATLAB for Engineers and Scientists", Third Edition, Tufts University.
[20] Tommaso Minerva and Sandra Paterlini (2002) "Evolutionary approaches for statistical modeling", IEEE, https://www.researchgate.net/publication/232620105.
[21] William H. Greene (2003) "Econometric Analysis", Fifth Edition, Upper Saddle River, New Jersey 07458.
[22] Council of Economic Advisers (2007-2018) "Economic Report of the President", https://www.whitehouse.gov/wp-content/.