Procedia Computer Science 108C (2017) 877–886
1877-0509 © 2017 The Authors. Published by Elsevier B.V.
Peer-review under responsibility of the scientific committee of the International Conference on Computational Science
10.1016/j.procs.2017.05.201
International Conference on Computational Science, ICCS 2017, 12-14 June 2017,
Zurich, Switzerland
Toward hybrid platform for evolutionary computations
of hard discrete problems
Dominik Żurek1, Kamil Piętak1, Marcin Pietroń1, and Marek Kisiel-Dorohinicki1
AGH University of Science and Technology, Al. Mickiewicza 30, 30-059 Krakow, Poland
{dzurek,kpietak,pietron,doroh}@agh.edu.pl
Abstract
The memetic agent-based paradigm, which combines evolutionary computation and local search techniques, is one of the promising meta-heuristics for solving large and hard discrete problems such as Low Autocorrelation Binary Sequence (LABS) or Optimal Golomb Ruler (OGR). In this paper, as a follow-up of previous research, the concept of a hybrid agent-based evolutionary systems platform, which spreads computations among CPU and GPU, is shortly introduced. The main part of the paper presents an efficient parallel GPU implementation of a LABS local optimization strategy. As a means for comparison, speed-ups of the GPU implementation over sequential and parallel CPU versions are shown. This constitutes a promising step toward building a hybrid platform that combines evolutionary meta-heuristics with highly efficient local optimization of chosen discrete problems.
Keywords: evolutionary computing, GPU computing, memetic computing, LABS
1 Introduction
In the paper, as a follow-up of the research presented in [13], the next step in solving the Low Autocorrelation Binary Sequence problem with efficient techniques is shown. LABS, despite wide research, is still an open optimization problem for long sequences. The paper introduces a very efficient, parallel realization of local optimization for LABS implemented on GPU. The implementation is intended as part of a hybrid computational environment that utilizes agent-based evolutionary meta-heuristics. The integration of the environment and the proposed components will be a topic for further research.
There are various methods that try to solve the LABS problem. The simplest one is exhaustive enumeration (i.e. the brute-force method), which provides the best results but can be applied only to small values of L. Some researchers use a partial enumeration, choosing so-called skew-symmetric sequences [14] that are the most likely solutions for many lengths (e.g. for L ∈ [31, 65], 21 best sequences are skew-symmetric).
However, enumerative algorithms (complete or partial) are limited to small values of L by the exponential size of the search space. Heuristic algorithms use some plausible rules to locate
good sequences more quickly. Examples include simulated annealing, evolutionary algorithms,
local optimization techniques – the list of heuristic algorithms can be found in [7]. A well-
known method of local optimization for LABS is Tabu Search [9, 8] – the best results have been
obtained using some variations of this technique [11]. All these methods give relatively good
solutions for L < 200, but fail for larger lengths. As a result, the LABS problem found its place in CSPLIB, a library of test problems for constraint solvers [18]. The best known values (BKVs) for LABS can be found in [3].
The paper concentrates on solving the LABS problem using efficient parallel computations on GPU, which will be further combined with evolutionary meta-heuristics. The authors introduce the basic concept of a hybrid computational environment that combines CPU and GPU computations, and present in detail a way of solving the LABS problem on GPU using one of the local search strategies, Steepest Descent Local Search (SDLS) [2].
The paper is organized as follows: an introduction to the LABS problem, together with an optimized algorithm for energy computation, is presented as background for the GPU algorithm. Then a short introduction to the concept of evolutionary multi-agent systems (EMAS) with local optimization techniques is provided, followed by an overview of a platform allowing such systems to run in a hybrid environment comprised of CPU and GPU. The next section presents details of LABS evaluation and local optimization implemented on GPU, which constitutes the central point of the paper. Then, sample results are presented to compare the efficiency of pure CPU and GPU computations and to compute the speed-up gained using the GPU. Finally, after the conclusions, future work, such as the integration of CPU and GPU in the shape of a hybrid computational platform, is presented.
2 LABS problem
Low Autocorrelation Binary Sequence (LABS) is an NP-hard combinatorial problem with a simple formulation. It has been under intensive study since the 1960s by the physics and artificial intelligence communities. It consists in finding a binary sequence $S = \{s_0, s_1, \ldots, s_{L-1}\}$ of length $L$, where $s_i \in \{-1, 1\}$, which minimizes the energy function $E(S)$:

$$C_k(S) = \sum_{i=0}^{L-k-1} s_i s_{i+k}, \qquad E(S) = \sum_{k=1}^{L-1} C_k^2(S) \qquad (1)$$

M. J. Golay also defined a so-called merit factor [10], which binds the LABS energy level to the length of a given sequence:

$$F(S) = \frac{L^2}{2E(S)} \qquad (2)$$

The search space for the problem of length $L$ has size $2^L$, and the energy of a sequence can be computed in time $O(L^2)$. The LABS problem has no constraints, so $S$ can be represented naturally as an array of binary values. One of the reasons for the high complexity of the problem is that in LABS all sequence elements are correlated: one change that improves some $C_i(S)$ also has an impact on many other $C_j(S)$ and can lead to big changes of a solution's energy.
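The definitions above can be sketched in Python (an illustrative sketch, not part of the paper's implementation):

```python
def autocorrelations(s):
    # C_k(S) = sum_{i=0}^{L-k-1} s_i * s_{i+k}, for k = 1 .. L-1
    L = len(s)
    return [sum(s[i] * s[i + k] for i in range(L - k)) for k in range(1, L)]

def energy(s):
    # E(S): sum of squared off-peak autocorrelations, O(L^2) time
    return sum(c * c for c in autocorrelations(s))

def merit_factor(s):
    # F(S) = L^2 / (2 * E(S))
    return len(s) ** 2 / (2 * energy(s))
```

For example, the length-5 Barker sequence [1, 1, 1, -1, 1] has autocorrelations [0, 1, 0, 1], energy 2, and merit factor 6.25.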
Figure 1: Data structures introduced to efficiently compute fitness in the neighborhood (shown for L = 5):

T(S):                       C(S):
s1s2  s2s3  s3s4  s4s5      C1 = s1s2 + s2s3 + s3s4 + s4s5
s1s3  s2s4  s3s5            C2 = s1s3 + s2s4 + s3s5
s1s4  s2s5                  C3 = s1s4 + s2s5
s1s5                        C4 = s1s5
Computing the energy can be sped up with a technique proposed by Cotta et al. [9], who introduced the notion of the neighborhood of a solution $S$ of length $L$, obtained by flipping exactly one symbol in the sequence:

$$N(S) = \{\mathit{flip}(S, i) \mid i \in \{1, \ldots, L\}\} \qquad (3)$$

where $\mathit{flip}(s_1 \ldots s_i \ldots s_L, i) = s_1 \ldots (-s_i) \ldots s_L$ [9].

Then, all computed products can be stored in an $(L-1) \times (L-1)$ table $T(S)$, such that $T(S)_{ij} = s_j s_{i+j}$ for $j \le L - i$, and the values of the different correlations can be saved in an $(L-1)$-dimensional vector $C(S)$, defined as $C(S)_k = C_k(S)$ for $1 \le k \le L-1$. Figure 1 shows these data structures for an $L = 5$ instance. Cotta observed that flipping a single symbol $s_i$ multiplies by $-1$ the value of the cells in $T(S)$ where $s_i$ is involved, so the fitness of the sequence $\mathit{flip}(S, i)$ can be efficiently recomputed in time $O(L)$ using Algorithm 1.
Algorithm 1 Computing LABS energy using T(S) and C(S) structures
1: function ValueFlip(S, i, T, C)
2:   f := 0
3:   for p := 1 to L − 1 do
4:     v := C_p
5:     if p ≤ L − i then v := v − 2 T_{p,i} end if
6:     if p < i then v := v − 2 T_{p,(i−p)} end if
7:     f := f + v^2
     end for
8:   return f
   end function
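The recomputation can be sketched in Python with 0-based indices (row k of T holds the products for correlation C_{k+1}); this is an illustrative reconstruction, verified against the full O(L^2) energy:

```python
def build_tables(s):
    # Row k-1 of T holds the products s_i * s_{i+k} for lag k = 1 .. L-1;
    # C[k-1] is their sum, i.e. the autocorrelation C_k(S).
    L = len(s)
    T = [[s[i] * s[i + k] for i in range(L - k)] for k in range(1, L)]
    C = [sum(row) for row in T]
    return T, C

def value_flip(s, i, T, C):
    # Energy of flip(S, i) in O(L): flipping s_i negates every product
    # in T that involves s_i, so each affected correlation changes by
    # -2 times the corresponding table cell.
    L = len(s)
    f = 0
    for k in range(L - 1):          # lag k+1
        v = C[k]
        if i + k + 1 < L:           # s_i is the left factor of T[k][i]
            v -= 2 * T[k][i]
        if i - k - 1 >= 0:          # s_i is the right factor of T[k][i-k-1]
            v -= 2 * T[k][i - k - 1]
        f += v * v
    return f
```

For any index i, value_flip(s, i, T, C) equals the full O(L^2) energy of the sequence with s_i negated.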
The flipping method can be successfully used in various local optimization techniques where usually a single interaction modifies only one sequence element. Such a technique can be used, for example, in the Steepest Descent Local Search method, which modifies one of the sequence elements and then checks whether the new sequence has better energy. If so, the next random element is modified. The process continues until no better sequence is created. The number of iterations can also be set to a chosen constant; this variant can be very useful when one optimizes many sequences in parallel, because the optimization time for each solution is then more or less the same, i.e. almost no time is lost on synchronization. The SDLS algorithm that uses flip operations to compute LABS energy has also been proposed in [9].
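A minimal Python sketch of steepest-descent local search over the single-flip neighborhood; for clarity it re-evaluates the energy from scratch, where an efficient version would use the T(S)/C(S) tables:

```python
def sdls(s, energy):
    # Steepest Descent Local Search: repeatedly move to the best
    # single-flip neighbour until no neighbour improves the energy.
    s = list(s)
    best = energy(s)
    while True:
        best_i = None
        for i in range(len(s)):
            s[i] = -s[i]            # trial flip
            e = energy(s)
            s[i] = -s[i]            # undo
            if e < best:
                best, best_i = e, i
        if best_i is None:          # local optimum reached
            return s, best
        s[best_i] = -s[best_i]      # commit the steepest improving flip
```

Starting from the all-ones sequence of length 5 (energy 30), a single call strictly reduces the energy; the Barker sequence [1, 1, 1, -1, 1] is already a local optimum with energy 2.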
3 Evolutionary meta-heuristics with local optimization for
LABS
One of the promising metaheuristics for solving hard problems is the evolutionary multi-agent system proposed by K. Cetnarowicz [6] and developed over many years of research by the IISG group at AGH-UST in Krakow. EMAS extended with local optimization techniques, the so-called memetic EMAS, has already been applied to hard discrete problems such as LABS [13, 12].
3.1 The concept of the memetic EMAS
In a basic EMAS, the system is comprised of individual agents living in a population1. The main concepts of evolution are realized through reproduction and death operations. In optimization problems the individuals contain a solution encoded as a genotype. The genotype is inherited from parents using variation operators (i.e. mutation and recombination). As there is no global knowledge, in contrast to classic evolutionary algorithms, the selection is based on an energy resource belonging to each individual. The level of energy is related to the individual's quality and determines its behavior: a high level of energy allows it to reproduce, an average level leads to energy transfer between individuals (the "fight" operation), and a low level causes the agent's death.
Each newborn individual has to be evaluated by a fitness function which rates its genotype in the context of a given problem. In the memetic variant of EMAS, the evaluation can be further improved utilizing various local optimization techniques such as Tabu Search, Steepest Descent Local Search or Random Mutation Hill Climbing.
Algorithm 2 A simplified algorithm of population processing in memetic EMAS
1: function processPopulation(population)
2:   pairs := selectPairs(population)
3:   newBorn := Array[]
4:   for pair in pairs do
5:     if canReproduce(pair) then newBorn += reproduce(pair)
6:     else fight(pair) end if
     end for
7:   newBorn := evaluate(newBorn)
8:   newBorn := improve(newBorn)
9:   population += newBorn
10:  for ind in population do
11:    if shouldDie(ind) then remove(population, ind) end if
     end for
12:  return population
   end function
The concept of memetic EMAS can also be illustrated using the simplified, sequential algorithm shown in Algorithm 2. The processPopulation function contains a single "step" that is called sequentially until a stop condition is reached. Within each step, pairs for reproduction or fight are selected from the current population. If a selected pair is "good enough", then a new individual is "born" from the selected parents. Otherwise, the individuals from the pair fight with each other, and in consequence a portion of life energy is transferred between them. Next, all new
born individuals are then evaluated and improved using local optimization techniques. Finally, all weak individuals are "killed", i.e. removed from the population.
In [13, 12] we showed results from many experiments performed on a pure CPU architecture with one or many threads. The results show that memetic EMAS is a promising technique for solving not only relatively small LABS instances (L ≤ 50), but also sizes that are still a challenge (e.g. L ≥ 200).

1 There is also a multi-population variant of EMAS, but it is out of the scope of this paper.
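The population-processing step of Algorithm 2 can be sketched in Python. All thresholds, energy amounts and the dictionary-based agent representation below are illustrative assumptions, not the actual AgE implementation:

```python
import random

REPRODUCE_LEVEL = 8    # hypothetical life-energy thresholds, for illustration
DEATH_LEVEL = 1
FIGHT_TRANSFER = 2

def process_population(population, evaluate, improve):
    # One memetic EMAS step: pair agents, reproduce or fight,
    # evaluate and locally improve the offspring, remove weak agents.
    random.shuffle(population)
    pairs = list(zip(population[0::2], population[1::2]))
    new_born = []
    for a, b in pairs:
        if a["energy"] >= REPRODUCE_LEVEL and b["energy"] >= REPRODUCE_LEVEL:
            new_born.append(reproduce(a, b))
        else:
            fight(a, b)
    for child in new_born:
        child["fitness"] = evaluate(child["genotype"])
        improve(child)                      # memetic step: local optimization
    population.extend(new_born)
    return [ind for ind in population if ind["energy"] > DEATH_LEVEL]

def reproduce(a, b):
    # one-point recombination plus point mutation; parents donate energy
    cut = len(a["genotype"]) // 2
    genotype = a["genotype"][:cut] + b["genotype"][cut:]
    i = random.randrange(len(genotype))
    genotype[i] = -genotype[i]              # point mutation of a LABS genotype
    a["energy"] -= 1
    b["energy"] -= 1
    return {"genotype": genotype, "energy": 2, "fitness": None}

def fight(a, b):
    # the better-rated agent (lower LABS energy) takes life energy
    winner, loser = (a, b) if a["fitness"] <= b["fitness"] else (b, a)
    t = min(FIGHT_TRANSFER, loser["energy"])
    winner["energy"] += t
    loser["energy"] -= t
```

Here `evaluate` would be the LABS energy function and `improve` a local search such as SDLS; in the hybrid platform both would be delegated to the GPU for the whole `new_born` batch at once.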
3.2 The concept of a hybrid platform
Evolutionary multi-agent systems, as well as classic evolutionary algorithms, can be easily
parallelised using various techniques [5, 1]. Two different research directions in this field can be
observed:
- decomposition models, which assume a geographic division of the population and constraints on interactions between particular individuals; the whole process of evolution is then parallelised within particular sub-populations; the most popular decomposition concepts are migration and cell models;
- global parallelisation, in which particular steps of evolution performed on different individuals are executed on many computational nodes.
When computations with GPUs are considered, the global parallelisation model can be applied using a master-slave architecture. The architecture assumes that a central unit performs most of the evolutionary process (i.e. reproduction, energy transfer or death operations) and delegates some of the most expensive computations to the GPU. In the case of memetic EMAS (see Algorithm 2), the evaluation and improvement operations are a good choice to delegate to slaves. These two operations are combined and extended with the required conversions between CPU and GPU representations of a given problem. To minimize the communication overhead and exploit the parallel nature of the GPU, evaluation and improvement are performed for the whole population at once. The GPU interface accepts a variable number of solutions belonging to newly created individuals and then performs evaluation and improvement. In effect, the GPU returns a vector of evaluations together with the improved solutions, used to update the adequate individuals.
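The host-side contract described above could look roughly like the following sketch; `gpu_solver` and its `run` method are hypothetical stand-ins for the real CUDA kernel launch, shown only to illustrate the batching:

```python
def evaluate_and_improve_batch(genotypes, gpu_solver):
    # Pack all newly born solutions into one flat buffer, make a single
    # call to the device, then unpack fitness values and improved
    # solutions; one transfer per population minimizes communication
    # overhead.
    n = len(genotypes)
    L = len(genotypes[0]) if genotypes else 0
    flat = [sym for g in genotypes for sym in g]
    fitnesses, improved_flat = gpu_solver.run(flat, L, n)
    improved = [improved_flat[i * L:(i + 1) * L] for i in range(n)]
    return fitnesses, improved
```

A CPU stub implementing the same `run` signature can stand in for the device during testing, which is also how the AgE integration point could be exercised without a GPU.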
The described concept of the hybrid computational environment is partially implemented on the AgE platform2, a Java-based solution developed as an open-source project by the Intelligent Information Systems Group of AGH-UST. AgE provides a platform for the development and execution of distributed agent-based applications, mainly for simulation and computational tasks [4, 15]. The modular architecture of AgE allows components to be used to assemble and run agent-based computations for various problems, such as black-box hard discrete problems (e.g. LABS, OGR, Job-shop) [12, 13] or continuous optimization problems. The platform is prepared to be integrated with external solvers such as the presented LABS optimization on GPU.
4 Parallel SDLS algorithm for LABS local optimization
To better understand the details of the evaluation and improvement of LABS solutions, let us introduce the basics of GPU processing. GPUs are constructed as a structure of N multiprocessors with M cores each. The cores share an instruction unit with the other cores in a multiprocessor. The multiprocessors contain dedicated memories which are shared among all cores and are much faster than the global memory. There are also efficient read-only constant
2https://gitlab.com/age-agh/age3
5
Dominik Żurek et al. / Procedia Computer Science 108C (2017) 877886 881
Toward hybrid platform for evolutionary computations of hard discrete problems D. Żurek et al.
3 Evolutionary meta-heuristics with local optimization for
LABS
One of the promising metaheuristic in solving hard problems is evolutionary multi-agent system
proposed by K. Cetnarowicz [6] and developed for many years of research by IISG group at
AGH-UST in Krakow. The EMAS extended with local optimization techniques, so called
memetic EMAS, has been already applied to hard discrete problems such as LABS [13, 12].
3.1 The concept of the memetic EMAS
In a basic EMAS, the system is comprised of individual agents living in a population1. The
main concepts of evolution are realized in the shape of reproduction and death operations. In
optimization problems the individuals contain a solution decoded in the shape of genotype. The
genotype is inherited from parents using variation operators (ie. mutation and recombination).
As there is no global knowledge, in contrast to classic evolutionary algorithm, the selection
is based on energy resource belonging to each individual. The level of energy is related to its
quality and causes various behavior of an individual: high level of energy allows it to reproduce,
average level leads to energy transfer between individuals (“fight” operation) and low level causes
agent’s death.
Each new born individual has to be evaluated by a fitness function which rates it’s genotype
in context of a given problem. In memetic variant of EMAS, the evaluation can be further
improved utilizing various local optimization techniques such as Tabu Search, Steepest Descent
Local Search or Random Mutation Hill Climbing.
Algorithm 2 A simplified algorithm of population processing in memetic EMAS
1: function processPopulation(population)
2: pairs := selectP airs(population)
3: newBorn := Array[]
4: for pair in pairs do
5: if canReproduce(pair)then newBorn+=reproduce(pair)
6: elsefight(pair)end if
end for
7: newBorn := evaluate(newBorn)
8: newBorn := improve(newBorn)
9: population+=newBorn
10: for ind in population do
11: if shouldDie(ind)then remove(population, ind) end if
end for
12: return population
end function
The concept of memetic EMAS can be also illustrated using a simplified, sequential al-
gorithm shown in 2. The processPopultation function contains a single “step” that is called
sequentially until a stop condition is reached. Within each step, pairs for reproduction or fight
are selected from the current population. If a selected pair is “good enough”, then a new indi-
vidual is “born” from the selected parents. Otherwise, individuals from the pair fight with each
other, and in consequence a portion of life energy is transferred between them. Next, all new
1There is also multi-populations variant of EMAS, but it is out of the scope of this paper.
4
Toward hybrid platform for evolutionary computations of hard discrete problems D. Żurek et al.
born individuals are then evaluated and improved using local optimization techniques. At last,
all week individuals are “killed”, ie. removed from the population.
In [13, 12] we shown results coming from many experiments performed using pure CPU
architecture with one or many threads. The results show that memetic EMAS is a promising
technique for solving not only relatively small LABS problems (L50), but also sizes that still
are a challenge (eg. L200).
3.2 The concept of a hybrid platform
Evolutionary multi-agent systems, as well as classic evolutionary algorithms, can be easily
parallelised using various techniques [5, 1]. Two different research directions in this field can be
observed:
decomposition models that assume geographic division of population and constraints for
interaction between particular individuals; the whole process of evolution is then paral-
lelised in particular sub-populations; the most popular decomposition concepts are mi-
gration and cell models;
global parallelisation in which particular steps of evolution performed on different indi-
viduals are implemented on many computational nodes.
When computations with GPUs are considered, the global parallelisation model can be
applied by using master-slave architecture. The architecture assumes that central unit performs
most of the evolutionary process (ie. reproduction, energy transfer or death operations) and
the unit delegates some of most expensive computations to the GPU. In case of memetic EMAS
(see algorithm 2) the operations of evaluation and improvement are the good choice to delegate
to slaves. These two operations are combined and extended with required conversions between
CPU and GPU representations of a given problem. To minimize the communication overhead
and use the parallel nature of GPU, the evaluation and improvement operation is done for whole
population at once. The GPU interface gets the variable number of solutions that belong to
newly created individuals and than performs evaluation and improvement. In effect, the GPU
returns a vector of evaluations together with improved solutions to update adequate individuals.
The described concept of the hybrid computational environment is partially implemented on
AgE platform2– a Java-based solution developed as an open-source project by the Intelligent
Information Systems Group of AGH-UST. AgE provides a platform for the development and
execution of distributed agent-based applications in mainly simulation and computational tasks
[4, 15]. The modular architecture of AgE allows to use components to assembly and run agent-
based computations for various problems such as black-box hard discrete problems (eg. LABS,
OGR, Job-shop)[12, 13] or continues optimization problems. The platform is prepared to be
integrated with external solvers such as presented LABS optimization on GPU.
4 Parallel SDLS algorithm for LABS local optimization
To better understand the evaluation and improvement of LABS solutions, let us first introduce the basics of GPU processing. A GPU is built of N multiprocessors with M cores each. Within a multiprocessor, the cores share a common instruction unit. Each multiprocessor also contains a dedicated memory, shared among all of its cores, which is much faster than the global memory. There are also efficient read-only constant
2https://gitlab.com/age-agh/age3
and texture memories with global access: the constant memory is efficient for reading constants, while the texture memory is efficient for random read operations.
In contrast to a CPU, a GPGPU devotes less logic to control flow and branch prediction. A GPGPU runs thousands of parallel threads, grouped into blocks that have access to a common shared memory. The main concerns when developing highly efficient GPU programs are memory usage, an efficient division of the code into parallel threads, and communication between the threads. Synchronization between the threads is a further important issue.
In this paper the authors used graphics cards from the Tesla family produced by NVIDIA3. These GPGPUs can be programmed with the high-level programming framework CUDA (Compute Unified Device Architecture). CUDA is a set of software layers for communicating with the GPU, based on standard C/C++. The platform gives direct access to the GPU's virtual instruction set and parallel computational elements (C or C++ code can be sent directly to the GPU without using assembly code). Creating optimized code requires knowledge of how threads are mapped to blocks and grids, how to synchronize them, and how to use the various types of memory (registers, shared, global and read-only memories).
CUDA provides two kinds of APIs: the Runtime API and the Driver API. With the first one, kernels are defined and compiled together with the C/C++ files; the source code is compiled with the NVIDIA CUDA Compiler (NVCC), which produces an executable containing the entire program. NVCC cannot compile source code written in another language such as Java or Python. The Driver API is used when one needs to invoke CUDA kernels from code written in a high-level programming language; JCuda (Java bindings for CUDA)4 is one example.
4.1 Parallel SDLS realization for LABS on GPU
Based on an analysis of efficient heuristics, the authors concluded that the SDLS algorithm for the LABS problem is well suited to an implementation on GPU cards, because this approach can be fully parallelized. The module responsible for optimizing the LABS problem with the SDLS method receives binary sequences of length L, represented as strings in {−1, 1}L, which were randomly generated, for example with the curand library5. The provided vectors are spread among the blocks, so that each block holds its own sequence.
In the first step of the proposed algorithm, each block evaluates its sequence SI using two data structures, T(S) and C(S) (see figure 1), which are stored in shared memory. In order to create these structures, L−1 iterations have to be performed. During the first iteration the distance d in the products si · si+d (where d < L) has the value 1, and L−1 threads are used. In each subsequent iteration the distance is incremented and the number of threads used to compute the next values of the structures (the next row of T(S)) is decremented. To obtain an element of C(S), after each iteration the sum of the row Td(S) is computed using a reduction operation. To calculate the first energy value, which serves as the reference energy Er, the result of each reduction is raised to the power of 2 and added to the value accumulated in the previous iterations.
The next step of the algorithm is a local optimization that applies the SDLS approach for a given number of iterations. The method explores the neighborhood of a sequence and therefore allows
3http://www.nvidia.com/object/cuda_home_new.html
4Project homepage: https://jcuda.org
5The curand library provides facilities that focus on the simple and efficient generation of high-quality
pseudo-random and quasi-random numbers.
the fast flip strategy to be used in the sequence energy computations (see section 2). The concept is similar to the sequential version of SDLS [9], but each thread mutates a different bit of the input sequence in parallel. In this process each block uses L threads, each of which:
1. changes the value of the sequence element sindexOfThread to the opposite one (−1 · sindexOfThread),
2. computes the new energy value according to the SDLS algorithm with the value-flip evaluation [9],
3. stores the new energy in shared memory and a temporary C'(S) in the thread's cache memory (so that the common C(S) does not have to be recalculated after it is updated by the thread that found the best energy).
According to the ValueFlip function (see algorithm 1), the final energy f is summed up from the v values that represent the cells of the new C'(S) structure. Next, the best of the L energy values is chosen with a reduction operation on a vector stored in shared memory; the ids of the corresponding threads are stored in the same shared memory. In order to find the best energy and the corresponding thread number, two parallel reduction processes are run (figure 2): the first finds the minimum energy, and the second the index of the thread which computed this energy.
In the proposed algorithm, all threads create their own C'(S) structure in their cache memory based on the v values. Only the thread which computed the best energy in the current iteration updates the common C(S) structure (C(S) is overwritten by the C'(S) of the winner thread).
Figure 2: Reduction to find the best energy and the corresponding thread in a single block (two parallel tree reductions over L values in log2 L steps: one propagating the minimum energy, the other the id of the thread that computed it)
Then the minimum energy Emin is compared with the reference energy Er that was computed from the input sequence SI. If Emin < Er, then Emin becomes the new reference energy and the sequence yielding this energy is used as the new input sequence SI. The thread that computed Emin copies its C(S) table from its cache to shared memory. Additionally, the same thread updates the T(S) structure, changing the cells that contain elements whose index equals the thread id.
Cotta proposed to run the SDLS optimization until no further improvement is achieved [9]; in the parallel version, however, this strategy could cause synchronization overhead (one block would wait for another that is still seeking a better sequence and performing more iterations). This is why a constant number of iterations has been chosen, so that all blocks execute in the same (or very similar) time, which minimizes the waiting time before synchronization. After N iterations, the achieved results (energies and sequences) are copied to global memory and then to the CPU, where they can be further processed by evolutionary meta-heuristics.
During this process, memory management is crucial. For a sequence of length L, the following structures have to be allocated in the shared memory of each block:
- L · sizeof(short) – the sequence representation,
- ((L − 1) · L / 2) · sizeof(short) – the T(S) table,
- (L − 1) · sizeof(int) – the C(S) table,
- 2L · sizeof(int) – the energies after flipping, together with the ids of the corresponding threads.
5 Experimental Results
To show the effectiveness of the presented SDLS algorithm for LABS realized on the GPU, both GPU and CPU implementations have been considered. On the CPU, sequential and parallel versions of the algorithm (for 2 and 4 cores) have been run. The GPU experiments were performed on an NVIDIA Tesla M2090, and the CPU experiments on an Intel(R) Xeon(R) CPU E5-2630 v3 (2.4 GHz).
The reference version of SDLS for LABS on the CPU has been implemented using the OpenMP6 library, a concurrency platform for multi-threaded, shared-memory parallel processing in C, C++ and Fortran. It is a natural choice for building such algorithms, because OpenMP allows sequential code to be easily transformed into a parallel one using appropriate compiler directives that generate threads for the parallel processor platform (there is no need to create threads or assign tasks to each of them manually). In the experiments, the -O3 compiler optimization flag was used for the multi-core CPU implementation.
Both versions have been run for LABS sequences of lengths L = 48 and L = 201, comparing the execution times for different numbers of solutions (1000, 4000, 8000, 16000, 32000, 64000). The number of SDLS iterations has been set to 128. The results were gathered from 10 executions of each configuration, and the average values are presented. The observed standard deviation of the average results was negligible.
No. of solutions | GPGPU   | CPU (1 core) | Speed-up | CPU (2 cores) | Speed-up | CPU (4 cores) | Speed-up
1000             | 22.61   | 698.8        | 30.90    | 348.2         | 15.40    | 239.2         | 10.58
4000             | 73.93   | 2663.8       | 36.03    | 1398.5        | 18.92    | 1120.9        | 15.16
8000             | 144.39  | 5341.6       | 36.99    | 2791.8        | 19.33    | 2281.0        | 15.80
16000            | 285.74  | 10683.6      | 37.39    | 5600.5        | 19.60    | 5030.9        | 17.61
32000            | 568.99  | 21380.4      | 37.58    | 11795.9       | 20.73    | 6605.2        | 11.61
64000            | 1129.87 | 42803.3      | 37.88    | 23574.8       | 20.87    | 13077.1       | 11.57
Table 1: Execution times in milliseconds for SDLS with 128 iterations for LABS L = 48
6http://www.openmp.org/
No. of solutions | GPGPU    | CPU (1 core) | Speed-up | CPU (2 cores) | Speed-up | CPU (4 cores) | Speed-up
1000             | 916.35   | 11645.4      | 12.71    | 5756.9        | 6.28     | 3016.8        | 3.29
4000             | 3314.44  | 46524.3      | 14.04    | 23114.4       | 6.97     | 12059.2       | 3.64
8000             | 6869.51  | 93227.1      | 13.57    | 46574.2       | 6.78     | 24575.4       | 3.58
16000            | 12583.08 | 185832.4     | 14.77    | 93483.8       | 7.43     | 50674.3       | 4.03
32000            | 26573.16 | 376872.9     | 14.18    | 197617.3      | 7.44     | 112675.5      | 4.24
64000            | 53760.38 | 742100.9     | 13.80    | 396481.4      | 7.37     | 230078.9      | 4.28
Table 2: Execution times in milliseconds for SDLS with 128 iterations for LABS L = 201
The results shown in tables 1 and 2 present the average times in milliseconds for particular configurations, together with the speed-up, computed as the ratio between the CPU and GPU times (Sp = TCPU / TGPU). For L = 48 sequences (table 1), the GPU algorithm is significantly faster than the sequential version on the CPU (30–37×) and still gives a very good speed-up even against 4 CPU cores (10–17×), regardless of the number of solutions. For L = 201 sequences (table 2), the GPU executes around 13× faster than the sequential version and around 3.2–4.2× faster than the 4-core CPU version. Some decrease of the speed-up is caused by the fact that the proposed algorithm is optimized for sequences with L up to about 200, for which it already utilizes most of the GPU's logical resources; beyond that, the physical GPU resources have to be shared between particular blocks and threads, so the efficiency is lower. In all cases the parallel version of SDLS on the GPU gives a significant efficiency improvement and shows that the presented implementation can be used to quickly find better LABS sequences, narrowing the domain for the further LABS search with other techniques such as evolutionary algorithms.
6 Conclusions and further work
This paper is one of the first steps toward efficient hybrid platform dedicated for solving difficult
discrete problems such as LABS or Golomb-ruler optimization, that utilize both evolutionary
multi-agent systems as well as local optimization techniques implemented on GPU. The pre-
sented implementation of LABS optimization shows that a significant speed-up can be achieved
using parallel GPU computations. The implementation can be further integrated with meta-
heuristics such as evolutionary algorithms, which constitute a basis for the concept of hybrid
computational environment in master-slave model that seems to be very promising.
In the near future the authors plan to fully integrate AgE platform with local optimiza-
tion algorithm implemented on GPU. Important direction of the research is to adjust both
algorithms to lower communication overhead between CPU and GPU which includes required
data conversions. Also, the plan assumes the implementation of the hybrid concept for other
difficult problems that can be solved using algorithms with parallel structure – some research
for optimal Golomb ruler search has been already published in [17, 16].
6.1 Acknowledgments
The research presented in this paper received support from the AGH University of Science and Technology statutory project and from the Faculty of Computer Science, Electronics and Telecommunications Dean's Grant for Ph.D. Students and Young Researchers.
References
[1] E. Alba and M. Tomassini. Parallelism and evolutionary algorithms. IEEE Transactions on
Evolutionary Computation, 6(5):443–462, Oct 2002.
[2] Michael Bartholomew-Biggs. The Steepest Descent Method, pages 1–8. Springer US, Boston, MA,
2008.
[3] B. Bošković, F. Brglez, and J. Brest. A GitHub archive for solvers and solutions of the LABS problem. For updates, see https://github.com/borkob/git_labs, January 2016.
[4] Aleksander Byrski and Marek Kisiel-Dorohinicki. Agent-based model and computing environment
facilitating the development of distributed computational intelligence systems. In Proceedings of
the 9th International Conference on Computational Science, ICCS 2009, pages 865–874, Berlin,
Heidelberg, 2009. Springer-Verlag.
[5] Erick Cantu-Paz. Efficient and Accurate Parallel Genetic Algorithms. Kluwer Academic Publish-
ers, Norwell, MA, USA, 2000.
[6] K. Cetnarowicz, M. Kisiel-Dorohinicki, and E. Nawarecki. The application of evolution process
in multi-agent world (MAW) to the prediction system. In Proc. of 2nd Int. Conf. on Multi-Agent
Systems (ICMAS’96). AAAI Press, 1996.
[7] C. deGroot, K.H. Hoffmann, and D. Würtz. Low autocorrelation binary sequences: Exact enumeration and optimization by evolution strategies. IPS research report. Eidgenössische Technische Hochschule Zürich, Interdisziplinäres Projektzentrum für Supercomputing, 1989.
[8] Iván Dotú and Pascal Van Hentenryck. A Note on Low Autocorrelation Binary Sequences, pages
685–689. Springer Berlin Heidelberg, Berlin, Heidelberg, 2006.
[9] José E. Gallardo, Carlos Cotta, and Antonio J. Fernández. Finding low autocorrelation binary
sequences with memetic algorithms. Appl. Soft Comput., 9(4):1252–1262, September 2009.
[10] M. Golay. The merit factor of long low autocorrelation binary sequences (corresp.). IEEE Trans-
actions on Information Theory, 28(3):543–549, May 1982.
[11] Steven Halim, Roland H. C. Yap, and Felix Halim. Engineering Stochastic Local Search for the
Low Autocorrelation Binary Sequence Problem, pages 640–645. Springer Berlin Heidelberg, Berlin,
Heidelberg, 2008.
[12] Magdalena Kolybacz, Michal Kowol, Lukasz Lesniak, Aleksander Byrski, and Marek Kisiel-
Dorohinicki. Efficiency of memetic and evolutionary computing in combinatorial optimisation.
In Webjørn Rekdalsbakken, Robin T. Bye, and Houxiang Zhang, editors, ECMS, pages 525–531.
European Council for Modeling and Simulation, 2013.
[13] Michał Kowol, Aleksander Byrski, and Marek Kisiel-Dorohinicki. Agent-based evolutionary com-
puting for difficult discrete problems. Procedia Computer Science, 29:1039 – 1047, 2014.
[14] Tom Packebusch and Stephan Mertens. Low autocorrelation binary sequences. Journal of Physics
A: Mathematical and Theoretical, 49(16):165001, 2016.
[15] Kamil Piętak, Adam Woś, Aleksander Byrski, and Marek Kisiel-Dorohinicki. Functional Integrity
of Multi-agent Computational System Supported by Component-Based Implementation, pages 82–
91. Springer Berlin Heidelberg, Berlin, Heidelberg, 2009.
[16] M. Pietron, A. Byrski, and M. Kisiel-Dorohinicki. Leveraging heterogeneous parallel platform
in solving hard discrete optimization problems with metaheuristics. Journal of Computational
Science, 18:59 – 68, 2017.
[17] Marcin Pietron, Aleksander Byrski, and Marek Kisiel-Dorohinicki. Gpgpu for difficult black-box
problems. Procedia Computer Science, 51:1023 – 1032, 2015.
[18] Toby Walsh. CSPLib problem 005: Low autocorrelation binary sequences. http://www.csplib.
org/Problems/prob005. Accessed: 2017-01-31.
10
... Prevalent methods of solving LABS are heuristic algorithms that utilize plausible rules to locate satisfactory sequences more quickly. A well-known method is Steepest Descend Local Search (SDLS) [1], which is also very effective on GPGPU architectures [15,12]. The authors of [16] proposed two new variants of the SDLS algorithm that extend the neighborhood of the sequence to a 2-bit and recurrent exploration of sequences at the 1 and 2-bit distance. ...
... The memetic agent-based paradigm [48], which combines evolutionary computation and local search techniques using parallel GPU implementation, is one of the promising meta-heuristics for solving a LABS problem. Figure 1 shows the normalized aperiodic autocorrelation function (NAAF) in dB, i.e., 20log 10 C k (S) L , of two binary sequences of length 213. One is randomly generated with F = 1.3572, and another has F = 9.5393, which is currently the best-known merit factor for sequences with a length over 200. ...
Article
Full-text available
In this paper, we present a computational search for best-known merit factors of longer binary sequences with an odd length. Finding low autocorrelation binary sequences with optimal or suboptimal merit factors is a very difficult optimization problem. An improved version of the heuristic algorithm is presented and tackled to search for aperiodic binary sequences with good autocorrelation properties. High-performance computations with the execution of our stochastic algorithmto search skew-symmetric binary sequences with high merit factors. After experimental work, as results, we present new binary sequences with odd lengths between 201 and 303 that are skew-symmetric and have the merit factor F greater than 8.5. Moreover, an example of a binary sequence having F > 8 has been found for all odd lengths between 201 and 303. The longest binary sequence with F > 9 found to date is of length 255.
... One of such meta-heuristics is the concept of an evolutionary multi-agent system (EMAS) proposed by K. Cetnarowicz (Cetnarowicz et al., 1996) and successfully applied for solving complex problems. EMAS has already been used to optimize some complex continues and discrete problems such as LABS (Kowol et al., 2017;Pietak et al., 2019;Żurek et al., 2017), TSP Dreżewski, Woźniak, et al. (2009) or Investment Strategies Generation (Drezewski, Sepielak, et al., 2009). ...
Article
Full-text available
Agent-based evolutionary, computational systems have been proven to be an efficient concept for solving complex computational problems. This paper is an extension of [Biełaszek, S., Piętak, K., & Kisiel-Dorohinicki, M. (2021). New extensions of reproduction operators in solving LABS problem using EMAS meta-heuristic. Springer, cop. 2021. – Lecture Notes in Artificial Intelligence, Computational collective intelligence 12876 304-316. 13th International Conference, ICCCI 2021: Rhodes, Greece, September 29ŰOctober 1, 2021.] where we proposed new variants of reproduction operators together with new heuristics for the generation of initial population, dedicated to LABS – a hard discrete optimization problem. In this research, we verify if the proposed recombination operators improve EMAS efficiency also with different local optimization techniques such as Tabu Search and Self-avoiding walk, and therefore can be seen as better recombination operators dedicated to LABS problem in general. This paper recalls the definition of new recombination variants dedicated to LABS and verify if they can be successfully used in many different evolutionary configurations.
... Memetic agent-based paradigm [36], which combines evolutionary computation and local search techniques using a parallel GPU implementation, is one of the promising meta-heuristics for solving the LABS problem. Figure 1 shows a randomly generated binary sequence of length 213 with F = 1.3572. Figure 2 shows a binary sequence, also of length 213, but with F = 9.5393, which is currently the best known merit factor for sequences of length over 200. ...
Preprint
Full-text available
In this paper we present the best-known merit factors of longer binary sequences of odd length. Finding low-autocorrelation binary sequences with optimal merit factors is a difficult optimization problem. High-performance computations with parallel execution of a stochastic algorithm enable us to search for skew-symmetric binary sequences with high merit factors. As experimental results, we present sequences with odd lengths between 301 and 401 that are skew-symmetric and have a merit factor F greater than 7. Moreover, sequences with F > 7 have now been found for all odd lengths between 301 and 401.
Chapter
One of the leading approaches for solving various hard discrete problems is designing advanced solvers based on local search heuristics. This observation is also relevant to the low autocorrelation binary sequence (LABS) problem – an open hard optimisation problem with many applications. There are many dedicated heuristics, such as the steepest-descent local search algorithm (SDLS), Tabu search or the xLostovka algorithm. This paper introduces a new concept of combining well-known solvers with neural networks that improve the solvers' parameters based on the local context. The contribution proposes the extension of Tabu search (one of the well-known optimisation heuristics) with an LSTM neural network to optimise the number of iterations for which particular bits are blocked. Regarding the presented results, it should be concluded that the proposed approach is a very promising direction for developing highly efficient heuristics for the LABS problem.
Keywords: LABS, Tabu, Neural network, LSTM
Chapter
5G networks offer novel communication infrastructure for Internet of Things applications, especially healthcare applications, where an edge-computing-enabled Internet of Medical Things (IoMT) provides online patient status monitoring. In this contribution, a Chicken Swarm Optimization algorithm based on energy-efficient multi-objective clustering is applied in an IoMT system. An effective fitness function is designed for cluster-head selection, and the performance of the proposed scheme is evaluated in a simulated environment.
Keywords: Energy efficiency, Network lifetime, Clustering, Cluster head selection, Delay, Chicken swarm optimization, Sensor networks, Adaptive networks
Chapter
Agent-based evolutionary computational systems have been proven to be an efficient concept for solving complex computational problems. In this paper, we propose and evaluate new variants of reproduction operators together with new heuristics for the generation of the initial population, dedicated to LABS – a hard discrete optimisation problem. The paper illustrates how to design and implement particular parts of the algorithm and discusses the required conditions for evolutionary parameters and operators.
Chapter
Low autocorrelation binary sequence (LABS) remains an open hard optimisation problem that has many applications. One of the promising directions for solving the problem is designing advanced solvers based on local search heuristics. The paper proposes two new heuristics developed from the steepest-descent local search algorithm (SDLS), implemented on GPGPU architectures. The introduced algorithms utilise the parallel nature of the GPU and provide an effective method of solving the LABS problem. As a means for comparison, the efficiency of SDLS and the new algorithms is presented, showing that exploring the wider neighbourhood improves the results.
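In outline, the SDLS heuristic referenced here is a steepest-descent search over one-bit flips of the sequence, minimising the autocorrelation energy. A naive CPU sketch under that assumption (the function names and the full O(N²) energy recomputation per flip are illustrative; practical CPU and GPU versions use incremental energy updates):

```python
def labs_energy(s):
    """Autocorrelation energy E = sum over k of C_k^2 (aperiodic)."""
    n = len(s)
    return sum(sum(s[i] * s[i + k] for i in range(n - k)) ** 2
               for k in range(1, n))

def sdls(seq):
    """Steepest-descent local search sketch for LABS: repeatedly apply
    the single bit flip that most reduces the energy, until no flip
    improves it (a local optimum)."""
    seq = list(seq)
    best = labs_energy(seq)
    while True:
        best_j, best_e = -1, best
        for j in range(len(seq)):       # evaluate every one-bit flip
            seq[j] = -seq[j]
            e = labs_energy(seq)
            if e < best_e:
                best_j, best_e = j, e
            seq[j] = -seq[j]            # undo the trial flip
        if best_j < 0:                  # no improving flip exists
            return seq, best
        seq[best_j] = -seq[best_j]      # commit the steepest improvement
        best = best_e
```

Each outer iteration evaluates all N flips independently, which is exactly the structure a GPU can exploit by scoring the whole flip neighbourhood in parallel.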
Article
The agent-based metaheuristic computing paradigm (EMAS) was proposed over 20 years ago by Cetnarowicz. Since then, many efforts have been made to evaluate, formally analyze and further develop this paradigm towards creating new algorithms such as EMAS hybrids or EMAS-inspired techniques. At the same time, significant work has been done to build efficient software frameworks supporting this (and similar) computing paradigms. These frameworks were based not only on classic object-oriented programming but also on the functional approach, recently also utilizing heterogeneous infrastructure. This paper presents an overview of the most important findings in this area, including novel ways of processing the agents and component orientation, which allow for both high flexibility and high efficiency of the provided solutions. The discussed concepts are illustrated with a case study of a system solving a hard computational problem leveraging GPGPU.
Article
Full-text available
The research reported in the paper deals with difficult black-box problems solved by means of popular metaheuristic algorithms implemented on up-to-date parallel, multi-core, and many-core platforms. In consecutive publications we are trying to show how particular population-based techniques may further benefit from employing dedicated hardware like GPGPU or FPGA for delegating different parts of the computing in order to speed it up. The main contribution of this paper is an experimental study focused on profiling of different possibilities of implementation of Scatter Search algorithm, especially delegating some of its selected components to GPGPU. As a result, a concise know-how related to the implementation of a population-based metaheuristic similar to Scatter Search is presented using a difficult discrete optimization problem; namely, Golomb Ruler, as a benchmark.
Article
Full-text available
Binary sequences with minimal autocorrelations have applications in communication engineering, mathematics and computer science. In statistical physics they appear as ground states of the Bernasconi model. Finding these sequences is a notoriously hard problem that so far can be solved only by exhaustive search. We review recent algorithms and present a new algorithm that finds optimal sequences of length N in time Θ(N·1.727^N). We computed all optimal sequences for N ≤ 66 and all optimal skew-symmetric sequences for N ≤ 117.
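The skew-symmetry restriction used in this and several other abstracts is, for odd N = 2n − 1, the constraint s_{n+i} = (−1)^i · s_{n−i}: it leaves only n free bits and forces every odd-lag autocorrelation to zero, which is why both exhaustive and stochastic searches reach much longer skew-symmetric sequences. A small illustrative sketch (function names are assumptions, not from the paper):

```python
def skew_symmetric(first_half):
    """Extend the first n elements of a +/-1 sequence to a skew-symmetric
    sequence of odd length N = 2n - 1 via s[n+i] = (-1)^i * s[n-i]
    (1-based indexing, as in the LABS literature)."""
    n = len(first_half)
    seq = list(first_half)
    for i in range(1, n):
        seq.append((-1) ** i * first_half[n - 1 - i])
    return seq

def autocorrelation(s, k):
    """Aperiodic autocorrelation C_k of a +/-1 sequence."""
    return sum(s[i] * s[i + k] for i in range(len(s) - k))
```

For any choice of the first half, all odd-lag autocorrelations of the extended sequence vanish, so a search need only explore 2^n candidates instead of 2^N.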
Article
Full-text available
Difficult black-box problems arise in many scientific and industrial areas. In this paper, efficient use of a hardware accelerator to implement dedicated solvers for such problems is discussed and studied based on the example of the Golomb Ruler problem. The actual solution of the problem is shown based on evolutionary and memetic algorithms accelerated on GPGPU. The presented results prove that GPGPU outperforms CPU in some memetic algorithms, which can be used as part of a hybrid algorithm for finding near-optimal solutions of the Golomb Ruler problem. The presented research is part of building a heterogeneous parallel algorithm for the difficult black-box Golomb Ruler problem.
Article
Full-text available
Hybridizing the agent-based paradigm with evolutionary computation can significantly enhance the field of metaheuristics, giving usually passive individuals autonomy and capabilities of perception and interaction with other individuals, treating them as agents. In this paper, as a follow-up to previous research, an evolutionary multi-agent system (EMAS) is examined on difficult discrete benchmark problems. As a means for comparison, a classical evolutionary algorithm (constructed along the Michalewicz model) implemented in an island model is used. The results encourage further research regarding the application of EMAS in the discrete problem domain.
Conference Paper
Full-text available
Difficult search and optimisation problems call for complex techniques for solving them. In particular, when the fitness function is costly, applying solutions such as agent-based computing systems may be fruitful. This approach may yield even better results in the case of memetic computing, as these algorithms tend, by their nature, to significantly increase the number of fitness function calls. This paper may be treated as a milestone in preparing to tackle combinatorial optimisation problems with memetic approaches in agent-based systems. After discussing the selected problems and details of population-oriented meta-heuristics for solving them, experimental results (with stress put on efficiency) are presented. Then details of applying EMAS-class systems are given and, finally, preliminary EMAS results obtained for combinatorial optimisation are shown and the work is concluded.
Article
Full-text available
Software systems become more and more complex, thus the application of self-developing, distributed and decentralized processing is indispensable. The complexity of such systems requires new tools for designing, programming and debugging processes, which implies that new approaches to decentralization should be undertaken. An idea of autonomous agents arises as an extension of the object and process concepts. The active agent is introduced as a basic element from which distributed and decentralized systems can be built. The use of evolution strategies in the design of multi-agent systems reveals new possibilities for developing complex software systems. Evolution also plays a key role in the creation and organisation of social structures. In this paper a new technology of designing and building agent systems based on genetic methods and a draft concept of a model-based approach to such systems are described. An application of this technology to a self-developing prediction system is also presented, and results of simulation experiments carried out with the use of D-1 random time series are discussed.
Conference Paper
Full-text available
This paper engineers a new state-of-the-art Stochastic Local Search (SLS) solver for the Low Autocorrelation Binary Sequence (LABS) problem. The new SLS solver is obtained with white-box visualization to gain insights into how an SLS can be effective for LABS; implementation improvements; and black-box parameter tuning.
Article
We investigate skew-symmetric sequences with chain lengths up to N = 71, giving a complete table of all merit factors F ≥ 7 and their associated configurations. We also calculate the exact thermodynamical properties of shorter chains (N ≤ 55). We then introduce an evolutionary strategy, describing the properties of our search algorithm and comparing our results to those of other heuristic methods such as simulated annealing. We find the highest merit factors ever reached for chains of length 81 ≤ N ≤ 201.
Conference Paper
The Low Autocorrelation Binary Sequences problem (LABS) is problem 005 in the CSPLIB library, where it is stated that “these problems pose a significant challenge to local search methods”. This paper presents a straightforward tabu search that systematically finds the optimal solutions for all tested instances.