Procedia Computer Science 108C (2017) 877–886
International Conference on Computational Science, ICCS 2017, 12–14 June 2017, Zurich, Switzerland
© 2017 The Authors. Published by Elsevier B.V.
Peer-review under responsibility of the scientific committee of the International Conference on Computational Science.
doi: 10.1016/j.procs.2017.05.201
Toward hybrid platform for evolutionary computations
of hard discrete problems
Dominik Żurek, Kamil Piętak, Marcin Pietroń, and Marek Kisiel-Dorohinicki
AGH University of Science and Technology, Al. Mickiewicza 30, 30-059 Krakow, Poland
{dzurek,kpietak,pietron,doroh}@agh.edu.pl
Abstract
The memetic agent-based paradigm, which combines evolutionary computation and local search techniques, is one of the promising meta-heuristics for solving large and hard discrete problems such as Low Autocorrelation Binary Sequence (LABS) or the optimal Golomb ruler (OGR). In this paper, as a follow-up of the previous research, the concept of a hybrid agent-based evolutionary systems platform, which spreads computations among CPU and GPU, is shortly introduced. The main part of the paper presents an efficient parallel GPU implementation of a LABS local optimization strategy. As a means for comparison, speed-ups between the GPU implementation and the sequential and parallel CPU versions are shown. This constitutes a promising step toward building a hybrid platform that combines evolutionary meta-heuristics with highly efficient local optimization of chosen discrete problems.
Keywords: evolutionary computing, GPU computing, memetic computing, LABS
1 Introduction
In the paper, as a follow-up of the research presented in [13], the next step in solving the Low Autocorrelation Binary Sequence (LABS) problem with efficient techniques is shown. LABS, one of the hard discrete problems, is, despite wide research, still an open optimization problem for long sequences. The paper introduces a very efficient, parallel realization of local optimization for LABS implemented on GPU. The implementation is conceived as a part of a hybrid computational environment that utilizes agent-based evolutionary meta-heuristics. The integration of the environment and the proposed components will be a topic for further research.
There are various methods that try to solve the LABS problem. The simplest one is exhaustive enumeration (i.e., the brute-force method), which provides the best results but can be applied only to small values of L. Some researchers use partial enumeration, choosing so-called skew-symmetric sequences [14], which are the most likely solutions for many lengths (e.g., for L ∈ [31, 65], 21 of the best sequences are skew-symmetric).
However, enumerative algorithms (complete or partial) are limited to small values of L by the exponential size of the search space. Heuristic algorithms use some plausible rules to locate
good sequences more quickly. Examples include simulated annealing, evolutionary algorithms, and local optimization techniques – a list of heuristic algorithms can be found in [7]. A well-known method of local optimization for LABS is Tabu Search [9, 8] – the best results have been obtained using some variations of this technique [11]. All these methods give relatively good solutions for L < 200, but fail for larger lengths. As a result, the LABS problem found its place on the CSPLib list, a library of test problems for constraint solvers [18]. The best known values (BKVs) for LABS can be found in [3].
The paper concentrates on solving the LABS problem using efficient parallel computations on GPU, which will later be combined with evolutionary meta-heuristics. The authors introduce the basic concept of a hybrid computational environment that combines CPU and GPU computations, and present in detail the way of solving the LABS problem on GPU using one of the local search strategies, called Steepest Descent Local Search (SDLS) [2].
The paper is organized as follows: an introduction to the LABS problem together with an optimized algorithm for energy computation is presented as background for the GPU algorithm. Then a short introduction to the concept of evolutionary multi-agent systems (EMAS) with local optimization techniques is provided, followed by an overview of a platform allowing such systems to run in a hybrid environment comprised of CPU and GPU. The next section presents details of LABS evaluation and local optimization implemented on GPU, which constitutes the central point of the paper. Then, sample results are presented to compare the efficiency of pure CPU and GPU computations and to compute the speed-up gained using the GPU. Finally, after the conclusions, future work such as the integration of CPU and GPU in the shape of a hybrid computational platform is presented.
2 LABS problem
Low Autocorrelation Binary Sequence (LABS) is an NP-hard combinatorial problem with a simple formulation. It has been under intensive study since the 1960s by the physics and artificial intelligence communities. It consists in finding a binary sequence $S = \{s_0, s_1, \ldots, s_{L-1}\}$ of length $L$, where $s_i \in \{-1, 1\}$, which minimizes the energy function $E(S)$:

$$C_k(S) = \sum_{i=0}^{L-k-1} s_i s_{i+k}, \qquad E(S) = \sum_{k=1}^{L-1} C_k^2(S) \qquad (1)$$
M. J. Golay also defined the so-called merit factor [10], which binds the LABS energy level to the length of a given sequence:

$$F(S) = \frac{L^2}{2E(S)} \qquad (2)$$
The search space for the problem of length $L$ has size $2^L$, and the energy of a sequence can be computed in time $O(L^2)$. The LABS problem has no constraints, so $S$ can be represented naturally as an array of binary values. One of the reasons for the high complexity of the problem is that in LABS all sequence elements are correlated. One change that improves some $C_i(S)$ also has an impact on many other $C_j(S)$ and can lead to big changes of the solution's energy.
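For illustration, the direct $O(L^2)$ evaluation of Eq. (1) can be sketched in C++ as follows (a minimal sketch, assuming the sequence is stored as ±1 integers):

```cpp
#include <vector>

// Naive O(L^2) evaluation of the LABS energy E(S) from Eq. (1).
int labs_energy(const std::vector<int>& s) {
    const int L = static_cast<int>(s.size());
    int energy = 0;
    for (int k = 1; k <= L - 1; ++k) {
        int c_k = 0;                   // autocorrelation C_k(S)
        for (int i = 0; i < L - k; ++i)
            c_k += s[i] * s[i + k];
        energy += c_k * c_k;           // E(S) = sum over k of C_k(S)^2
    }
    return energy;
}
```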
Figure 1: Data structures introduced to efficiently compute fitness in the neighborhood, shown for an L = 5 instance. The table T(S) holds the products: row 1 contains s1s2, s2s3, s3s4, s4s5; row 2 contains s1s3, s2s4, s3s5; row 3 contains s1s4, s2s5; row 4 contains s1s5. The vector C(S) holds the corresponding row sums: s1s2 + s2s3 + s3s4 + s4s5, s1s3 + s2s4 + s3s5, s1s4 + s2s5, and s1s5.
Computing the energy can be sped up with the technique proposed by Cotta et al. [9], who introduced the notion of the neighborhood of a solution $S$ of length $L$, obtained by flipping exactly one symbol in the sequence:

$$N(S) = \{flip(S, i) \mid i \in \{1, \ldots, L\}\} \qquad (3)$$

where $flip(s_1 \ldots s_i \ldots s_L, i) = s_1 \ldots (-s_i) \ldots s_L$ [9].
Then, all computed products can be stored in an $(L-1) \times (L-1)$ table $T(S)$, such that $T(S)_{ij} = s_j s_{i+j}$ for $j \le L-i$, while the values of the different correlations are saved in an $(L-1)$-dimensional vector $C(S)$, defined as $C(S)_k = C_k(S)$ for $1 \le k \le L-1$. Figure 1 shows these data structures for an L = 5 instance. Cotta observed that since flipping a single symbol $s_i$ multiplies by $-1$ the value of the cells in $T(S)$ in which $s_i$ is involved, the fitness of the sequence $flip(S, i)$ can be efficiently recomputed in time $O(L)$ using Algorithm 1.
Algorithm 1 Computing LABS energy using the T(S) and C(S) structures
1: function ValueFlip(S, i, T, C)
2:   f := 0
3:   for p := 1 to L−1 do
4:     v := C_p
5:     if p ≤ L−i then v := v − 2·T_{p,i} end if
6:     if p < i then v := v − 2·T_{p,(i−p)} end if
7:     f := f + v²
8:   end for
9:   return f
10: end function
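A direct C++ rendering of Algorithm 1 might look as follows; this is a sketch, not the authors' code, using 1-based indices (index 0 of T and C is unused) to mirror the notation above:

```cpp
#include <vector>

// Energy of flip(S, i) recomputed in O(L) from T(S) and C(S).
// T[p][j] holds the product s_j * s_{j+p}; C[p] holds C_p(S).
int value_flip(int i, int L,
               const std::vector<std::vector<int>>& T,
               const std::vector<int>& C) {
    int f = 0;
    for (int p = 1; p <= L - 1; ++p) {
        int v = C[p];
        if (p <= L - i) v -= 2 * T[p][i];      // s_i is the left factor
        if (p < i)      v -= 2 * T[p][i - p];  // s_i is the right factor
        f += v * v;                            // v is the new C_p
    }
    return f;
}
```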
The flipping method can be successfully used in various local optimization techniques where usually one interaction modifies only one sequence element. Such a technique can be used, for example, in the Steepest Descent Local Search method, which modifies one of the sequence elements and then checks if the new sequence has better energy. If so, the next random element is modified. The process is continued until no new better sequence is created. The number of iterations can also be set to a chosen constant – this technique can be very useful when one optimizes many sequences in parallel, because the time of optimization for each solution is more or less the same, i.e., there is almost no time lost on synchronization. The SDLS algorithm that uses flip operations to compute LABS energy has also been proposed in [9].
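Building on value_flip() above, a sequential SDLS with an iteration budget can be sketched as follows; for brevity the sketch rebuilds T and C in O(L²) after each accepted flip instead of updating them in O(L) as in [9], and it assumes the sequence is held in s[1..L] with s[0] unused:

```cpp
#include <vector>

// Builds T[p][j] = s_j * s_{j+p} and the correlations C[p] (1-based).
void build_tables(const std::vector<int>& s,
                  std::vector<std::vector<int>>& T, std::vector<int>& C) {
    const int L = static_cast<int>(s.size()) - 1;
    for (int p = 1; p <= L - 1; ++p) {
        C[p] = 0;
        for (int j = 1; j <= L - p; ++j) {
            T[p][j] = s[j] * s[j + p];
            C[p] += T[p][j];
        }
    }
}

// Steepest-descent local search: each pass scores all L flips with
// value_flip() and accepts the best strictly improving one.
int sdls(std::vector<int>& s, int iterations) {
    const int L = static_cast<int>(s.size()) - 1;
    std::vector<std::vector<int>> T(L, std::vector<int>(L, 0));
    std::vector<int> C(L, 0);
    build_tables(s, T, C);
    int best = 0;
    for (int p = 1; p <= L - 1; ++p) best += C[p] * C[p];
    for (int it = 0; it < iterations; ++it) {
        int best_i = 0, best_f = best;
        for (int i = 1; i <= L; ++i) {
            const int f = value_flip(i, L, T, C);
            if (f < best_f) { best_f = f; best_i = i; }
        }
        if (best_i == 0) break;      // local optimum reached early
        s[best_i] = -s[best_i];      // accept the best flip
        build_tables(s, T, C);       // simplification: full O(L^2) rebuild
        best = best_f;
    }
    return best;                     // energy of the final sequence
}
```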
3 Evolutionary meta-heuristics with local optimization for
LABS
One of the promising metaheuristics for solving hard problems is the evolutionary multi-agent system proposed by K. Cetnarowicz [6] and developed through many years of research by the IISG group at AGH-UST in Krakow. EMAS extended with local optimization techniques, the so-called memetic EMAS, has already been applied to hard discrete problems such as LABS [13, 12].
3.1 The concept of the memetic EMAS
In a basic EMAS, the system is comprised of individual agents living in a population¹. The main concepts of evolution are realized in the shape of reproduction and death operations. In optimization problems the individuals contain a solution encoded in the shape of a genotype. The genotype is inherited from parents using variation operators (i.e., mutation and recombination). As there is no global knowledge, in contrast to the classic evolutionary algorithm, selection is based on the energy resource belonging to each individual. The level of energy is related to the individual's quality and causes various behaviors: a high level of energy allows the individual to reproduce, an average level leads to energy transfer between individuals (the "fight" operation), and a low level causes the agent's death.
Each newborn individual has to be evaluated by a fitness function which rates its genotype in the context of a given problem. In the memetic variant of EMAS, the evaluation can be further improved by utilizing various local optimization techniques such as Tabu Search, Steepest Descent Local Search or Random Mutation Hill Climbing.
Algorithm 2 A simplified algorithm of population processing in memetic EMAS
1: function processPopulation(population)
2:   pairs := selectPairs(population)
3:   newBorn := Array[]
4:   for pair in pairs do
5:     if canReproduce(pair) then newBorn += reproduce(pair)
6:     else fight(pair) end if
7:   end for
8:   newBorn := evaluate(newBorn)
9:   newBorn := improve(newBorn)
10:  population += newBorn
11:  for ind in population do
12:    if shouldDie(ind) then remove(population, ind) end if
13:  end for
14:  return population
15: end function
The concept of memetic EMAS can also be illustrated by the simplified, sequential algorithm shown in Algorithm 2. The processPopulation function contains a single "step" that is called sequentially until a stop condition is reached. Within each step, pairs for reproduction or fight are selected from the current population. If a selected pair is "good enough", then a new individual is "born" from the selected parents. Otherwise, the individuals from the pair fight with each other, and in consequence a portion of life energy is transferred between them. Next, all newborn individuals are evaluated and improved using local optimization techniques, as sketched below. At last, all weak individuals are "killed", i.e., removed from the population.
¹There is also a multi-population variant of EMAS, but it is out of the scope of this paper.
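To make the flow concrete, one such step can be sketched in C++ as below; the Individual fields and all operator signatures (selectPairs, reproduce, fight, evaluate, improve, shouldDie) are hypothetical placeholders for illustration, not the AgE platform API:

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// A memetic-EMAS individual: a candidate solution plus the life energy
// that drives decentralized selection.
struct Individual {
    std::vector<int> genotype;  // e.g. a LABS sequence in {-1, +1}
    int    fitness = 0;         // problem fitness (LABS energy)
    double energy  = 0.0;       // life energy of the agent
};

// Hypothetical operator interfaces -- declarations only, to be supplied
// by the surrounding platform.
std::vector<std::pair<Individual*, Individual*>>
           selectPairs(std::vector<Individual>& population);
bool       canReproduce(const Individual&, const Individual&);
Individual reproduce(Individual&, Individual&);  // crossover + mutation
void       fight(Individual&, Individual&);      // life-energy transfer
int        evaluate(const Individual&);          // fitness function
void       improve(Individual&);                 // local search, e.g. SDLS
bool       shouldDie(const Individual&);         // life energy exhausted?

// One processing step, mirroring Algorithm 2.
void processPopulation(std::vector<Individual>& population) {
    std::vector<Individual> newBorn;
    for (auto& pr : selectPairs(population)) {
        if (canReproduce(*pr.first, *pr.second))
            newBorn.push_back(reproduce(*pr.first, *pr.second));
        else
            fight(*pr.first, *pr.second);
    }
    for (auto& ind : newBorn) {
        ind.fitness = evaluate(ind);   // evaluate, then memetically improve
        improve(ind);
    }
    population.insert(population.end(), newBorn.begin(), newBorn.end());
    population.erase(                  // "kill" the weak individuals
        std::remove_if(population.begin(), population.end(), shouldDie),
        population.end());
}
```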
In [13, 12] we presented results coming from many experiments performed on a pure CPU architecture with one or many threads. The results show that memetic EMAS is a promising technique for solving not only relatively small LABS problems (L ∼ 50), but also sizes that are still a challenge (e.g., L ∼ 200).
3.2 The concept of a hybrid platform
Evolutionary multi-agent systems, as well as classic evolutionary algorithms, can be easily parallelised using various techniques [5, 1]. Two different research directions in this field can be observed:
• decomposition models, which assume a geographic division of the population and constraints on interaction between particular individuals; the whole process of evolution is then parallelised in particular sub-populations; the most popular decomposition concepts are migration and cell models;
• global parallelisation, in which particular steps of evolution performed on different individuals are implemented on many computational nodes.
When computations with GPUs are considered, the global parallelisation model can be applied by using a master-slave architecture. The architecture assumes that a central unit performs most of the evolutionary process (i.e., reproduction, energy transfer or death operations) and delegates some of the most expensive computations to the GPU. In the case of memetic EMAS (see Algorithm 2), the operations of evaluation and improvement are a good choice to delegate to slaves. These two operations are combined and extended with the required conversions between the CPU and GPU representations of a given problem. To minimize the communication overhead and exploit the parallel nature of the GPU, the evaluation and improvement operation is done for the whole population at once. The GPU interface gets a variable number of solutions that belong to the newly created individuals and then performs evaluation and improvement. In effect, the GPU returns a vector of evaluations together with the improved solutions, used to update the adequate individuals.
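The host side of this boundary can be sketched in CUDA C++ as follows; improve_batch_kernel is a hypothetical kernel name used for illustration, and the flattened layout (n rows of L shorts, one block per solution) anticipates the per-block processing described in Section 4:

```cpp
#include <cuda_runtime.h>
#include <vector>

// Hypothetical device kernel: one block evaluates and improves one sequence.
__global__ void improve_batch_kernel(short* seqs, int* energies,
                                     int L, int iterations);

// Master-slave boundary: ship a batch of new-born solutions to the GPU,
// get back improved sequences and their energies in one call.
void evaluateAndImprove(std::vector<short>& seqs,    // n*L, flattened
                        std::vector<int>& energies,  // n results
                        int n, int L, int iterations) {
    short* d_seqs = nullptr; int* d_energies = nullptr;
    cudaMalloc((void**)&d_seqs, n * L * sizeof(short));
    cudaMalloc((void**)&d_energies, n * sizeof(int));
    cudaMemcpy(d_seqs, seqs.data(), n * L * sizeof(short),
               cudaMemcpyHostToDevice);
    improve_batch_kernel<<<n, L>>>(d_seqs, d_energies, L, iterations);
    cudaMemcpy(seqs.data(), d_seqs, n * L * sizeof(short),
               cudaMemcpyDeviceToHost);    // improved sequences
    cudaMemcpy(energies.data(), d_energies, n * sizeof(int),
               cudaMemcpyDeviceToHost);    // their energies
    cudaFree(d_seqs);
    cudaFree(d_energies);
}
```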
The described concept of the hybrid computational environment is partially implemented on the AgE platform² – a Java-based solution developed as an open-source project by the Intelligent Information Systems Group of AGH-UST. AgE provides a platform for the development and execution of distributed agent-based applications, mainly for simulation and computational tasks [4, 15]. The modular architecture of AgE allows components to be used to assemble and run agent-based computations for various problems, such as black-box hard discrete problems (e.g., LABS, OGR, Job-shop) [12, 13] or continuous optimization problems. The platform is prepared to be integrated with external solvers such as the presented LABS optimization on GPU.
4 Parallel SDLS algorithm for LABS local optimization
To better understand the details of the evaluation and improvement of LABS solutions, let us introduce the basics of GPU processing. GPU processors are constructed as a structure of N multiprocessors with M cores each. The cores share an Instruction Unit with the other cores in a multiprocessor. The multiprocessors contain dedicated memories, which are shared among all their cores and are much faster than global memory. There are also efficient read-only constant
² https://gitlab.com/age-agh/age3
and texture memories with global access. Constant memory is efficient for reading constants, and texture memory for random read operations.
In contrast to the CPU, a GPGPU has less logic for control and branch prediction. A GPGPU enables running thousands of parallel threads, which are grouped in blocks with shared memory. The main aspects in the development of highly efficient GPU programs are the usage of memory, an efficient division of the code into parallel threads, and communication between the threads. Additionally, thread synchronization should be considered an important issue.
In this paper the authors used graphics cards from the Tesla family, produced by NVIDIA³. These kinds of GPGPUs can be programmed with the high-level programming framework CUDA (Compute Unified Device Architecture). CUDA is a set of software layers for communicating with the GPU, based on standard C/C++. The platform gives direct access to the GPU's virtual instruction set and parallel computational elements (C or C++ code can be sent directly to the GPU without using assembly code). Creating optimized code requires knowledge of how threads are mapped to blocks and grids, how to synchronize them, and how to distinguish access to the various types of memory (registers, shared, global and read-only memories).
CUDA provides two kinds of APIs: the Runtime API and the Driver API. In the first one, kernels are defined and compiled together with the C/C++ files. The source code is compiled using the NVIDIA CUDA Compiler (NVCC). As a result of the compilation, an executable file containing the entire program is returned. Using NVCC it is not possible to compile source code written in another language like Java or Python. The Driver API is used when one needs to invoke CUDA kernels from code written in a high-level programming language; as an example, JCuda (Java for CUDA)⁴ can be pointed out.
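For instance, a minimal Driver API launch on the native side (the same call sequence that JCuda exposes to Java) might look as follows; "labs_sdls.ptx" and "sdls_kernel" are hypothetical names, and error checking is omitted:

```cpp
#include <cuda.h>

// Load a precompiled PTX module and launch a kernel by name through the
// Driver API: one block per solution, L threads per block.
void launchSdls(int n, int L, unsigned sharedBytes, void** kernelArgs) {
    CUdevice dev; CUcontext ctx; CUmodule mod; CUfunction fn;
    cuInit(0);
    cuDeviceGet(&dev, 0);
    cuCtxCreate(&ctx, 0, dev);
    cuModuleLoad(&mod, "labs_sdls.ptx");          // hypothetical module
    cuModuleGetFunction(&fn, mod, "sdls_kernel"); // hypothetical kernel
    cuLaunchKernel(fn, n, 1, 1,     // grid dimensions
                       L, 1, 1,     // block dimensions
                       sharedBytes, // dynamic shared memory per block
                       0,           // default stream
                       kernelArgs, 0);
    cuCtxSynchronize();
}
```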
4.1 Parallel SDLS realization for LABS on GPU
Based on an analysis of efficient heuristics, the authors concluded that the SDLS algorithm for the LABS problem is suitable for implementation on GPU cards, because this approach can be fully parallelized. The module responsible for optimizing the LABS problem using the SDLS method receives binary sequences of length L, represented as strings in {−1, 1}^L, which were randomly generated, for example with the curand library⁵. The provided vectors are spread between blocks, so each block contains its own sequence.
In the first step of the proposed algorithm, each block evaluates a given sequence S_I using two data structures, T(S) and C(S) (see Figure 1), which are stored in shared memory. In order to create these structures, L−1 iterations have to be performed. During the first iteration, the distance in the products s_i·s_{i+distance} (where distance < L) has value 1 and L−1 threads are used. In each subsequent iteration the distance is incremented and the number of threads used to compute the next values of the structures (the next row in T(S)) is decremented. To create an element of C(S), after each iteration the results of the current row of T(S) are summed using a reduction operation. To calculate the first energy value, which will be used as the reference energy E_r, the result of each reduction is squared and accumulated over the iterations.
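A CUDA sketch of this first phase might look as follows; the packed row-major layout of the triangular T(S) and all names are illustrative rather than taken from the authors' code, and one block of L threads (blockDim.x == L) processes one sequence:

```cpp
// Start of row p (1-based) in the packed triangular T(S).
__device__ inline int row_offset(int p, int L) {
    return (p - 1) * L - (p - 1) * p / 2;
}

// Builds T(S) and C(S) in shared memory and returns the reference
// energy E_r (valid in thread 0 only). scratch is L ints of shared memory.
__device__ int build_structures(const short* s, short* T, int* C,
                                int* scratch, int L) {
    const int tid = threadIdx.x;
    int energy = 0;
    for (int p = 1; p <= L - 1; ++p) {       // p is the "distance"
        int prod = 0;
        if (tid < L - p) {                   // L-p threads work this round
            prod = s[tid] * s[tid + p];
            T[row_offset(p, L) + tid] = (short)prod;
        }
        scratch[tid] = prod;                 // idle threads contribute 0
        __syncthreads();
        for (int n = L; n > 1; ) {           // sum-reduce the row into C_p
            const int half = (n + 1) / 2;
            if (tid < n - half) scratch[tid] += scratch[tid + half];
            __syncthreads();
            n = half;
        }
        if (tid == 0) {
            C[p] = scratch[0];
            energy += scratch[0] * scratch[0];  // accumulate C_p^2 into E_r
        }
        __syncthreads();
    }
    return energy;
}
```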
The next step of the algorithm is a local optimization that utilizes the SDLS approach with a given number of iterations. The method explores a sequence neighborhood and therefore allows
³ http://www.nvidia.com/object/cuda_home_new.html
⁴ Project homepage: https://jcuda.org
⁵ The curand library provides facilities that focus on the simple and efficient generation of high-quality pseudo-random and quasi-random numbers.
one to use the fast flip strategy in the sequence energy computations (see Section 2). The concept is similar to the sequential version of SDLS [9], but each thread mutates a different bit of the input sequence in parallel. In this process each block uses L threads, each of which:
1. changes the value of the sequence element s_indexOfThread to the opposite one (−1 · s_indexOfThread),
2. computes the new energy value according to the SDLS algorithm with value-flip evaluation [9],
3. stores the new energy in shared memory and a temporary C′(S) in the thread's cache memory (in order not to calculate the common C(S) again after it is updated by the thread which calculated the best energy).
According to the ValueFlip function (see Algorithm 1), the final energy f is summed up from the v values that represent the cells of the new C(S) structure. Next, from the L energy values the best one is chosen using a reduction operation on a vector stored in shared memory. Additionally, the ids of the corresponding threads are stored in the same shared memory. In order to find the best energy and the corresponding thread index, two parallel reduction processes are run (Figure 2): the first finds the minimum energy, the second finds the index of the thread which computed this energy.
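This flip-and-reduce step can be sketched in CUDA as below, reusing the T(S)/C(S) layout and the row_offset() helper from the previous sketch; for brevity the (energy, thread id) pair is carried through a single min-reduction instead of the two separate reductions of Figure 2, and all names are illustrative:

```cpp
// Thread tid evaluates flip(S, tid+1) with the ValueFlip recurrence, then
// the block min-reduces the energies; returns the winning thread's index
// and stores its energy in *best_energy. e_buf and id_buf are L ints of
// shared memory each; assumes blockDim.x == L.
__device__ int best_flip(const short* T, const int* C, int L,
                         int* e_buf, int* id_buf, int* best_energy) {
    const int tid = threadIdx.x;
    const int i = tid + 1;                 // 1-based flip position
    int f = 0;
    for (int p = 1; p <= L - 1; ++p) {     // ValueFlip in O(L)
        int v = C[p];
        if (p <= L - i) v -= 2 * T[row_offset(p, L) + (i - 1)];
        if (p < i)      v -= 2 * T[row_offset(p, L) + (i - p - 1)];
        f += v * v;
    }
    e_buf[tid] = f;
    id_buf[tid] = tid;
    __syncthreads();
    for (int n = L; n > 1; ) {             // paired min-reduction
        const int half = (n + 1) / 2;
        if (tid < n - half && e_buf[tid + half] < e_buf[tid]) {
            e_buf[tid] = e_buf[tid + half];
            id_buf[tid] = id_buf[tid + half];
        }
        __syncthreads();
        n = half;
    }
    *best_energy = e_buf[0];
    return id_buf[0];
}
```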
In the proposed algorithm, all threads create their own C(S) structure in their cache memory based on the v values. Only the one thread which computed the best energy in the current iteration updates the common C(S) structure (C(S) is overwritten by the C(S) of the winner thread).
Figure 2: Reduction to find the best energy and the corresponding thread in a single block. Two tree reductions are run over the L values held in shared memory, each taking log₂L steps: a min-reduction over the energies and a matching reduction over the thread ids.
Then the minimum energy E_min is compared with the reference energy E_r, which was computed from the input sequence S_I. If E_min < E_r, then E_min becomes the new reference energy and the sequence for this energy is used as the new input sequence S_I. The thread which computed the E_min energy copies the C(S) table from its cache to shared memory. Additionally, the same thread updates the T(S) structure, changing the cells which contain elements with index equal to the thread id.
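This T(S) update can be sketched as a small device helper, again reusing the packed layout above (an illustrative sketch; in the algorithm only the single winner thread executes it):

```cpp
// Accept flip(S, i): negate s_i and the T(S) cells in which s_i
// participates, i.e. column i of row p when p <= L-i, and column i-p
// when p < i (1-based i, 0-based storage).
__device__ void apply_flip(short* s, short* T, int L, int i) {
    s[i - 1] = -s[i - 1];
    for (int p = 1; p <= L - 1; ++p) {
        if (p <= L - i) T[row_offset(p, L) + (i - 1)]     *= -1;
        if (p < i)      T[row_offset(p, L) + (i - p - 1)] *= -1;
    }
}
```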
Cotta proposed running the SDLS optimization until no further improvement is achieved [9]; however, in the parallel version this concept could cause synchronization overhead (when one block waits for another that still seeks a better sequence, performing more iterations). This is why a constant number of iterations has been chosen, to make all blocks execute in the same (or very similar) time and thus minimize the waiting time before synchronization. After N iterations, the achieved results (energies and sequences) are copied to global memory and then to the CPU, where they can be further processed by evolutionary meta-heuristics.
During this process memory management is crucial. For a sequence of length L, the following structures have to be allocated in the shared memory of each block (a size check is sketched below):
• L · sizeof(short) – the sequence representation,
• (L−1) · L / 2 · sizeof(short) – the T(S) table,
• (L−1) · sizeof(int) – the C(S) table,
• 2 · L · sizeof(int) – the energies after flipping, with the ids of the corresponding threads.
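A small helper makes it easy to check this budget against the device limit; a sketch, with the 48 KB per-block shared-memory capacity of the Fermi-class Tesla cards used in Section 5 as an illustrative reference:

```cpp
#include <cstddef>

// Shared-memory bytes needed per block for a length-L sequence,
// following the list above.
std::size_t sharedBytes(int L) {
    return  L * sizeof(short)                           // the sequence
         + (std::size_t)(L - 1) * L / 2 * sizeof(short) // packed T(S)
         + (std::size_t)(L - 1) * sizeof(int)           // C(S)
         + (std::size_t)2 * L * sizeof(int);            // energies + ids
}
```

For L = 201 this gives roughly 42 KB, close to the 48 KB available per block on such cards, which is consistent with the observation in Section 5 that the algorithm fully utilizes the GPU around L ∼ 200.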
5 Experimental Results
To show the effectiveness of the presented SDLS algorithm for LABS realized on GPU, both GPU and CPU implementations have been considered. On the CPU, sequential and parallel versions of the algorithm (for 2 and 4 cores) have been run. The experiments for the GPU were performed on an NVIDIA Tesla M2090, and for the CPU on an Intel(R) Xeon(R) CPU E5-2630 v3 (2.4 GHz).
The reference version of SDLS for LABS on the CPU has been implemented using the OpenMP⁶ library. It is a concurrency platform for multi-threaded, shared-memory parallel processing in the C, C++ and Fortran languages. It is a natural choice for building such algorithms, because OpenMP allows sequential code to be easily transformed into parallel code using appropriate compiler directives to generate threads for the parallel processor platform (there is no need to create threads or assign tasks to each of them manually). In the experiments the -O3 compiler optimization flag was used to optimize the multi-core CPU implementation.
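The CPU reference can thus be reduced to a single parallel loop over the batch of solutions, as sketched below; it reuses the sequential sdls() routine sketched in Section 2, and the function name is illustrative:

```cpp
#include <omp.h>
#include <vector>

// Distribute independent SDLS runs over the available cores; compile
// with e.g. g++ -O3 -fopenmp. sdls() is the sequential sketch above.
void sdlsBatch(std::vector<std::vector<int>>& solutions, int iterations) {
    #pragma omp parallel for schedule(static)
    for (int k = 0; k < (int)solutions.size(); ++k)
        sdls(solutions[k], iterations);
}
```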
All versions have been run for LABS sequences of lengths L = 48 and L = 201, comparing execution times for different numbers of solutions (1000, 4000, 8000, 16000, 32000, 64000). The number of SDLS iterations has been set to 128. The results were gathered from 10 executions of each configuration – the average values are presented. The observed standard deviation of the average results was negligible.
No. of solutions | GPGPU   | CPU (1 core) | Speed-up | CPU (2 cores) | Speed-up | CPU (4 cores) | Speed-up
1000             | 22.61   | 698.8        | 30.90    | 348.2         | 15.40    | 239.2         | 10.58
4000             | 73.93   | 2663.8       | 36.03    | 1398.5        | 18.92    | 1120.9        | 15.16
8000             | 144.39  | 5341.6       | 36.99    | 2791.8        | 19.33    | 2281.0        | 15.80
16000            | 285.74  | 10683.6      | 37.39    | 5600.5        | 19.60    | 5030.9        | 17.61
32000            | 568.99  | 21380.4      | 37.58    | 11795.9       | 20.73    | 6605.2        | 11.61
64000            | 1129.87 | 42803.3      | 37.88    | 23574.8       | 20.87    | 13077.1       | 11.57

Table 1: Execution times in milliseconds for SDLS with 128 iterations for LABS L = 48
⁶ http://www.openmp.org/
No. of solutions | GPGPU    | CPU (1 core) | Speed-up | CPU (2 cores) | Speed-up | CPU (4 cores) | Speed-up
1000             | 916.35   | 11645.4      | 12.71    | 5756.9        | 6.28     | 3016.8        | 3.29
4000             | 3314.44  | 46524.3      | 14.04    | 23114.4       | 6.97     | 12059.2       | 3.64
8000             | 6869.51  | 93227.1      | 13.57    | 46574.2       | 6.78     | 24575.4       | 3.58
16000            | 12583.08 | 185832.4     | 14.77    | 93483.8       | 7.43     | 50674.3       | 4.03
32000            | 26573.16 | 376872.9     | 14.18    | 197617.3      | 7.44     | 112675.5      | 4.24
64000            | 53760.38 | 742100.9     | 13.80    | 396481.4      | 7.37     | 230078.9      | 4.28

Table 2: Execution times in milliseconds for SDLS with 128 iterations for LABS L = 201
The results shown in Tables 1 and 2 present the average times in milliseconds for particular configurations together with the speed-up value, computed as the ratio between the CPU and GPU times, $S_p = T_{CPU}/T_{GPU}$. For L = 48 sequences (Table 1), the GPU algorithm is significantly faster than the sequential version on the CPU (∼30–37×) and still gives a very good speed-up even against 4 CPU cores (∼10–17×), regardless of the number of solutions. For L = 201 sequences (Table 2), the GPU executes around 13× faster than the sequential version and around 3.2–4.2× faster than the 4-core CPU version. Some decrease of the speed-up is caused by the fact that the proposed algorithm is optimized for L ∼ 200 sequences, for which it utilizes most of the GPU's logical resources. In such cases, the physical GPU resources have to be shared between particular blocks and threads, and so the efficiency is lower. In all cases the parallel version of SDLS on GPU gives a significant efficiency improvement and shows that the presented implementation can be used to quickly find better LABS sequences, narrowing the domain for the further LABS search using other techniques such as evolutionary algorithms.
6 Conclusions and further work
This paper is one of the first steps toward an efficient hybrid platform dedicated to solving difficult discrete problems, such as LABS or Golomb-ruler optimization, that utilizes both evolutionary multi-agent systems and local optimization techniques implemented on GPU. The presented implementation of LABS optimization shows that a significant speed-up can be achieved using parallel GPU computations. The implementation can be further integrated with meta-heuristics such as evolutionary algorithms, which constitutes a basis for the concept of a hybrid computational environment in the master-slave model that seems very promising.
In the near future the authors plan to fully integrate the AgE platform with the local optimization algorithm implemented on GPU. An important direction of the research is to adjust both algorithms to lower the communication overhead between the CPU and GPU, which includes the required data conversions. Also, the plan assumes the implementation of the hybrid concept for other difficult problems that can be solved using algorithms with a parallel structure – some research on the optimal Golomb ruler search has already been published in [17, 16].
6.1 Acknowledgments
The research presented in this paper received support from the AGH University of Science and Technology statutory project and from the Faculty of Computer Science, Electronics and Telecommunications Dean's Grant for Ph.D. Students and Young Researchers.
References
[1] E. Alba and M. Tomassini. Parallelism and evolutionary algorithms. IEEE Transactions on
Evolutionary Computation, 6(5):443–462, Oct 2002.
[2] Michael Bartholomew-Biggs. The Steepest Descent Method, pages 1–8. Springer US, Boston, MA,
2008.
[3] B. Bošković, F. Brglez, and J. Brest. A GitHub archive for solvers and solutions of the LABS problem. For updates, see https://github.com/borkob/git_labs, January 2016.
[4] Aleksander Byrski and Marek Kisiel-Dorohinicki. Agent-based model and computing environment
facilitating the development of distributed computational intelligence systems. In Proceedings of
the 9th International Conference on Computational Science, ICCS 2009, pages 865–874, Berlin,
Heidelberg, 2009. Springer-Verlag.
[5] Erick Cantu-Paz. Efficient and Accurate Parallel Genetic Algorithms. Kluwer Academic Publish-
ers, Norwell, MA, USA, 2000.
[6] K. Cetnarowicz, M. Kisiel-Dorohinicki, and E. Nawarecki. The application of evolution process
in multi-agent world (MAW) to the prediction system. In Proc. of 2nd Int. Conf. on Multi-Agent
Systems (ICMAS’96). AAAI Press, 1996.
[7] C. deGroot, K.H. Hoffmann, and D. Würtz. Low autocorrelation binary sequences: Exact enumeration and optimization by evolution strategies. IPS research report, Eidgenössische Technische Hochschule Zürich, Interdisziplinäres Projektzentrum für Supercomputing, 1989.
[8] Iván Dotú and Pascal Van Hentenryck. A Note on Low Autocorrelation Binary Sequences, pages
685–689. Springer Berlin Heidelberg, Berlin, Heidelberg, 2006.
[9] José E. Gallardo, Carlos Cotta, and Antonio J. Fernández. Finding low autocorrelation binary
sequences with memetic algorithms. Appl. Soft Comput., 9(4):1252–1262, September 2009.
[10] M. Golay. The merit factor of long low autocorrelation binary sequences (corresp.). IEEE Trans-
actions on Information Theory, 28(3):543–549, May 1982.
[11] Steven Halim, Roland H. C. Yap, and Felix Halim. Engineering Stochastic Local Search for the
Low Autocorrelation Binary Sequence Problem, pages 640–645. Springer Berlin Heidelberg, Berlin,
Heidelberg, 2008.
[12] Magdalena Kolybacz, Michal Kowol, Lukasz Lesniak, Aleksander Byrski, and Marek Kisiel-
Dorohinicki. Efficiency of memetic and evolutionary computing in combinatorial optimisation.
In Webjørn Rekdalsbakken, Robin T. Bye, and Houxiang Zhang, editors, ECMS, pages 525–531.
European Council for Modeling and Simulation, 2013.
[13] Michał Kowol, Aleksander Byrski, and Marek Kisiel-Dorohinicki. Agent-based evolutionary com-
puting for difficult discrete problems. Procedia Computer Science, 29:1039 – 1047, 2014.
[14] Tom Packebusch and Stephan Mertens. Low autocorrelation binary sequences. Journal of Physics
A: Mathematical and Theoretical, 49(16):165001, 2016.
[15] Kamil Piętak, Adam Woś, Aleksander Byrski, and Marek Kisiel-Dorohinicki. Functional Integrity
of Multi-agent Computational System Supported by Component-Based Implementation, pages 82–
91. Springer Berlin Heidelberg, Berlin, Heidelberg, 2009.
[16] M. Pietron, A. Byrski, and M. Kisiel-Dorohinicki. Leveraging heterogeneous parallel platform
in solving hard discrete optimization problems with metaheuristics. Journal of Computational
Science, 18:59 – 68, 2017.
[17] Marcin Pietron, Aleksander Byrski, and Marek Kisiel-Dorohinicki. Gpgpu for difficult black-box
problems. Procedia Computer Science, 51:1023 – 1032, 2015.
[18] Toby Walsh. CSPLib problem 005: Low autocorrelation binary sequences. http://www.csplib.org/Problems/prob005. Accessed: 2017-01-31.