Understanding the Relationship between Scheduling
Problem Structure and Heuristic Performance using
Knowledge Discovery
Kate A. Smith-Miles¹, Ross J. W. James², John W. Giffin² and Yiqing Tu¹

¹ School of Engineering and IT, Deakin University, Burwood VIC 3125, Australia
{katesm, ytu}@deakin.edu.au
² Department of Management, University of Canterbury, Christchurch 8140, New Zealand
{ross.james, john.giffin}@canterbury.ac.nz
Abstract. Using a knowledge discovery approach, we seek insights into the
relationships between problem structure and the effectiveness of scheduling
heuristics. A large collection of 75,000 instances of the single machine
early/tardy scheduling problem is generated, characterized by six features, and
used to explore the performance of two common scheduling heuristics. The best
heuristic is selected using rules from a decision tree with accuracy exceeding
97%. A self-organizing map is used to visualize the feature space and generate
insights into heuristic performance. This paper argues for such a knowledge
discovery approach to be applied to other optimization problems, to contribute
to automation of algorithm selection as well as insightful algorithm design.
Keywords: Scheduling; heuristics; algorithm selection; self-organizing map;
performance prediction; knowledge discovery
1 Introduction
It has long been appreciated that knowledge of a problem’s structure and instance
characteristics can assist in the selection of the most suitable algorithm or heuristic [1,
2]. The No Free Lunch theorem [3] warns us against expecting a single algorithm to
perform well on all classes of problems, regardless of their structure and
characteristics. Instead, we are likely to achieve better results, on average, across many
different classes of problems if we tailor the selection of an algorithm to the
characteristics of the problem instance. This approach has been well illustrated by the
recent success of the algorithm portfolio approach in the 2007 SAT competition [4].
As early as 1976, Rice [1] proposed a framework for the algorithm selection
problem. There are four essential components of the model:
- the problem space P represents the set of instances of a problem class;
- the feature space F contains measurable characteristics of the instances generated by a computational feature extraction process applied to P;
- the algorithm space A is the set of all considered algorithms for tackling the problem;
- the performance space Y represents the mapping of each algorithm to a set of performance metrics.
In addition, we need to find a mechanism for generating the mapping from feature
space to algorithm space. The Algorithm Selection Problem can be formally stated as:
For a given problem instance x ∈ P, with features f(x) ∈ F, find the selection mapping
S(f(x)) into algorithm space A, such that the selected algorithm α ∈ A maximizes the
performance mapping y(α(x)) ∈ Y. The collection of data describing {P, A, Y, F} is
known as the meta-data.
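To make the framework concrete, the selection mapping S can be read as a function from feature vectors to algorithms, typically a model learned from the meta-data. The following sketch is our own illustration in Python; the names and the trivial selector are hypothetical and not part of Rice's formulation.

from typing import Callable, Sequence

# Illustrative reading of Rice's framework: an instance x in P is described by a
# feature vector f(x) in F, and S maps that vector to an algorithm alpha in A
# chosen to maximize the performance y(alpha(x)) in Y.
FeatureVector = Sequence[float]
Algorithm = str

def select_algorithm(features: FeatureVector,
                     selector: Callable[[FeatureVector], Algorithm]) -> Algorithm:
    # The selection mapping S(f(x)); in practice `selector` is learned
    # from the meta-data {P, A, Y, F}.
    return selector(features)

# A deliberately trivial, hand-written selector over two hypothetical algorithms.
def toy_selector(features: FeatureVector) -> Algorithm:
    return "algorithm_1" if features[0] <= 0.5 else "algorithm_2"

print(select_algorithm([0.3], toy_selector))  # -> "algorithm_1"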
There have been many studies in the broad area of algorithm performance
prediction, which is strongly related to algorithm selection in the sense that supervised
learning or regression models are used to predict the performance ranking of a set of
algorithms, given a set of features of the instances. In the AI community, most of the
relevant studies have focused on constraint satisfaction problems like SAT, QBF or
QWH (P, in Rice’s notation), using solvers like DPLL, CPLEX or heuristics (A), and
building a regression model (S) to use the features of the problem structure (F) to
predict the run-time performance of the algorithms (Y). Studies of this nature include
Leyton-Brown and co-authors [5-7], and the earlier work of Horvitz [8] that used a
Bayesian approach to learn the mapping S. In recent years these studies have extended
into the algorithm portfolio approach [4] and a focus on dynamic selection of
algorithm components in real-time [9, 10].
In the machine learning community, research in the field of meta-learning has
focused on classification problems (P), solved using typical machine learning
classifiers such as decision trees, neural networks, or support vector machines (A),
where supervised learning methods (S) have been used to learn the relationship
between the statistical and information theoretic measures of the classification
instance (F) and the classification accuracy (Y). The term meta-learning [11] is used
since the aim is to learn about learning algorithm performance. Studies of this nature
include [12-14] to name only three of the many papers published over the last 15
years.
In the operations research community, particularly in the area of constrained
optimization, researchers appear to have made fewer developments, despite recent
calls for developing greater insights into algorithm performance by studying search
space or problem instance characteristics. According to Stützle and Fernandes [15],
“currently there is still a strong lack of understanding of how exactly the relative
performance of different meta-heuristics depends on instance characteristics”.
Within the scheduling community, some researchers have been influenced by the
directions set by the AI community when solving constraint satisfaction problems.
The dynamic selection of scheduling algorithms based on simple low-level
knowledge, such as the rate of improvement of an algorithm at the time of dynamic
selection, has been applied successfully [16]. Other earlier approaches have focused
on integrating multiple heuristics to boost scheduling performance in flexible
manufacturing systems [17].
For many NP-hard optimization problems, such as scheduling, there is a great deal
we can discover about problem structure which could be used to create a rich set of
features. Landscape analysis (see [18-20]) is one framework for measuring the
characteristics of problems and instances, and there have been many relevant
developments in this direction, but the dependence of algorithm performance on these
measures is yet to be completely determined [20].
Clearly, Rice’s framework is applicable to a wide variety of problem domains. A
recent survey paper [21] has discussed the developments in algorithm selection across
a variety of disciplines, using Rice’s notation as a unifying framework, through which
ideas for cross-fertilization can be explored. Beyond the goal of performance
prediction also lies the ideal of greater insight into algorithm performance, and very
few studies have focused on methodologies for acquiring such insights. Instead the
focus has been on selecting the best algorithm for a given instance, without
consideration for what implications this has for algorithm design or insight into
algorithm behaviour. This paper demonstrates that knowledge discovery processes
can be applied to a rich set of meta-data to develop, not just performance predictions,
but visual explorations of the meta-data and learned rules, with the goal of learning
more about the dependencies of algorithm performance on problem structure and data
characteristics.
In this paper we present a methodology encompassing both supervised and
unsupervised knowledge discovery processes on a large collection of meta-data to
explore the problem structure and its impact on algorithm suitability. The problem
considered is the early/tardy scheduling problem, described in section 2. The
methodology and meta-data are described in section 3, comprising 75,000 instances (P)
across a set of 6 features (F). We compare the performance of two common heuristics
(A), and measure which heuristic produces the lowest cost solution (Y). The mapping
S is learned from the meta-data {P, A, Y, F} using knowledge derived from self-
organizing maps, and compared to the knowledge generated and accuracy of the
performance predictions using the supervised learning methods of neural networks
and decision trees. Section 4 presents the results of this methodology, including
decision tree rules and visualizations of the feature space, and conclusions are drawn
in Section 5.
2 The Early/Tardy Machine Scheduling Problem
Research into the various types of E/T scheduling problems was motivated, in part, by
the introduction of Just-in-Time production, which required delivery of goods to be
made at the time required. Both early and late production are discouraged, as early
production incurs holding costs, and late delivery means a loss of customer goodwill.
A summary of the various E/T problems was presented in [22] which showed the NP-
completeness of the problem.
2.1 Formulation
The E/T scheduling problem we consider is the single machine, distinct due date,
early/tardy scheduling problem where each job has an earliness and tardiness penalty
and due date. The objective is to minimise the total penalty produced by the schedule.
The objective of this problem can be defined as follows:
\min \sum_{i=1}^{n} \left( \alpha_i \, | d_i - c_i |^{+} + \beta_i \, | c_i - d_i |^{+} \right) .   (1)

where n is the number of jobs to be scheduled, c_i is the completion time of job i, d_i is
the due date of job i, α_i is the penalty per unit of time when job i is produced early,
β_i is the penalty per unit of time when job i is produced tardily, and |x|^+ = x if x > 0,
or 0 otherwise. We also define p_i as the processing time of job i.
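As an illustration of objective (1), the following sketch (ours, not from the paper) computes the total weighted earliness/tardiness penalty of a schedule, assuming the completion times c_i have already been fixed, e.g. by a sequencing heuristic followed by idle time insertion.

def early_tardy_cost(completion, due, alpha, beta):
    # Objective (1): sum over jobs of alpha_i*|d_i - c_i|+ + beta_i*|c_i - d_i|+.
    total = 0.0
    for c, d, a, b in zip(completion, due, alpha, beta):
        earliness = max(d - c, 0.0)   # |d_i - c_i|+
        tardiness = max(c - d, 0.0)   # |c_i - d_i|+
        total += a * earliness + b * tardiness
    return total

# Tiny worked example: job 1 finishes 2 units early, job 2 finishes 2 units late.
print(early_tardy_cost(completion=[3, 9], due=[5, 7],
                       alpha=[1.0, 1.0], beta=[2.0, 2.0]))   # 1*2 + 2*2 = 6.0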
The objective of this problem is to schedule the jobs as closely as possible to their
due dates; however, the difficulty in formulating a schedule arises when it is not
possible to schedule all jobs on their due dates, which also creates difficulties in
managing the many tradeoffs between jobs competing for processing at a given time
[23]. Two of the simplest and most commonly used dispatching heuristics for the E/T
scheduling problem are the Earliest Due Date and Shortest Processing Time
heuristics.
2.2 Earliest Due Date (EDD) heuristic
The EDD heuristic orders the jobs based on the date the job is due to be delivered to
the customer. The jobs with the earliest due date are scheduled first, while the jobs
with the latest due date are scheduled last. After the sequence is determined, the
completion times of each job are then calculated using the optimal idle time insertion
algorithm of Fry, Armstrong and Blackstone [24]. For single machine problems the
EDD is known to be the best rule to minimise the maximum lateness, and therefore
tardiness, and also the lateness variance [25]. The EDD has the potential to produce
optimal solutions to this problem, for example when there are few jobs and the due
dates are widely spread so that all jobs may be scheduled on their due dates without
interfering with any other jobs. In that case no earliness or tardiness penalties are
incurred, so the objective value is 0 and therefore optimal.
2.3 Shortest Processing Time (SPT) heuristic
The SPT heuristic orders the jobs based on their processing time. The jobs with the
smallest processing time are scheduled first, while the jobs with the largest processing
time are scheduled last; this is the fastest way to get most of the jobs completed
quickly. Once the SPT sequence has been determined, the job completion times are
calculated using the optimal idle time insertion algorithm [24]. The SPT heuristic has
been referred to as “the world champion” scheduling heuristic [26], as it produces
schedules for single machine problems that are good at minimising the average time
of jobs in a system, minimising the average number of jobs in the system and
minimising the average job lateness [25]. When the tardiness penalties for the jobs are
similar and the due dates are such that the majority of jobs are going to be late, SPT is
likely to produce a very good schedule for the E/T scheduling problem, as it gets the
jobs completed as quickly as possible. The “weighted” version of the SPT heuristic,
where the order is determined by p_i/β_i, is used in part by many E/T heuristics, as this
order can be proven to be optimal for parts of a given schedule.
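To make the two dispatching rules concrete, the sketch below (ours) builds the EDD and SPT sequences; for simplicity it schedules jobs back-to-back from time zero rather than applying the optimal idle time insertion algorithm of [24] that the paper uses.

def edd_sequence(jobs):
    # EDD: order jobs by non-decreasing due date d_i.
    return sorted(jobs, key=lambda j: j["d"])

def spt_sequence(jobs):
    # SPT: order jobs by non-decreasing processing time p_i
    # (the "weighted" variant would sort by p_i / beta_i instead).
    return sorted(jobs, key=lambda j: j["p"])

def completion_times(sequence):
    # Schedule the sequence back-to-back from time 0 (no inserted idle time).
    t, completions = 0, []
    for job in sequence:
        t += job["p"]
        completions.append(t)
    return completions

jobs = [{"p": 4, "d": 10}, {"p": 2, "d": 3}, {"p": 5, "d": 6}]
print([j["d"] for j in edd_sequence(jobs)])   # [3, 6, 10]
print([j["p"] for j in spt_sequence(jobs)])   # [2, 4, 5]
print(completion_times(spt_sequence(jobs)))   # [2, 6, 11]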
2.4 Discussion
Due to the myopic nature of the EDD and SPT heuristics, neither heuristic is going to
consistently produce high quality solutions to the general E/T scheduling problem.
Both of these simple heuristics generate solutions very quickly, however, and therefore
it is possible to evaluate a large sample of problem instances in order to demonstrate
whether or not the approach proposed here is useful for exploring the relative
performance of two heuristics (or algorithms) and is able to predict the superiority of
one heuristic over another for a given instance.
3 Methodology
In this section we describe the meta-data for the E/T scheduling problem in the form
of {P, A, Y, F}. We also provide a description of the machine learning algorithms
applied to the meta-data to produce rules and visualizations of the meta-data.
3.1 Meta-Data for the E/T Scheduling Problem
The most critical part of the proposed methodology is identification of suitable
features of the problem instances that reflect the structure of the problem and the
characteristics of the instances that might explain algorithm performance. Generally
there are two main approaches to characterizing the instances: the first is to identify
problem dependent features based on domain knowledge of what makes a particular
instance challenging or easy to solve; the second is a more general set of features
derived from landscape analysis [27]. Related to the latter is the approach known in
the meta-learning community as landmarking [28], whereby an instance is
characterized by the performance of simple algorithms which serve as a proxy for
more complicated (and computationally expensive) features. Often a dual approach
makes sense, particularly if the feature set derived from problem dependent domain
knowledge is not rich, and supplementation from landscape analysis can assist in the
characterization of the instances. In the case of the generalised single machine E/T
scheduling problem however, there is sufficient differentiation power in a small
collection of problem dependent features that we can derive rules explaining the
different performance of the two common heuristics. Extending this work to include a
greater set of algorithms (A) may justify the need to explore landscape analysis tools
to derive greater characterisation of the instances.
In this paper, each n-job instance of the generalised single machine E/T scheduling
problem has been characterized by the following features.
1. Number of jobs to be scheduled in the instance, n
2. Mean Processing Time p̄: The mean processing time of all jobs in an instance.
3. Processing Time Range pσ: The range of the processing times of all jobs in the instance.
4. Tardiness Factor τ: Defines where the average due date occurs relative to, and as a fraction of, the total processing time of all jobs in the instance. A positive tardiness factor indicates the proportion of the schedule that is expected to be tardy, while a negative tardiness factor indicates the amount of idle time that is expected in the schedule as a proportion of the total processing time of all jobs in the sequence. Mathematically the tardiness factor was defined by Baker and Martin [29] as

   \tau = 1 - \frac{\sum_i d_i}{n \sum_i p_i} .

5. Due Date Range Dσ: Determines the spread of the due dates from the average due date for all jobs in the instance. It is defined as

   D_\sigma = \frac{b - a}{\sum_i p_i} ,

   where b is the maximum due date in the instance and a is the minimum due date in the instance.
6. Penalty Ratio ρ: The maximum over all jobs in the instance of the ratio of the tardy penalty to the early penalty (a short feature-computation sketch follows this list).
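The sketch below (ours) shows how these six features might be computed from a raw instance; it follows the definitions above, taking the penalty ratio as the maximum of β_i/α_i over all jobs, which is our reading of feature 6.

def instance_features(p, d, alpha, beta):
    # p, d, alpha, beta: lists of processing times, due dates,
    # earliness penalties and tardiness penalties for the n jobs.
    n = len(p)
    total_p = sum(p)
    return {
        "n": n,
        "mean_p": total_p / n,                           # mean processing time
        "range_p": max(p) - min(p),                      # processing time range
        "tardiness_factor": 1 - sum(d) / (n * total_p),  # tau (Baker and Martin)
        "due_date_range": (max(d) - min(d)) / total_p,   # D_sigma
        "penalty_ratio": max(b / a for a, b in zip(alpha, beta)),  # rho
    }

print(instance_features(p=[2, 4, 6], d=[5, 8, 9],
                        alpha=[1, 1, 2], beta=[3, 2, 4]))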
Any instance of the problem, whether contained in the meta-data set or generated
at a future time, can be characterized by this set of six features. Since we are
comparing the performance of only two heuristics, we can create a single binary
variable to indicate which heuristic performs best for a given problem instance. Let
Y_i = 1 if EDD is the best performing heuristic (lowest objective function) compared to
SPT for problem instance i, and Y_i = 0 otherwise (SPT is best). The meta-data then
comprises the set of six-feature vectors and heuristic performance measure (Y), for a
large number of instances, and the task is to learn the relationship between features
and heuristic performance.
In order to provide a large and representative sample of instances for the meta-data,
an instance generator was created to span a range of values for each feature. Problem
instances were then generated for all combinations of parameter values. The
parameter settings used were:
- problem size (number of jobs, n): 20-100 with increments of 20 (5 levels)
- processing times p_i: randomly generated within ranges of 2-10 with increments of 2 (5 levels)
- processing time mean p̄: calculated from the randomly generated p_i
- processing time range pσ: calculated from the randomly generated p_i
- due dates d_i: randomly generated within ranges of 0.2-1 with increments of 0.2 (5 levels)
- due date range Dσ: calculated from the randomly generated due dates d_i
- tardiness factor τ: calculated based on the randomly generated p_i and d_i
- penalty ratio ρ: 1-10 with increments of 1 (10 levels)
Ten instances using each parameter setting were then generated, giving a total of 5
(size levels) x 5 (processing time range levels) x 6 (tardiness factor levels) x 5 (due
date range levels) x 10 (penalty ratio levels) x 10 (instances) = 75,000 instances.
A correlation analysis between the instance features and the Y values across all
75,000 instances reveals that the only instance features that appear to correlate
(linearly) with heuristic performance are the tardiness factor (correlation = -0.59) and
due date range (correlation = 0.44). None of the other instance features appear to have
a linear relationship with algorithm performance. Clearly due date range and tardiness
factor correlate somewhat with the heuristic performances, but it is not clear if these
are non-linear relationships, and if either of these features with combinations of the
others can be used to seek greater insights into the conditions under which one
heuristic is expected to outperform the other.
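This correlation screening is straightforward to reproduce; the sketch below (ours) assumes the meta-data is held in a pandas DataFrame with the six feature columns plus the binary column Y (the column names are illustrative).

import pandas as pd

def feature_correlations(meta: pd.DataFrame) -> pd.Series:
    # Pearson correlation of each feature column with the binary label Y.
    return meta.drop(columns="Y").corrwith(meta["Y"])

# In the meta-data described here, only tardiness_factor (about -0.59) and
# due_date_range (about 0.44) show a notable linear correlation with Y.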
Using Rice’s notation, our meta-data can thus be described as:
P = 75,000 E/T scheduling instances
A = 2 heuristics (EDD and SPT)
Y = binary decision variable indicating if EDD is best compared to SPT
(based on objective function which minimizes weighted deviation from due
dates)
F = 6 instance features (problem size, processing time mean, processing time
range, due date range, tardiness factor and penalty ratio).
Additional features could undoubtedly be derived either from problem dependent
domain knowledge, or using problem independent approaches such as landscape
analysis [27], landmarking [28], or hyper-heuristics [30]. For now, though, we seek to
learn the relationships that might exist in this meta-data.
3.2 Knowledge Discovery on the Meta-Data
When exploring any data-set to discover knowledge, there are two broad approaches.
The first is supervised learning (aka directed knowledge discovery) which uses
training examples – sets of independent variables (inputs) and dependent variables
(outputs) - to learn a predictive model which is then generalized for new examples to
predict the dependent variable (output) based only on the independent variables
(inputs). This approach is useful for building models to predict which algorithm or
heuristic is likely to perform best given only the feature vector as inputs. The second
broad approach to knowledge discovery is unsupervised learning (aka undirected
knowledge discovery) which uses only the independent variables to find similarities
and differences between the structure of the examples, from which we may then be
able to infer relationships between these structures and the dependent variables. This
second approach is useful for our goal of seeking greater insight into why certain
heuristics might be better suited to certain instances, rather than just building
predictive models of heuristic performance.
In this section we briefly summarise the machine learning methods we have used
for knowledge discovery on the meta-data.
Neural Networks.
As a supervised learning method [31], neural networks can be used to learn to predict
which heuristic is likely to return the smallest objective function value. A training
dataset is randomly extracted (80% of the 75,000 instances) and used to build a non-
linear model of the relationships between the input set (features F) and the output
(metric Y). Once the model has been learned, its generalisation on an unseen test set
(the remaining 20% of the instances) is evaluated and recorded as a percentage
accuracy in predicting the superior heuristic. This process is repeated ten times for
different random extractions of the training and test sets, to ensure that the results
were not simply an artifact of the random number seed. This process is known as ten-
fold cross validation, and the reported results show the average accuracy on the test
set across these ten folds.
For our experimental results, the neural network implementation within the Weka
machine learning platform [32] was used with 6 input nodes, 4 hidden nodes, and 2
output nodes utilising binary encoding. The transfer function for the hidden nodes was
a sigmoidal function, and the neural network was trained with the backpropagation
(BP) learning algorithm with learning rate = 0.3, momentum = 0.2. The BP algorithm
stops when the number of epochs (complete presentation of all examples) reaches a
maximum training time of 500 epochs or the error on the test set does not decrease
after a threshold of 20 epochs.
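The configuration above is Weka's; a rough open-source equivalent with scikit-learn's MLPClassifier is sketched below (ours): one hidden layer of 4 logistic units, SGD with learning rate 0.3 and momentum 0.2, at most 500 epochs, early stopping after 20 stagnant epochs, evaluated with ten-fold cross-validation on a feature matrix X and labels y.

from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

def evaluate_nn(X, y):
    # X: (num_instances, 6) feature matrix; y: binary labels (1 = EDD best).
    nn = MLPClassifier(hidden_layer_sizes=(4,),   # 4 hidden nodes
                       activation="logistic",     # sigmoidal transfer function
                       solver="sgd",
                       learning_rate_init=0.3,
                       momentum=0.2,
                       max_iter=500,              # at most 500 epochs
                       early_stopping=True,
                       n_iter_no_change=20,       # stop after 20 stagnant epochs
                       random_state=0)
    return cross_val_score(nn, X, y, cv=10).mean()   # ten-fold CV accuracy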
Decision Tree
A decision tree [33] is also a supervised learning method, which uses the training data
to successively partition the data, based on one feature at a time, into classes. The
goal is to find features on which to split the data so that the class membership at lower
leaves of the resulting tree is as “pure” as possible. In other words, we strive for
leaves that are comprised almost entirely of one class only. The rules describing each
class can then be read up the tree by noting the features and their splitting points. Ten-
fold cross validation is also used in our experiments to ensure the generalisation of the
rules.
The J4.8 decision tree algorithm, implemented in Weka [32], was used for our
experimental results, with a minimum leaf size of 500 instances. The generated
decision tree is pruned using subtree raising with confidence factor = 0.25.
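J4.8 is Weka's C4.5 implementation and has no direct scikit-learn counterpart; a comparable sketch (ours) uses a CART tree with the same 500-instance minimum leaf size and prints the learned rules.

from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.model_selection import cross_val_score

FEATURE_NAMES = ["n", "mean_p", "range_p", "tardiness_factor",
                 "due_date_range", "penalty_ratio"]   # illustrative names

def fit_tree(X, y):
    tree = DecisionTreeClassifier(min_samples_leaf=500,  # minimum leaf size of 500
                                  random_state=0)
    print(cross_val_score(tree, X, y, cv=10).mean())     # ten-fold CV accuracy
    tree.fit(X, y)
    print(export_text(tree, feature_names=FEATURE_NAMES))  # readable rule listing
    return tree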
Self-Organizing Maps
Self-Organizing Maps (SOMs) are the most well-known unsupervised neural network
approach to clustering. Their advantage over traditional clustering techniques such as
the k-means algorithm lies in the improved visualization capabilities resulting from
the two-dimensional map of the clusters. Often patterns in a high dimensional input
space have a very complicated structure, but this structure is made more transparent
and simple when they are clustered in a lower dimensional feature space. Kohonen
[34] developed SOMs as a way of automatically detecting strong features in large data
sets. SOMs find a mapping from the high dimensional input space to low dimensional
feature space, so any clusters that form become visible in this reduced dimensionality.
The architecture of the SOM is a multi-dimensional input vector connected via
weights to a 2-dimensional array of neurons. When an input pattern is presented to the
SOM, each neuron calculates how similar the input is to its weights. The neuron
whose weights are most similar (minimal distance in input space) is declared the
winner of the competition for the input pattern, and the weights of the winning
neuron, and its neighbours, are strengthened to reflect the outcome. The final set of
weights embeds the location of cluster centres, and is used to recognize to which
cluster a new input vector is closest.
For our experiments we randomly split the 75000 instances into training data
(50000 instances) and test data (25000 instances). We use the Viscovery SOMine
software (www.eudaptics.com) to cluster the instances based only on the six features
as inputs. A map of 2000 nodes is trained for 41 cycles, with the neighbourhood size
diminishing linearly at each cycle. After the clustering of the training instances, the
distribution of Y values is examined within each cluster, and knowledge about the
relationships between instance structure and heuristic performance is inferred and
evaluated on the test data.
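Viscovery SOMine is a commercial package; an open-source approximation of the clustering step is sketched below (ours) with the minisom library, training a roughly 2000-node (40 x 50) map on the standardised six-feature vectors.

from minisom import MiniSom

def train_som(X_train, map_rows=40, map_cols=50, iterations=50000):
    # Standardise the six features, then fit a 2-d map so that clusters in
    # feature space become visible in the reduced dimensionality.
    X = (X_train - X_train.mean(axis=0)) / X_train.std(axis=0)
    som = MiniSom(map_rows, map_cols, X.shape[1],
                  sigma=2.0, learning_rate=0.5, random_seed=0)
    som.random_weights_init(X)
    som.train_random(X, iterations)   # neighbourhood shrinks during training
    return som, X

# som.winner(x) returns the best-matching node for an instance x, which can then
# be used to examine how the Y values distribute across the map.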
4 Experimental Evaluation
4.1 Supervised Learning Results
Both the neural network and decision tree algorithms were able to learn the
relationships in the meta-data, achieving greater than 97% accuracy (on ten-fold
cross-validation test sets) in predicting which of the two heuristics would be superior
based only on the six features (inputs). These approaches have an overall
classification accuracy of 97.34% for the neural network and 97.13% for the decision
tree. While the neural network can be expected to learn the relationships in the data
more powerfully, due to its nonlinearity, its limitation is the lack of insight and
explanation of those relationships. The decision tree’s advantage is that it produces a
clear set of rules, which can be explored to see if any insights can be gleaned. The
decision tree rules are presented in the form of pseudo-code in Figure 3, with the
fraction in brackets showing the number of instances that satisfied both the condition
and the consequence (decision) in the numerator, divided by the total number of
instances that satisfied the condition in the denominator. This proportion is equivalent
to the accuracy of the individual rule.
The results allow us to state a few rules with exceptionally high accuracy:
1) If the majority of jobs are expected to be scheduled early (tardiness factor <=
0.5) then EDD is best in 99.8% of instances
2) If the majority of the jobs are expected to be scheduled late (tardiness factor
> 0.7) then SPT is best in 99.5% of instances
3) If slightly more than half of the jobs are expected to be late (tardiness factor
between 0.5 and 0.7) then as long as the tardiness penalty ratio is no more
than 3 times larger than the earliness penalty (ρ ≤ 3), then EDD is best in
98.9% of the instances with a due date range greater than 0.2.
The first two rules are intuitive and can be justified from what we know about the
heuristics - EDD is able to minimise lateness deviations when the majority of jobs can
be scheduled before their due date, and SPT is able to minimise the time of jobs in the
system and hence tardiness when the majority of jobs are going to be late [25]. The
third rule reveals the kind of knowledge that can be discovered by adopting a machine
learning approach to the meta-data. Of course other rules can also be explored from
Figure 3, with less confidence due to the lower accuracy, but they may still provide
the basis for gaining insight into the conditions under which different algorithms can
be expected to perform well.
If (τ <= 0.7) Then
    If (τ <= 0.5) Then EDD best (44889/45000 = 99.8%)
    If (τ > 0.5) Then
        If (Dσ <= 0.2) Then
            If (ρ <= 3) Then EDD best (615/750 = 82.0%)
            Else SPT best (1483/1750 = 84.7%)
        Else
            If (ρ <= 3) Then EDD best (5190/5250 = 98.9%)
            Else If (τ <= 0.6) Then EDD best (8320/8750 = 95.1%)
            Else If (p̄ <= 2) Then EDD best (556/700 = 79.4%)
            Else If (n <= 60) Then SPT best (1150/1680 = 68.4%)
            Else EDD best (728/1120 = 65.0%)
Else SPT best (9950/10000 = 99.5%)
Fig. 3. Pseudo-code for the decision tree rule system, showing the accuracy of each rule
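For reference, the rules in Figure 3 can be read directly as a classification function; the translation below (ours) returns the predicted better heuristic from the five features that appear in the tree.

def predict_best_heuristic(tau, D_sigma, rho, mean_p, n):
    # Direct translation of the Figure 3 decision tree; returns "EDD" or "SPT".
    if tau <= 0.7:
        if tau <= 0.5:
            return "EDD"                        # 99.8% of such instances
        if D_sigma <= 0.2:
            return "EDD" if rho <= 3 else "SPT"
        if rho <= 3:
            return "EDD"                        # 98.9%
        if tau <= 0.6:
            return "EDD"                        # 95.1%
        if mean_p <= 2:
            return "EDD"                        # 79.4%
        return "SPT" if n <= 60 else "EDD"
    return "SPT"                                # 99.5%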
4.2 Unsupervised Learning Results
After training the SOM, the converged map shows 5 clusters, each of which contains
similar instances defined by Euclidean distance in feature space. Essentially, the six-
dimensional input vectors have been projected onto a two-dimensional plane, with
topology-preserving properties. The clusters can be inspected to understand what the
instances within each cluster have in common. The statistical properties of the 5
clusters can be seen in Table 1. The distribution of the input variables (features), and
additional variables including the performance of the heuristics, can be visually
explored using the maps shown in Figure 4. A k-nearest neighbour algorithm (with
k=7) is used to distribute additional data instances (from the test set) or extra variables
(Y values) across the map.
Looking first at the bottom row of Table 1, it is clear that clusters 1, 2 and 3
contain instances that are best solved using EDD (Y=1). These clusters are shown
visually in the bottom half of the 2-d self-organizing map (see Figure 4a for cluster
boundaries, and Figure 4b to see the distribution of Y across the clusters). These three
clusters of instances account for 70.2% of the 75,000 instances (see Table 1). The
remaining clusters 4 and 5 are best solved, on average, by SPT. The maps shown in
Figure 4c – 4h enable us to develop a quick visual understanding of how the clusters
differ from each other, and to see which features are prominent in defining instance
structure.
Table 1. Cluster statistics - mean values of input variables, and heuristic performance variable
Y, as well as cluster size. The first number in each cell is the value for the training data, and the
second number in parenthesis is for the test data.
              Cluster 1      Cluster 2      Cluster 3      Cluster 4      Cluster 5      All Data
instances     17117 (8483)   10454 (5236)   7428 (3832)    8100 (4000)    6901 (3449)    50000 (25000)
instances (%) 34.23 (33.93)  20.91 (20.94)  14.86 (15.33)  16.2 (16.0)    13.8 (13.8)    100 (100)
n             60.65 (61.03)  59.73 (59.73)  58.73 (58.96)  57.8 (57.7)    63.39 (61.56)  60.0 (59.97)
p̄             2.77 (2.76)    5.24 (5.22)    5.08 (5.07)    5.12 (5.11)    2.70 (2.71)    4.0 (3.99)
pσ            3.54 (3.52)    8.48 (8.45)    8.16 (8.13)    8.24 (8.21)    3.41 (3.41)    6.0 (5.99)
τ             0.31 (0.31)    0.36 (0.35)    0.21 (0.21)    0.72 (0.73)    0.72 (0.72)    0.43 (0.42)
Dσ            0.70 (0.70)    0.88 (0.88)    0.38 (0.38)    0.40 (0.39)    0.40 (0.40)    0.6 (0.59)
ρ             5.89 (5.88)    4.93 (4.99)    5.37 (5.41)    5.24 (5.19)    5.87 (5.72)    5.5 (5.49)
Y             1.00 (0.99)    1.00 (1.00)    0.99 (0.99)    0.36 (0.36)    0.42 (0.41)    0.82 (0.82)
By inspecting the maps shown in Figure 4, and the cluster statistics in Table 1, we can
draw some conclusions about whether the variables in each cluster are above or below
average (compared to the entire dataset), and look for correlations with the heuristic
performance metric Y. For instance, cluster 2 is characterized by instances with
above average values of processing time mean and range, below average tardiness
factor, and above average due date range. The EDD heuristic is always best under
these conditions (Y=1). Instances in cluster 3 are almost identical, except that the due
date range tends to be below average. Since cluster 3 instances are also best solved by
the EDD heuristic, one could hypothesize that the due date range does not have much
influence in predicting heuristic performance. An inspection of the maps, however,
shows this is not the case.
The distribution of Y across the map (Figure 4b) shows a clear divide between the
clusters containing instances best solved using EDD (bottom half) and the clusters
containing instances best solved using SPT (top half). Inspecting the distribution of
features across this divide leads to a simple observation that, if the tardiness factor τ is
below average (around 0.5 represented by white to mid-grey in Figure 4f), then EDD
will be best. But there are small islands of high Y values in clusters 4 and 5 that
overlay nicely with the medium grey values of due date range. So we can observe
another rule that EDD will also be best if the tardiness factor is above average and the
due date range is above average. Also of interest, from these maps we can see that
problem size and the penalty ratio do not influence the relative performance of these
heuristics. As neither heuristic considers the penalty ratio (it is used within the
optimal idle time insertion algorithm [24], common to both heuristics, but not used by
the EDD or SPT heuristics themselves), its not being a factor in the clusters is not
surprising.
Fig. 4. Self-Organizing Map showing the 5 clusters in 2-d space (fig. 4a), the distribution of
the heuristic performance variable Y across the clusters (fig. 4b), and the distribution of each
of the six features n, p̄, pσ, τ, Dσ and ρ across the clusters (fig. 4c - fig. 4h). The grey scale
shows each variable at its minimum value as white, and maximum value as black.
Within Viscovery SOMine, specific regions of the map can be selected, and used
as the basis of a classification. In other words, we can define regions and islands to be
predictive of one heuristic excelling based on the training data (50,000 instances). We
can then test the generalization of the predictive model using the remaining 25,000
instances as a test set, and applying the k-nearest neighbour algorithm to determine
instances that belong to the selected region. We select the dark-grey to black regions
of the Y map in Figure 4b, and declare that any test instances falling in the selected
area are classified as Y=1, while any instances falling elsewhere in the map are
classified as Y=0. The resulting accuracy on the test set is 95% in predicting which
heuristic will perform better. The self-organizing map has proven to be useful for both
visualization of feature space and predictive modeling of heuristic performance,
although the accuracy is not quite as high as the supervised learning approaches.
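The region-based classification described above can be approximated by labelling each trained map node with the majority Y value of the training instances mapped to it and then assigning each test instance the label of its best-matching node; the sketch below (ours) continues from the earlier minisom example and simplifies the k-nearest-neighbour step to k = 1.

from collections import defaultdict

def node_labels(som, X_train, y_train):
    # Label every SOM node with the majority Y value of its training hits.
    votes = defaultdict(list)
    for x, y in zip(X_train, y_train):
        votes[som.winner(x)].append(y)
    return {node: int(round(sum(ys) / len(ys))) for node, ys in votes.items()}

def classify(som, labels, X_test, default=0):
    # Predict Y for each test instance from the label of its best-matching node.
    return [labels.get(som.winner(x), default) for x in X_test]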
5 Conclusions and Future Research
In this paper we have illustrated how the concepts of Rice’s Algorithm Selection
Problem can be extended within a knowledge discovery framework, and applied to
the domain of optimization in order that we might gain insights into optimization
algorithm performance. This paper represents one of the first attempts to apply this
approach to understand more about optimisation algorithm performance. A large
meta-data set comprising 75,000 instances of the E/T scheduling problem has been
used to explore what can be learned about the relationships between the features of
the problem instances and the performance of heuristics. Both supervised and
unsupervised learning approaches have been presented, each with their own
advantages and disadvantages made clear by the empirical results. The neural network
obtained the highest accuracy for performance prediction, but its weakness is the lack
of explanation or interpretability of the model. Our goal is not merely performance
prediction, but to gain insights into the characteristics of instances that make solution
by one heuristic superior to another. Decision trees are also a supervised learning
method, and the rules produced demonstrate the potential to obtain both accurate
performance predictions and some insights. Finally, the self-organizing map
demonstrated its benefits for visualization of the meta-data and relationships therein.
One of the most important considerations for this approach to be successful for any
arbitrary optimization problem is the choice of features used to characterize the
instances. These features need to be carefully chosen in such a way that they can
characterize instance and problem structure as well as differentiate algorithm
performance. There is little that will be learned via a knowledge discovery process if
the features selected to characterize the instances do not have any differentiation
power. The result will be supervised learning models of algorithm performance that
predict average behaviour with accuracy measures no better than the default
accuracies one could obtain from using a naïve model. Likewise, the resulting self-
organizing map would show no discernible difference between the clusters when
superimposing Y values (unlike in Figure 4b where we obtain a clear difference
between the top and bottom halves of the map). Thus the success of any knowledge
discovery process depends on the quality of the data, and in this case, the meta-data
must use features that serve the purpose of differentiating algorithm performance. In
this paper we have used a small set of problem-dependent features, related to the E/T
Scheduling Problem, which would be of no use to any other optimization problem.
For other optimization problems like graph colouring or the Travelling Salesman
Problem, recent developments in phase transition analysis (e.g. [35]) could form the
foundation of the development of useful features. Landscape analysis [20, 27]
provides a more general (problem independent) set of features, as do ideas from
landmarking [28] and hyper-heuristics [30]. It is natural to expect that the best results
will be obtained from a combination of generic and problem dependent features, and
this will be the focus of our future research. In addition, we plan to extend the
approach to consider the performance of a wider variety of algorithms, especially
meta-heuristics, where we will also be gathering meta-data related to the features of
the meta-heuristics themselves (e.g. hill-climbing capability, tabu list, annealing
mechanism, population-based search, etc.). This will help to close the loop to ensure
that any insights derived from such an approach are able to provide inputs into the
design of new hybrid algorithms that adapt the components of the meta-heuristic
according to the instance features – an extension of the highly successful algorithm
portfolio approach [4].
References
1. Rice, J. R.: The Algorithm Selection Problem. Adv. Comp. 15, 65--118 (1976)
2. Watson, J.P., Barbulescu, L., Howe, A.E., Whitley, L.D.: Algorithm Performance and
Problem Structure for Flow-shop Scheduling. In: Proc. AAAI Conf. on Artificial
Intelligence, pp. 688--694 (1999)
3. Wolpert, D.H., Macready, W.G.: No Free Lunch Theorems for Optimization. IEEE T.
Evolut. Comput. 1, 67--82 (1997)
4. Xu, L., Hutter, F., Hoos, H., Leyton-Brown, K.: Satzilla-07: The Design And Analysis Of
An Algorithm Portfolio For SAT. LNCS, vol. 4741, pp. 712--727 (2007)
5. Leyton-Brown, K., Nudelman, E., Shoham, Y.: Learning the Empirical Hardness of
Optimization Problems: The Case of Combinatorial Auctions. LNCS, vol. 2470. pp. 556--
569. Springer, Heidelberg (2002)
6. Leyton-Brown, K., Nudelman, E., Andrew, G., McFadden, J., Shoham, Y.: A Portfolio
Approach to Algorithm Selection. In: Proc. IJCAI. pp. 1542--1543 (2003)
7. Nudelman, E., Leyton-Brown, K., Hoos, H., Devkar, A., Shoham, Y.: Understanding
Random SAT: Beyond the Clauses-To-Variables Ratio. LNCS, vol. 3258, pp. 438--452
(2004)
8. Horvitz, E., Ruan, Y., Gomes, C., Kautz, H., Selman, B., Chickering, M.: A Bayesian
Approach to Tackling Hard Computational Problems. In: Proc. 17th Conf. on Uncertainty
in Artificial Intelligence, pp. 235--244. Morgan Kaufmann, San Francisco (2001)
9. Samulowitz, H., Memisevic, R.: Learning to solve QBF. In: Proc. 22nd AAAI Conf. on
Artificial Intelligence, pp. 255--260 (2007)
10. Streeter, M., Golovin, D., Smith, S. F.: Combining multiple heuristics online. In: Proc.
22nd AAAI Conf. on Artificial Intelligence, pp. 1197--1203 (2007)
11. Vilalta, R., Drissi, Y.: A Perspective View and Survey of Meta-Learning. Artif. Intell.
Rev. 18, 77--95 (2002)
12. Michie, D., Spiegelhalter, D.J., Taylor C.C. (eds.) Machine Learning, Neural and
Statistical Classification. Ellis Horwood, New York (1994)
13. Brazdil, P., Soares, C., Costa, J.: Ranking Learning Algorithms: Using IBL and Meta-
Learning on Accuracy and Time Results. Mach. Learn. 50, 251--277 (2003)
14. Ali, S., Smith, K.: On Learning Algorithm Selection for Classification. Appl. Soft Comp.
6, 119--138 (2006)
15. Stützle, T., Fernandes, S.: New Benchmark Instances for the QAP and the Experimental
Analysis of Algorithms. LNCS, vol. 3004, pp. 199--209 (2004)
16. Carchrae, T., Beck, J.C.: Applying Machine Learning to Low Knowledge Control of
Optimization Algorithms. Comput. Intell. 21, 373--387 (2005)
17. Shaw, M.J., Park, S., Raman, N.: Intelligent Scheduling With Machine Learning
Capabilities: The Induction Of Scheduling Knowledge. IIE Trans. 24, 156--168 (1992)
18. Knowles, J. D., Corne, D. W.: Towards Landscape Analysis to Inform the Design of a
Hybrid Local Search for the Multiobjective Quadratic Assignment Problem. In: Abraham,
A., Ruiz-Del-Solar, J., Koppen M. (eds.) Soft Computing Systems: Design, Management
and Applications, pp. 271--279. IOS Press, Amsterdam (2002)
19. Merz, P.: Advanced Fitness Landscape Analysis and the Performance of Memetic
Algorithms. Evol. Comp., 2, 303--325 (2004)
20. Watson, J., Beck, J. C., Howe, A. E., Whitley, L. D.: Problem Difficulty for Tabu Search
in Job-Shop Scheduling. Artif. Intell. 143, 189--217 (2003)
21. Smith-Miles, K. A.: Cross-Disciplinary Perspectives On Meta-Learning For Algorithm
Selection. ACM Computing Surveys. In press (2009).
22. Baker, K.R., Scudder, G.D.: Sequencing With Earliness and Tardiness Penalties: A
Review. Ops. Res., 38, 22--36 (1990)
23. James, R. J. W., Buchanan, J. T.: A Neighbourhood Scheme with a Compressed Solution
Space for The Early/Tardy Scheduling Problem. Eur. J. Oper.Res. 102, 513--527 (1997)
24. Fry T.D., Armstrong R.D., Blackstone J.H.: Minimizing Weighted Absolute Deviation in
Single Machine Scheduling. IIE Transactions, 19, 445--450 (1987)
25. Vollmann T.E., Berry, W.L., Whybark, D.C., Jacobs, F.R.: Manufacturing Planning and
Control for Supply Chain Management. 5th Edition, McGraw Hill, New York (2005)
26. Krajewski, L.J., Ritzman, L.P.: Operations Management: Processes and Value Chains. 7th
Edition, Pearson Prentice Hall, New Jersey (2005)
27. Schiavinotto, T., Stützle, T.: A review of metrics on permutations for search landscape
analysis. Comput. Oper. Res. 34, 3143--3153 (2007).
28. Pfahringer, B., Bensusan, H., Giraud-Carrier, C. G.: Meta-Learning by Landmarking
Various Learning Algorithms. In: Proc. ICML. pp. 74--750 (2000)
29. Baker K. B., Martin, J. B.: An Experimental Comparison of Solution Algorithms for the
Single Machine Tardiness Problem. Nav. Res. Log. 21, 187--199 (1974)
30. Burke, E., Hart, E., Kendall, G., Newall, J., Ross, P., Schulenburg, S.: Hyper-heuristics:
An Emerging Direction in Modern Search Technology. In: Glover, F., Kochenberger, G.
(eds.) Handbook of Meta-heuristics. pp. 457--474. Kluwer, Norwell MA (2002)
31. Smith, K. A.: Neural Networks for Prediction and Classification. In: Wang, J.(ed.),
Encyclopaedia of Data Warehousing And Mining. vol. 2, pp. 86--869, Information
Science Publishing, Hershey PA (2006)
32. Witten, I. H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques.
2nd Edition. Morgan Kaufmann, San Francisco (2005)
33. Quinlan, J. R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Francisco
(1993)
34. Kohonen, T.: Self-Organized Formation of Topologically Correct Feature Maps. Biol.
Cyber. 43, 59--69 (1982)
35. Achlioptas, D., Naor, A., Peres, Y.: Rigorous Location of Phase Transitions in Hard
Optimization Problems. Nature, 435, 759--764 (2005)
... Such observations might be useful in designing decision support systems that incorporate the choice of the most promising algorithm. In [108], Smith-Miles et al. study the relationship between problem characteristics, such as the number of jobs, the mean processing time, the tardiness factor (here: due date factor) and the due date range, and the performance of two priority rules for a singlemachine early/tardy scheduling problem. It is stated that the tardiness factor and the due date range correlate with the heuristic performance. ...
... Since the due dates are generated by applying a constant due date factor of 1.2 for all instances, the declaration of this multiplier as an instance property, like it is done in [92] and [108], is meaningless. However, due dates are expected to aect the diculty of an instance and the observation of a due date-related characteristic seems reasonable. ...
... Their expertise usually comes from experience, and little theoretical work is done in the literature to support the understanding of scheduling problem structure in a meaningful and easy-to-relate way. Smith-Miles, James, Giffin, andTu (2009) andSmith-Miles, van Hemert, andLim (2010) investigated the use of data mining to understand the relationship between scheduling and the travelling salesman problem 1 structure and heuristic performance, while Ingimundardottir and Runarsson (2012) used machine learning to understand why and predict when a particular JSSP instance is easy or difficult for certain heuristics. We believe such work is valuable, because without a practical understanding of problem structure, moving towards a goal of better scheduling practice might just be a tedious campaign of trial-and-error. ...
... Their expertise usually comes from experience, and little theoretical work is done in the literature to support the understanding of scheduling problem structure in a meaningful and easy-to-relate way. Smith-Miles, James, Giffin, andTu (2009) andSmith-Miles, van Hemert, andLim (2010) investigated the use of data mining to understand the relationship between scheduling and the travelling salesman problem 1 structure and heuristic performance, while Ingimundardottir and Runarsson (2012) used machine learning to understand why and predict when a particular JSSP instance is easy or difficult for certain heuristics. We believe such work is valuable, because without a practical understanding of problem structure, moving towards a goal of better scheduling practice might just be a tedious campaign of trial-and-error. ...
Article
In this paper, we conduct a statistical study of the relationship between Job-Shop Scheduling Problem (JSSP) features and optimal makespan. To this end, a set of 380 mostly novel features, each representing a certain problem characteristic, are manually developed for the JSSP. We then establish the correlation of these features with optimal makespan through statistical analysis measures commonly used in machine learning, such as the Pearson Correlation Coefficient, and as a way to verify that the features capture most of the existing correlation, we further use them to develop machine learning models that attempt to predict the optimal makespan without actually solving a given instance. The prediction is done as classification of instances into coarse lower or higher-than-average classes. The results, which constitute cross-validation and test accuracy measures of around 80% on a set of 15000 randomly generated problem instances, are reported and discussed. We argue that given the obtained correlation information, a human expert can earn insight into the JSSP structure, and consequently design better instances, design better heuristic or hyper-heuristics, design better benchmark instances, and in general make better decisions and perform better-informed trade-offs in various stages of the scheduling process. To support this idea, we also demonstrate how useful the obtained insight can be through a real-world application.
... The selection mapping s(f (x)) which associates for each set of features from F an algorithm α from A, is a learning problem (see Smith-Miles et al., 2009). Therefore, a classification model is built using machine learning algorithms on the meta-data. ...
Article
Full-text available
Today, the algorithm selection paradigm has become one of the promising approaches in the field of optimization problems. Its main goal is to solve each case of an optimization problem with the most accurate algorithm using machine learning techniques. This paper treats the issue of the algorithm selection for the Single Machine Scheduling Problem with Early/Tardy jobs by adapting three metaheuristics from the state-of-the-art, namely genetic algorithm, particle swarm optimization, and tabu search. In the proposed framework, we combine the running time and the cost function to get a new performance criterion. A large set composed of 98000 instances of the problem is generated with 12 features characterizing each instance. We carry a statistical comparison of the implemented meta-heuristics, and we evaluate 10 classifiers. It can be deduced that the Dagging algorithm combined with the Random Forest is the most likely to be the best classifier, which achieves 88.44% of the maximum accuracy.
... The study of solution space could guide the design of algorithms to avoid this situation. After Manderick, Weger, and Spiessens first used the "Fitness Landscape Theory" to analyze GA, more and more researchers pay their attentions on the study of solution space of different problems, and applied the concept of fitness landscape to describe the characteristics of the problems and analyze the performance of algorithms [44][45][46][47][48][49]. Wen, Gao, and Li introduced the logistic model into JSP, which is a core theory in population ecology [50]. ...
Chapter
Job shop Scheduling Problem (JSP) which is widespread in the real-world production system is one of the most general and important problems in various scheduling problems. Nowadays, the effective method for JSP is a hot topic in the research area of the manufacturing system. JSP is a typical NP-hard combinatorial optimization problem and has a broad engineering application background. Due to the large and complicated solution space and process constraints, JSP is very difficult to find an optimal solution within a reasonable time even for small instances. In this chapter, a hybrid Particle Swarm Optimization algorithm (PSO) based on Variable Neighborhood Search (VNS) has been proposed to solve this problem. In order to overcome the blind selection of neighborhood structures during the hybrid algorithm design, a new neighborhood structure evaluation method based on logistic model has been developed to guide the neighborhood structures selection. This method is utilized to evaluate the performance of different neighborhood structures. Then the neighborhood structures which have good performance are selected as the main neighborhood structures in VNS. Finally, a set of benchmark instances has been conducted to evaluate the performance of the proposed hybrid algorithm and the comparisons among some other state-of-the-art reported algorithms are also presented. The experimental results show that the proposed hybrid algorithm has achieved good improvement in the optimization of JSP, which also verifies the effectiveness and efficiency of the proposed neighborhood structure evaluation method.
... Despite the importance of matching instance properties of academic timetabling problems with the performance of available solving methods (Rossi-Doria et al. 2002), few research works in the literature have proposed features to describe both the search space and the relative hardness of the instances (Kostuch and Socha 2004;Smith-Miles et al. 2009;Rodriguez-Maya et al. 2016). Therefore, to fill this gap, three sets of relevant CB-CTT metrics, defined to predict the empirical hardness of CB-CTT instances, were compared in the previous section. ...
Article
Full-text available
University timetabling is a real-world problem frequently encountered in higher education institutions. It has been studied by many researchers who have proposed a wide variety of solutions. Measuring the variation of the performance of solution approaches across instance spaces is a critical factor for algorithm selection and algorithm configuration, but because of the diverse conditions that define the problem within different educational contexts, measurement has not been formally addressed within the university timetabling context. In this paper, we propose a set of metrics to predict the performance of combinatorial optimization algorithms that generate initial solutions for university timetabling instances. These metrics, derived from the fields of enumerative combinatorics and graph coloring, include size-related instance properties, counting functions, feature ratios and constraint indexes evaluated through a feature selection methodology that, based on regression algorithms, characterizes the empirical hardness of a subspace of synthetically generated instances. The results obtained with this methodology show the current need not only to develop solution strategies for particular cases of the problem, but also to produce a formal description of the conditions that make instance spaces hard to solve, in order to improve and integrate the available solution approaches.
... If a meaningful set of features is chosen to characterize the instance space, this approach should permit the identification of comparative advantages in competing algorithms. Examples of these studies that rely on machine-learning and statistics include Leyton-Brown et al. [23] for winner determination problem, Smith-Miles et al. [34] for Traveling salesman problem, Cho et al. [6] and Hall and Posner [16] for knapsack problems, and Smith-Miles et al. [33] for job shop scheduling problem. ...
Preprint
We present a benchmark set for Traveling salesman problem (TSP) with characteristics that are different from the existing benchmark sets. In particular, we focus on small instances which prove to be challenging for one or more state-of-the-art TSP algorithms. These instances are based on difficult instances of Hamiltonian cycle problem (HCP). This includes instances from literature, specially modified randomly generated instances, and instances arising from the conversion of other difficult problems to HCP. We demonstrate that such benchmark instances are helpful in understanding the weaknesses and strengths of algorithms. In particular, we conduct a benchmarking exercise for this new benchmark set totalling over five years of CPU time, comparing the TSP algorithms Concorde, Chained Lin-Kernighan, and LKH. We also include the HCP heuristic SLH in the benchmarking exercise. A discussion about the benefits of specifically considering outlying instances, and in particular instances which are unusually difficult relative to size, is also included.
... This work considers both randomly generated instances and structured ones taken from public sources. The random instances used for the first experiment were generated by using random generation model B [38], while the rest of the randomly generated instances were produced by using model RB [39]. Model B produces a graph with exactly ( 1 ( −1))/2 constraints. ...
Article
Full-text available
When solving constraint satisfaction problems (CSPs), it is a common practice to rely on heuristics to decide which variable should be instantiated at each stage of the search. But, this ordering influences the search cost. Even so, and to the best of our knowledge, no earlier work has dealt with how first variable orderings affect the overall cost. In this paper, we explore the cost of finding high-quality orderings of variables within constraint satisfaction problems. We also study differences among the orderings produced by some commonly used heuristics and the way bad first decisions affect the search cost. One of the most important findings of this work confirms the paramount importance of first decisions. Another one is the evidence that many of the existing variable ordering heuristics fail to appropriately select the first variable to instantiate. Another one is the evidence that many of the existing variable ordering heuristics fail to appropriately select the first variable to instantiate. We propose a simple method to improve early decisions of heuristics. By using it, performance of heuristics increases.
... This research aims to ll this methodology gap by providing a new framework to evaluate the performance of ATCGTs by providing information about their e ectiveness according to the features of the software system and therefore enabling the selection of the best technique. The META (Mapping the E ectiveness of Test Automation) Framework was inspired on an innovative framework that has been successfully applied on OR problems [17][18][19]. The proposed framework, di erently from the commonly used methodology, aims to characterize both strengths and weaknesses of the analyzed techniques using existing and newly developed features (complexity measures and metrics) extracted from Software Artifacts, more speci cally, from the CUTs. ...
Conference Paper
Automated Test Case Generation (ATCG) is an important topic in Software Testing, with a wide range of techniques and tools being used in academia and industry. While their usefulness is widely recognized, due to the labor-intensive nature of the task, the effectiveness of the different techniques in automatically generating test cases for different software systems is not thoroughly understood. Despite many studies introducing various ATCG techniques, much remains to be learned about what makes a particular technique work well (or not) for a specific software system. Therefore, we propose a new methodology to evaluate and select the most effective ATCG technique using structure-based complexity measures. Empirical tests will be performed using two different techniques: Search-based Software Testing (SBST) and Random Testing (RT).
... Smith-Miles [15] proposed a framework for analyzing the performance of various algorithms on QAP instances to gain insights into the relationship between instance space features and the performance of the algorithms evaluated. In a subsequent study, Smith-Miles et al. analyzed the performance of heuristics for the scheduling problem by using a decision tree [16]. To conduct the analysis, 75,000 scheduling instances were generated and solved using two common scheduling heuristics. ...
Article
Full-text available
Constraint satisfaction problems are of special interest to the artificial intelligence and operations research communities due to their many applications. Although the heuristics involved in solving these problems have largely been studied in the past, little is known about the relation between instances and the respective performance of the heuristics used to solve them. This paper focuses both on exploring the instance space to identify relations between instances and well-performing heuristics and on how to use such relations to improve the search. Firstly, the document describes a methodology to explore the instance space of constraint satisfaction problems and evaluate the corresponding performance of six variable ordering heuristics for such instances, in order to find regions of the instance space where some heuristics outperform the others. Analyzing such regions helps us understand how these heuristics work and contributes to their improvement. Secondly, we use the information gathered in the first stage to predict the most suitable heuristic to use according to the features of the instance currently being solved. This approach proved to be competitive when compared against the heuristics applied in isolation on both randomly generated and structured instances of constraint satisfaction problems.
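As an illustration of the second stage described above, the sketch below (Python, scikit-learn) trains a k-nearest-neighbour recommender that maps instance features to the heuristic that performed best on similar instances. The two features, the heuristic labels, and the synthetic labelling rule are placeholders for illustration, not the features or experimental data of the cited study.

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    rng = np.random.default_rng(0)

    # Placeholder features for 500 CSP instances: constraint density and tightness.
    X = rng.uniform(size=(500, 2))
    # Placeholder label: which of two variable ordering heuristics won on each instance.
    # A fabricated frontier stands in for measured results.
    y = np.where(X[:, 0] + X[:, 1] > 1.0, "dom/wdeg", "min-domain")

    recommender = KNeighborsClassifier(n_neighbors=5).fit(X, y)

    # Recommend a heuristic for a new instance from its features alone.
    new_instance = np.array([[0.7, 0.6]])
    print(recommender.predict(new_instance)[0])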
Thesis
Full-text available
The majority of the most effective and efficient algorithms for multi-objective optimization are based on Evolutionary Computation. However, choosing the most appropriate algorithm to solve a certain problem is not trivial and often requires a time-consuming trial process. As an emerging area of research, hyper-heuristics investigates various techniques to detect the best low-level heuristic while the optimization problem is being solved. On the other hand, agents are autonomous components responsible for observing an environment and performing actions according to their perceptions. In this context, agent-based techniques seem suitable for the design of hyper-heuristics. There are several hyper-heuristics proposed for controlling low-level heuristics, but only a few of them focus on selecting multi-objective evolutionary algorithms (MOEAs). This work presents an agent-based hyper-heuristic for choosing the best multi-objective evolutionary algorithm. Based on Social Choice Theory, the proposed framework performs a cooperative voting procedure, considering a set of quality-indicator voters, to define which algorithm should generate more offspring throughout the execution. A comparative performance analysis was performed across several benchmark functions and real-world problems. Results showed the proposed approach to be very competitive both against the best MOEA for each given problem and against state-of-the-art hyper-heuristics.
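The cooperative voting step can be pictured as a simple social-choice aggregation. The sketch below uses a Borda count over hypothetical quality-indicator rankings; the indicator names, algorithm names, and scores are placeholders (and higher-is-better is assumed for every indicator), so this is only a simplified stand-in for the thesis's actual procedure.

    # Each "voter" is a quality indicator that ranks the candidate MOEAs
    # (placeholder scores; higher is assumed to be better for every indicator).
    indicator_scores = {
        "hypervolume": {"NSGA-II": 0.71, "SPEA2": 0.69, "MOEA/D": 0.74},
        "epsilon":     {"NSGA-II": 0.65, "SPEA2": 0.70, "MOEA/D": 0.68},
        "spread":      {"NSGA-II": 0.60, "SPEA2": 0.58, "MOEA/D": 0.59},
    }

    algorithms = ["NSGA-II", "SPEA2", "MOEA/D"]
    borda = {a: 0 for a in algorithms}

    for scores in indicator_scores.values():
        # Rank algorithms for this voter; a better rank earns more Borda points.
        ranking = sorted(algorithms, key=lambda a: scores[a], reverse=True)
        for points, algo in zip(range(len(algorithms) - 1, -1, -1), ranking):
            borda[algo] += points

    winner = max(borda, key=borda.get)
    print(borda, "->", winner)  # the winner generates more offspring next generation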
Article
Full-text available
It has been widely observed that there is no "dominant" SAT solver; instead, different solvers perform best on different instances. Rather than following the traditional approach of choosing the best solver for a given class of instances, we advocate making this decision online on a per-instance basis. Building on previous work, we describe a per-instance solver portfolio for SAT, SATzilla-07, which uses so-called empirical hardness models to choose among its constituent solvers. We leverage new model-building techniques such as censored sampling and hierarchical hardness models, and demonstrate the effectiveness of our techniques by building a portfolio of state-of-the-art SAT solvers and evaluating it on several widely studied SAT data sets. Overall, we show that our portfolio significantly outperforms its constituent algorithms on every data set. Our approach has also proven itself to be effective in practice: in the 2007 SAT competition, SATzilla-07 won three gold medals, one silver, and one bronze; it is available online at http://www.cs.ubc.ca/labs/beta/Projects/SATzilla.
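A stripped-down sketch of the per-instance portfolio idea follows: one regression model per solver predicts (log) runtime from instance features, and the solver with the smallest prediction is chosen. The data, solver names, and model choice below are placeholders; the real SATzilla-07 additionally relies on censored sampling and hierarchical hardness models.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(1)
    n_instances, n_features = 300, 6
    X = rng.normal(size=(n_instances, n_features))      # instance features
    solvers = ["solverA", "solverB", "solverC"]          # placeholder solver names
    # Synthetic log-runtimes; in practice these come from measured runs.
    log_runtime = {s: X @ rng.normal(size=n_features) + rng.normal(size=n_instances)
                   for s in solvers}

    # One empirical hardness model per constituent solver.
    models = {s: RandomForestRegressor(n_estimators=100, random_state=0).fit(X, log_runtime[s])
              for s in solvers}

    def select_solver(features):
        """Pick the solver with the smallest predicted log-runtime for this instance."""
        preds = {s: m.predict(features.reshape(1, -1))[0] for s, m in models.items()}
        return min(preds, key=preds.get)

    print(select_solver(rng.normal(size=n_features)))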
Chapter
Full-text available
This chapter introduces and overviews an emerging methodology in search and optimisation. One of the key aims of these new approaches, which have been termed hyper-heuristics, is to raise the level of generality at which optimisation systems can operate. An objective is that hyper-heuristics will lead to more general systems that are able to handle a wide range of problem domains, rather than current meta-heuristic technology, which tends to be customised to a particular problem or a narrow class of problems. Hyper-heuristics are broadly concerned with intelligently choosing the right heuristic or algorithm in a given situation. Of course, a hyper-heuristic can be (and often is) a (meta-)heuristic and it can operate on (meta-)heuristics. In a certain sense, a hyper-heuristic works at a higher level when compared with the typical application of meta-heuristics to optimisation problems, i.e., a hyper-heuristic could be thought of as a (meta-)heuristic which operates on lower-level (meta-)heuristics. In this chapter we will introduce the idea and give a brief history of this emerging area. In addition, we will review some of the latest work to be published in the field.
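A minimal sketch of a selection hyper-heuristic in this spirit: at each step it picks one low-level heuristic (here, simple permutation moves) with an epsilon-greedy rule based on past improvement, applies it, and keeps the result if it improves the incumbent. The moves, credit-assignment rule, and toy cost function are illustrative assumptions, not a prescription from the chapter.

    import random

    def swap_move(seq):
        s = seq[:]; i, j = random.sample(range(len(s)), 2); s[i], s[j] = s[j], s[i]; return s

    def insert_move(seq):
        s = seq[:]; i, j = random.sample(range(len(s)), 2); s.insert(j, s.pop(i)); return s

    def hyper_heuristic(cost, seq, iters=2000, eps=0.2):
        low_level = [swap_move, insert_move]
        score = {h: 0.0 for h in low_level}           # smoothed credit per low-level heuristic
        best, best_cost = seq[:], cost(seq)
        for _ in range(iters):
            h = (random.choice(low_level) if random.random() < eps
                 else max(low_level, key=score.get))
            cand = h(best)
            cand_cost = cost(cand)
            delta = best_cost - cand_cost
            score[h] = 0.9 * score[h] + 0.1 * delta   # reward heuristics that improve the incumbent
            if delta > 0:
                best, best_cost = cand, cand_cost
        return best, best_cost

    # Toy usage: order numbers to minimise the sum of adjacent differences.
    data = list(range(20)); random.shuffle(data)
    c = lambda s: sum(abs(a - b) for a, b in zip(s, s[1:]))
    print(hyper_heuristic(c, data)[1])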
Chapter
Neural networks are simple computational tools for examining data and developing models that help to identify interesting patterns or structures. The data used to develop these models is known as training data. Once a neural network has been exposed to the training data, and has learnt the patterns that exist in that data, it can be applied to new data, thereby achieving a variety of outcomes. Neural networks can be used to: • learn to predict future events based on the patterns that have been observed in the historical training data; • learn to classify unseen data into pre-defined groups based on characteristics observed in the training data; • learn to cluster the training data into natural groups based on the similarity of characteristics in the training data.
Book
This textbook provides a comprehensive framework for addressing operational and supply-chain issues, building the concept of a supply chain from the ground up. Starting with the analysis of business processes and how they relate to the overall operational goals of a firm, this text proceeds to show how these processes are integrated to form supply chains and how they can be managed to obtain efficient flows of materials, information and funds. This approach reinforces the idea that supply chains are only as good as the processes within and across each firm in the supply chain.
Article
The single machine, distinct due date, early/tardy machine scheduling problem closely models the situation faced by Just-In-Time manufacturers. This paper develops a new method of finding good quality solutions to this scheduling problem by using the concept of a ‘compressed solution space’, based on a binary representation of the early/tardy scheduling problem, and tabu search. A heuristic which simultaneously sequences and schedules the jobs is developed to perform the conversion between the compressed and physical solution spaces. Results show that the compressed solution space performs well with small problems, and is superior to standard tabu search solution spaces for large-scale, realistically sized problems.
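A hedged sketch of the binary-representation idea: each bit flags whether a job is to be treated as early or tardy, and a decoder turns the bit string into a job sequence that is then costed. The decoding rule below (early-flagged jobs by due date, then tardy-flagged jobs by due date, no inserted idle time) and the job data are simplifications for illustration only; the paper's own conversion heuristic simultaneously sequences and schedules the jobs, including idle-time decisions.

    # Jobs: (processing time, due date, earliness penalty, tardiness penalty).
    jobs = [(4, 10, 1, 3), (3, 6, 2, 2), (5, 18, 1, 4), (2, 8, 3, 1)]

    def decode(bits, jobs):
        """Map a bit string (1 = treat job as early, 0 = tardy) to a job sequence."""
        early = sorted((j for j, b in enumerate(bits) if b), key=lambda j: jobs[j][1])
        tardy = sorted((j for j, b in enumerate(bits) if not b), key=lambda j: jobs[j][1])
        return early + tardy

    def total_penalty(seq, jobs):
        """Cost the sequence with jobs started as early as possible (no inserted idle time)."""
        t, cost = 0, 0
        for j in seq:
            p, d, a, b = jobs[j]
            t += p
            cost += a * max(d - t, 0) + b * max(t - d, 0)
        return cost

    bits = [1, 1, 0, 1]
    seq = decode(bits, jobs)
    print(seq, total_penalty(seq, jobs))

A tabu search over the bit strings would then repeatedly flip bits, decode, and cost the resulting schedules.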
Article
This paper presents a procedure to minimize the total penalty when jobs are scheduled on a single machine subject to earliness and tardiness penalties. This performance criterion has been shown to be non-regular, thus requiring a search among schedules with inserted machine idle time to find a solution. A procedure to optimally insert idle time is also presented.
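For a fixed job sequence, the timing subproblem behind optimal idle-time insertion can be written as a small linear program: choose completion times that minimise total weighted earliness plus tardiness subject to the sequencing constraints. The sketch below solves that generic LP with SciPy on made-up data; it is not the specialised procedure of the paper, only an equivalent formulation for illustration.

    import numpy as np
    from scipy.optimize import linprog

    # Fixed sequence of jobs: processing times, due dates, earliness/tardiness penalties.
    p = np.array([4.0, 3.0, 5.0, 2.0])
    d = np.array([6.0, 9.0, 20.0, 14.0])
    alpha = np.array([1.0, 2.0, 1.0, 3.0])   # earliness penalties
    beta = np.array([3.0, 2.0, 4.0, 1.0])    # tardiness penalties
    n = len(p)

    # Variables x = [C_1..C_n, E_1..E_n, T_1..T_n] (completion, earliness, tardiness).
    c = np.concatenate([np.zeros(n), alpha, beta])

    A, b = [], []
    for j in range(n):
        # Sequencing: C_j >= C_{j-1} + p_j (with C_0 = 0), i.e. C_{j-1} - C_j <= -p_j.
        row = np.zeros(3 * n)
        if j > 0:
            row[j - 1] = 1.0
        row[j] = -1.0
        A.append(row); b.append(-p[j])
        # Earliness: E_j >= d_j - C_j  ->  -C_j - E_j <= -d_j.
        row = np.zeros(3 * n); row[j] = -1.0; row[n + j] = -1.0
        A.append(row); b.append(-d[j])
        # Tardiness: T_j >= C_j - d_j  ->  C_j - T_j <= d_j.
        row = np.zeros(3 * n); row[j] = 1.0; row[2 * n + j] = -1.0
        A.append(row); b.append(d[j])

    res = linprog(c, A_ub=np.vstack(A), b_ub=np.array(b),
                  bounds=[(0, None)] * (3 * n), method="highs")
    print("completion times:", res.x[:n], "total penalty:", res.fun)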
Article
We consider the problem of scheduling n jobs to minimize the total earliness and tardiness penalty. We review the literature on this topic, providing a framework to show how results have been generalized starting with a basic model that contains symmetric penalties, one machine and a common due date. To this base we add such features as parallel machines, complex penalty functions and distinct due dates. We also consolidate many of the existing results by proving general forms of two key properties of earliness/tardiness models.
Article
Dynamic scheduling of manufacturing systems has primarily involved the use of dispatching rules. In the context of conventional job shops, the relative performance of these rules has been found to depend upon the system attributes, and no single rule is dominant across all possible scenarios. This indicates the need for developing a scheduling approach which adopts a state-dependent dispatching rule selection policy. The importance of adapting the dispatching rule employed to the current state of the system is even more critical in a flexible manufacturing system because of alternative machine routing possibilities and the need for increased coordination among various machines. This study develops a framework for incorporating machine learning capabilities in intelligent scheduling. A pattern-directed method, with a built-in inductive learning module, is developed for heuristic acquisition and refinement. This method enables the scheduler to classify distinct manufacturing patterns and to generate a decision tree consisting of heuristic policies for dynamically selecting the dispatching rule appropriate for a given set of system attributes. Computational experience indicates that the learning-augmented approach leads to improved system performance. In addition, the process of generating the decision tree shows the efficacy of inductive learning in extracting and ranking the various system attributes relevant for deciding upon the appropriate dispatching rule to employ.
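In the same spirit, a decision tree mapping system attributes to the preferred dispatching rule can be induced and then read back as human-readable policy rules. The attribute names, rule labels, and synthetic data below are placeholders, not the attributes or results of the cited study.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier, export_text

    rng = np.random.default_rng(2)
    # Placeholder system attributes: machine utilisation and due-date tightness.
    X = rng.uniform(size=(400, 2))
    # Placeholder label: which dispatching rule (SPT vs EDD) performed better;
    # a fabricated frontier stands in for simulation results.
    y = np.where(X[:, 0] > 0.75, "SPT", "EDD")

    tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

    # The induced tree doubles as a state-dependent dispatching rule selection policy.
    print(export_text(tree, feature_names=["utilisation", "due_date_tightness"]))
    print(tree.predict([[0.9, 0.4]])[0])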
Article
A basic problem in scheduling involves the sequencing of a set of independent tasks at a single facility with the objective of minimizing mean tardiness. Although the problem is relatively simple, the determination of an optimal sequence remains a challenging combinatorial problem. A number of algorithms have been developed for finding solutions, and this paper reports a comparative evaluation of these procedures. Computer programs for five separate algorithms were written and all were run on a data base designed to highlight computational differences. Optimizing algorithms developed by Emmons and by Srinivasan appeared to be particularly efficient in the comparative study.