A Simulation-Based Learnheuristic Algorithm for the Stochastic Team Orienteering Problem with Dynamic Rewards

Proceedings of the Operational Research Society Simulation Workshop 2020 (SW20)
M. Fakhimi, D. Robertson, and T. Boness, eds.
Christopher Bayliss
Pedro J. Copado-Mendez
Javier Panadero
Angel A. Juan
IN3 Computer Science Department
Universitat Oberta de Catalunya
Av. Carl Friedrich Gauss 5
Castelldefels, 08860, SPAIN
{cbayliss,pcopadom,jpanaderom,ajuanp}@uoc.edu
Abstract
In this paper we consider the stochastic team orienteering problem (STOP) with dynamic rewards and
stochastic travel times. In the STOP, the goal is to generate routes for a fixed set of vehicles such that
the sum of the rewards collected is maximised whilst ensuring that nodes are visited before a fixed time
limit expires. The rewards associated with each node are dependent upon the times at which they are
visited. Also, the dynamic reward values have to be learned from simulation experiments during the search
process. To solve this problem we propose a biased-randomised heuristic (BRH), which integrates a learning
module and a simulation model within a learnheuristic algorithm (BRLH). Randomisation is important for
generating a wide variety of solutions which capture the trade-off between reward and reliability. A series
of computational experiments are carried out in order to analyse the performance of our BRLH approach.
1 INTRODUCTION
The team orienteering problem (TOP) was first studied by Chao et al. (1996). It consists of generating time-constrained routes through a graph, for a fixed-size fleet of $m$ vehicles, such that the total reward collected from node visits is maximised. Each of the $m$ vehicles' tours begins and ends at a depot node. The stochastic
TOP (STOP) is an extension of the TOP where travel times between nodes and/or node rewards are uncertain.
In this work, only the rewards collected within a limited amount of time count. The STOP introduces the
need to consider not only reward maximisation, but also solution reliability. In this work, a solution is
deemed infeasible if any of the tours are not completed within the specified time limit. The reliability is
defined as the probability that a solution can be completed without failures. This work considers a STOP
with dynamic rewards (STOPDR), in which the rewards associated with nodes have a static component
and also a dynamic component. The dynamic component accounts for: (i) bonuses for visiting nodes early
in a route; and (ii) penalties for nodes visited late in routes. The process through which bonus values
and penalty values are generated is a hidden and unknown process. Hence, it has to be learned from
simulation observations. We consider the case in which bonuses are achieved for nodes visited as the first
node in a route, whilst penalties are applied for nodes visited last in routes. The bonus and penalty values
for nodes, when applicable, are assumed to follow an unknown statistical distribution. When solving the
STOPDR, the challenge is to learn the bonuses and penalties associated with each node from the simulation
testing of candidate solutions whilst, at the same time, trying to maximise the total reward and also ensure
that the solutions have a good level of reliability under stochastic travels times. Figure 1 provides an
illustrative diagram depicting some of the main features of a STOPDR, including: stochastic edge traversal
times; bonuses for first nodes in routes; penalties for last nodes in routes; and no rewards at all for
nodes visited after the time limit.

Figure 1: An illustrative example of the STOP with dynamic rewards.

In order to meet these requirements, we propose a biased-randomisation learnheuristic algorithm (BRLH) (Grasas et al. 2017). In general, learnheuristics integrate metaheuristics
with machine learning algorithms for the purpose of solving combinatorial optimisation problems with
inherent learning problems, such as those with dynamic inputs generated through an unknown process
(Calvet et al. 2017, Arnau et al. 2018). We propose a biased randomisation algorithm for playing the role
of the metaheuristic algorithm, and use a simple averaging learning mechanism for playing the role of
the machine learning algorithm. A biased-randomisation algorithm is an ideal choice for the metaheuristic
component of a learnheuristic. The reasons for this are as follows: (i) the parameterised randomisation
of a biased-randomised constructive solution procedure provides an ideal mechanism for addressing the
learning and optimisation requirements of a learnheuristic, since the level of randomness can be increased
to increase the diversity of the input data used for learning to predict the values of the dynamic rewards;
and (ii) it is a simple matter to incorporate and use the predictions of the learning mechanism within a
constructive solution approach. The level of randomness introduced into the greedy constructive solution
procedure has an additional role, that of generating a wide range of solutions with different average reward
and reliability characteristics. We identify the set of non-dominated solutions as candidate solutions to be
considered by the decision maker.
The rest of the paper is structured as follows: Section 2 reviews related work; Section 3 provides
a formal description of the STOPDR; Section 4 presents the biased-randomised heuristic, which is then
extended into a learnheuristic in Section 5; Section 6 provides experimental results based on a set of
benchmark instances; finally, Section 7 reports the main conclusions and future research.
2 LITERATURE REVIEW
The single-vehicle orienteering problem (OP) was introduced by Golden et al. (1987). Here, the authors
considered the deterministic version of the problem, in which one vehicle chooses the set of nodes to visit
as well as the visiting order during a specified time interval. This is a very general problem with many
applications, including the tourist trip design problem (Souffriau et al. 2008). Gunawan et al. (2016)
provides an excellent review of the OP and its variants.
In this paper, we focus on the TOP, which is a multi-vehicle OP. The TOP was first introduced in
Chao et al. (1996), who extended their methodology from the single-vehicle problem to consider multiple
vehicles aiming to maximise their combined reward from visiting a selection of points within a given
time interval. Other varieties include the time-dependent OP (Verbeeck et al. 2014), the TOP with time
windows (Lin and Yu 2012), and the multi-modal TOP with time windows (Yu et al. 2017), just to name a
few. Recent methods used to solve the deterministic version of the problem have included particle swarm
optimisation (Dang et al. 2013), simulated annealing (Lin 2013), genetic algorithms (Ferreira et al. 2014), a
Pareto mimic algorithm (Ke et al. 2015), or branch-and-cut-and-price algorithms (Keshtkaran et al. 2015).
Allowing for uncertainty in rewards, travel, and/or service times widens the applicability of the
problem. Hence, Ilhan et al. (2008) were the first to incorporate stochasticity into the OP. Studies on the
stochastic TOP are much more recent. Simheuristic algorithms (Juan et al. 2018) combining simulation
with metaheuristics to solve stochastic versions of the TOP have been proposed by Panadero et al. (2017)
and Reyes-Rubiano et al. (2018). Panadero et al. (2017) tackles a very similar problem to that considered
in this work with the exception that stochastic rewards are not learned from observations as the algorithm
progresses. However, to the best of our knowledge, this is the first paper proposing the use of learnheuristic
algorithms (Calvet et al. 2016) to deal with the dynamic version of the STOP.
3 A FORMAL DESCRIPTION OF THE STOP WITH POSITION-DEPENDENT REWARDS

In a team orienteering problem, the fleet is composed of $m \geq 2$ vehicles, and there is a time threshold, $t_{max} > 0$, for completing each route. The set of nodes that can be visited is denoted $N = \{1, 2, \ldots, n\}$. Vehicles travel along the edges, denoted $A$, connecting all of the nodes with each other. The STOP is therefore set within an undirected graph $G = (N, A)$. All vehicle tours begin at node 1, the origin depot, and end at node $n$, the end depot. Each non-depot node can be visited at most once, and by only one vehicle. In the STOPDR, a reward $u_i \geq 0$ is received for visiting node $i$. In our dynamic version, if node $i$ is the first non-depot node in a vehicle's tour, then a bonus $B_i$ is added to the reward $u_i$ for visiting that node. It is assumed that $B_i$ follows an unknown statistical distribution. If node $i$ is the last non-depot node in a vehicle's tour, then a penalty $P_i \leq u_i$ is subtracted from the reward $u_i$, with $P_i$ also following an unknown statistical distribution. The values of $B_i$ and $P_i$, $\forall i \in N$, have to be learned from observations of the real process, or from a detailed simulation of that process. Traversing an edge $e = (i, j) \in A$ has a stochastic travel time, $T_{ij} > 0$. In this work, edge traversal times follow log-normal probability distributions. That is, $T_{ij} \sim \mathrm{Lognormal}(\mu_{ij}, \sigma^2_{ij})$ $\forall e = (i, j) \in A$, where $\mu_{ij}$ and $\sigma^2_{ij}$ are the mean and variance of the log-transformed edge traversal times. $T_{ij}$ can be expressed as $e^{\mu_{ij} + \sigma_{ij} Z}$, where $Z$ is a standard normal random variable.

As illustrated in Figure 1, the final solution to the problem is a set $M$ of $m$ routes, where each route is defined by an array of nodes starting from node 1 (the origin depot) and arriving at node $n$ (the end depot). The objective function is the expected value of the sum of collected rewards, which needs to be maximised without exceeding the threshold value for each route. In a more formal way, the objective function can be written as:

\[
\max \; E\left[ \sum_{m \in M} \sum_{(i,j) \in A} I_{mi} \, \big(u_i + F(p_{mi}, l_m) B_i - L(p_{mi}, l_m) P_i\big) \cdot x^m_{ij} \right] \quad (1)
\]

where $A = \{(i, j) \mid i, j \in N, i \neq j\}$ is the set of edges, and $x^m_{ij}$ is a binary decision variable which equals 1 if the edge $(i, j) \in A$ is in route $m$, and 0 otherwise. $I_{mi}$ takes the value 1 if node $i$ is visited before the time $t_{max}$, since rewards can only be collected from nodes visited before time runs out; it takes the value 0 otherwise. The function $F(p_{mi}, l_m)$ is an indicator function that returns 1 if node $i$ is the first non-depot node in a vehicle's tour. Similarly, the function $L(p_{mi}, l_m)$ indicates whether or not node $i$ is the last non-depot node in a vehicle's tour. The indicator-function input argument $p_{mi}$ gives the position number of node $i$ in vehicle $m$'s tour, whilst $l_m$ gives the length of vehicle $m$'s tour.
Thus, the objective function calculates the total expected score, including bonuses and penalties, taking
stochastic travel times into account. The constraints are given below, with some explanations provided.
First, there is a threshold time to complete each route, i.e.:
\[
\sum_{(i,j) \in A} x^m_{ij} \cdot E[T_{ij}] \leq t_{max} \quad \forall m \in M \quad (2)
\]
Additionally, each node is visited at most once during the route, each route starts at the origin depot (node
1) and arrives at the end depot (node n), and finally, each vehicle leaves each node it visits, except for the
end depot.
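To make the objective concrete, the following Java sketch simulates one realisation of a single route and accumulates its collected reward under the first-node bonus, last-node penalty, and time-limit rules defined above. This listing is our own illustration, not the authors' implementation, and for simplicity it omits the final leg back to the end depot:

```java
import java.util.List;
import java.util.Random;

public class RouteEvaluator {

    // Simulates one realisation of a single route (non-depot nodes in visit order)
    // and returns the reward collected, following Equation (1): the first non-depot
    // node earns its bonus, the last one pays its penalty, and nodes reached after
    // tMax earn nothing.
    public static double simulateRouteReward(List<Integer> route, double[] u,
                                             double[] bonus, double[] penalty,
                                             double[][] mu, double[][] sigma,
                                             int originDepot, double tMax, Random rng) {
        double time = 0.0, reward = 0.0;
        int prev = originDepot;
        for (int k = 0; k < route.size(); k++) {
            int node = route.get(k);
            // log-normal edge traversal time T = exp(mu + sigma * Z)
            time += Math.exp(mu[prev][node] + sigma[prev][node] * rng.nextGaussian());
            if (time <= tMax) {
                double r = u[node];
                if (k == 0) r += bonus[node];                  // first-node bonus
                if (k == route.size() - 1) r -= penalty[node]; // last-node penalty
                reward += r;
            }
            prev = node;
        }
        return reward;
    }
}
```

Averaging this quantity over many simulation runs and over all $m$ routes estimates the expectation in Equation (1).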
4 A BIASED-RANDOMISED HEURISTIC (BR-H) FOR THE STATIC TOP
In this section a biased randomisation heuristic (BRH) is presented for the non-dynamic and non-stochastic
TOP. Biased-randomisation algorithms (Grasas et al. 2017) are based on introducing randomness into a
greedy constructive algorithm. Constructive algorithms are based on generating solutions by greedily and
sequentially selecting the elements that are to be included in a solution. In a biased randomisation, candidate decisions are sorted in a list in descending order of the objective value of each decision. Decreasing probabilities
of selection are assigned to the elements in this list. Using such a skewed probability distribution for
the selection of solution elements introduces randomness into a greedy constructive algorithm in a way
that preserves the logic underlying the greedy constructive algorithm whilst allowing for the generation
of a vast array of alternative solutions as well as solutions which improve upon the solution generated
by the base greedy constructive procedure alone. Such techniques have enjoyed great success improving
the performance of classical heuristics, both in scheduling applications (Gonzalez-Neira et al. 2017) in
addition to vehicle routing ones (Faulin and Juan 2008, Dominguez et al. 2016). The base constructive
heuristic used in this paper is composed of the following phases:
1. Firstly, an initial dummy solution is generated. This is composed of a route for each non-depot
node starting at the start depot and ending at the end depot.
2. A pair of routes are selected for merging. A merge operation appends one route onto the end of another, thereby avoiding an unnecessary trip to the end depot and a corresponding trip from the start depot. Possible merge operations are sorted according to a weighted sum of the saving in travel cost and the rewards associated with the nodes at either end of the merging edge.
The saving associated with edge $(i, j)$ is as follows:

\[
saving_{ij} = \alpha (d_{in} + d_{1j} - d_{ij}) + (1 - \alpha)(u_i + u_j) \quad (3)
\]

Equation (3) gives a weight of $\alpha$ to the travel cost saving, and a weight of $(1 - \alpha)$ to the rewards
associated with the newly added route edge. This reflects the main goals when solving a TOP, i.e.,
those of generating efficient and high-value vehicle routes. A skewed probability distribution is
applied to the sorted list of merge operations, thus introducing “biased randomness” into the base
greedy constructive algorithm. In particular, a geometric distribution is used in this work. The equation

\[
randI = \mathrm{Mod}\left( \left\lfloor \frac{\log(uniRand(0,1))}{\log(1 - \beta)} \right\rfloor, \; |C| \right)
\]

is used to randomly select a decision index ($randI$) from the list, where $|C|$ is the candidate decision list length, $uniRand(0,1)$ is a uniform random input between 0 and 1, and $\beta$ is the parameter controlling the level of greediness and randomness. Whenever $\beta = 1$, we have full greediness; on the contrary, whenever $\beta = 0$ we obtain full randomness. When $0 < \beta < 1$ we obtain a trade-off between the two.
3. Repeat Step 2. until no more merges are possible.
4. From those generated by following Steps 1-3, select the mroutes that maximise the total reward.
By selecting $0 < \beta < 1$ and repeating Steps 1-4 for a specified number of iterations or amount of time, the biased-randomisation algorithm can generate good-quality solutions which achieve different average reward and reliability trade-offs.
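As an illustrative sketch, the geometric-distribution index selection described in Step 2 can be coded as follows. The paper's implementation is in Java, but this listing and its identifiers are our own, not the authors' code:

```java
import java.util.Random;

public class BiasedSelection {

    // Biased-randomised index selection over a sorted candidate list:
    // beta near 1 is almost fully greedy (index 0, the best-ranked candidate),
    // beta near 0 is almost uniformly random. Assumes 0 < beta < 1, as in the paper.
    public static int biasedRandomIndex(int listSize, double beta, Random rng) {
        double u = 1.0 - rng.nextDouble();          // u in (0, 1], avoids log(0)
        int index = (int) Math.floor(Math.log(u) / Math.log(1.0 - beta));
        return index % listSize;                    // the Mod(., |C|) wrap-around
    }
}
```

With $\beta = 0.3$, the first list entry is selected with probability 0.3, the second with $0.3 \cdot 0.7$, and so on, so the ranking produced by the savings list is respected in expectation while still allowing diversification.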
5 EXTENDING OUR BR-H TO A LEARNHEURISTIC (BR-LH) FOR THE DYNAMIC STOP
In this section we incorporate the BRH introduced in Section 4 within a learnheuristic framework for
solving the STOPDR. We begin by describing the modifications to the savings calculations, which are
required for taking dynamic rewards into account. Performing a merge operation that joins a route whose last non-depot node is node $i$ with a route whose first non-depot node is node $j$ has the effect that the penalty from visiting node $i$ last in a route is avoided, whilst the bonus from visiting node $j$ first in a route is lost. Given that we denote the current best estimate for the bonus associated with node $j$ as $\hat{b}_j$ and the current best estimate for the penalty associated with node $i$ as $\hat{p}_i$, the savings calculation of Equation (3) is replaced with the following for the case of dynamic rewards:

\[
saving_{ij} = \alpha (d_{in} + d_{1j} - d_{ij}) + (1 - \alpha)(u_i + u_j + \hat{p}_i - \hat{b}_j) \quad (4)
\]

Equation (4) acknowledges that merging the route whose last non-depot node is $i$ with the route whose first non-depot node is $j$ results in the avoidance of the penalty associated with node $i$ but also the loss of the bonus associated with node $j$. Learnheuristics can be viewed as an extension of the concept
of metaheuristics. While simheuristics integrate metaheuristics with simulation for solving stochastic
combinatorial optimisation problems (De Armas et al. 2017), learnheuristics follow a similar framework
with the incorporation of a learning component into a metaheuristic to deal with problems with dynamic
inputs.
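The modified savings calculation of Equation (4) translates directly into code. The following Java sketch is purely illustrative, with our own hypothetical naming rather than the authors' implementation:

```java
public class DynamicSavings {

    // Saving for merging a route ending at node i with a route starting at node j,
    // following Equation (4). dIn = distance from i to the end depot, d1j = distance
    // from the origin depot to j, dIj = distance from i to j. pHatI and bHatJ are the
    // current estimates of node i's penalty and node j's bonus.
    public static double saving(double alpha, double dIn, double d1j, double dIj,
                                double ui, double uj, double pHatI, double bHatJ) {
        return alpha * (dIn + d1j - dIj)                  // travel cost saving
             + (1.0 - alpha) * (ui + uj + pHatI - bHatJ); // reward term with dynamics
    }
}
```

The avoided penalty enters with a positive sign (the merge saves it) and the lost bonus with a negative sign, so merges that sacrifice a large bonus are ranked lower in the savings list.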
Figure 2 outlines the proposed learnheuristic framework for solving the STOPDR. The parts both
underlined and in italics are the steps that our learnheuristic solution approach has in addition to the steps
involved in a typical simheuristic approach for solving the stochastic but static version of the problem.
Notice that a learnheuristic solution approach is based upon an iterative framework. In each iteration of a
learnheuristic, the optimisation module (BRH in this case) is used to generate a new candidate solution.
Following this, a simulation model is used to test that solution in order to reveal the average reward attained,
its level of reliability, and also to provide new bonus and penalty observations. Afterwards, the best overall
solution is updated if a new best solution has been identified. Then, the new bonus and penalty observations
are used to update the estimates of the bonuses and penalties associated with each node. These steps are
repeated for a specified amount of time or number of iterations. Given new observations of the unknown and stochastic bonus and penalty values for a node $i$, denoted $\bar{b}_i$ and $\bar{p}_i$, the best estimates for the bonus and penalty associated with node $i$ are updated according to:

\[
\hat{b}_i \leftarrow \frac{\hat{b}_i N^b_i + \bar{b}_i}{N^b_i + 1}, \qquad N^b_i \leftarrow N^b_i + 1, \qquad \hat{p}_i \leftarrow \frac{\hat{p}_i N^p_i + \bar{p}_i}{N^p_i + 1}, \qquad N^p_i \leftarrow N^p_i + 1.
\]

Here, $N^b_i$ is the number of observations of bonuses attained from visits to node $i$, and $N^p_i$ is the number of observations of penalties attained from visiting node $i$ last in a route. In Section 6 we demonstrate that this simple learning mechanism is effective and allows our BRLH algorithm to generate good solutions which take advantage of the dynamic rewards, thus proving the concept of learnheuristics.
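The running-average updates above can be implemented in a few lines. The sketch below is illustrative, with hypothetical class and method names, maintaining the estimates $\hat{b}_i$, $\hat{p}_i$ and the counters $N^b_i$, $N^p_i$:

```java
public class RewardLearner {

    private final double[] bonusEstimate, penaltyEstimate;
    private final int[] bonusCount, penaltyCount;

    public RewardLearner(int numNodes) {
        bonusEstimate = new double[numNodes];
        penaltyEstimate = new double[numNodes];
        bonusCount = new int[numNodes];
        penaltyCount = new int[numNodes];
    }

    // Running-average update for a new bonus observation at node i.
    public void observeBonus(int i, double bObs) {
        bonusEstimate[i] = (bonusEstimate[i] * bonusCount[i] + bObs) / (bonusCount[i] + 1);
        bonusCount[i]++;
    }

    // Running-average update for a new penalty observation at node i.
    public void observePenalty(int i, double pObs) {
        penaltyEstimate[i] = (penaltyEstimate[i] * penaltyCount[i] + pObs) / (penaltyCount[i] + 1);
        penaltyCount[i]++;
    }

    public double bonus(int i) { return bonusEstimate[i]; }
    public double penalty(int i) { return penaltyEstimate[i]; }
}
```

Each update is O(1) per observation, which is why the learning layer adds only a small overhead to each learnheuristic iteration.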
6 COMPUTATIONAL EXPERIMENTS
Our BRLH algorithm was implemented in Java and run on a personal computer with 8 GB of
RAM and an Intel Core i7 at 1.8 GHz. The set of instances that we employ to test our approach is a
natural extension of the classical benchmark instances for the static TOP proposed by Chao et al. (1996),
which are available at https://www.mech.kuleuven.be/en/cib/op/instances. Each instance involves a fleet
size, number of nodes, customer profits, and maximum route duration $t_{max}$. The stochastic travel times
for each edge of these deterministic instances follow a log-normal distribution with the deterministic
travel time as the mean, and a variance of 0.05 times this mean travel time. The mean and standard
deviation parameters of the log transformed edge traversal times are recovered from the following equations:
[Figure 2 is a flowchart with three modules. Optimisation module: use the BRH to generate a new candidate solution (newSol) using the current bonus and penalty predictions. Simulation module: test newSol to reveal its average reward, its reliability, and new observations of bonuses and penalties; if newObj > bestObj, set bestSol = newSol and bestObj = newObj. Learning module: use the new bonus and penalty observations to update the estimated bonuses and penalties. The loop starts from iteration = 0, bestSol = null, bestObj = -inf and repeats while iteration < maxIterations.]

Figure 2: Flowchart representation of the proposed BRLH solution approach.
\[
\mu_{ij} = \ln(E[T_{ij}]) - \frac{1}{2} \ln\left(1 + \frac{Var[T_{ij}]}{E[T_{ij}]^2}\right) \qquad \text{and} \qquad \sigma_{ij} = \sqrt{\ln\left(1 + \frac{Var[T_{ij}]}{E[T_{ij}]^2}\right)}.
\]

The bonus values gained from the nodes
visited first in routes and penalties applied for nodes visited last in routes both follow a triangular distribution
with minimum, mode, and maximum parameters generated by: (i) selecting and sorting three uniformly
distributed random numbers in the interval [0,1]; and (ii) multiplying them by the base reward of the given
node. We begin by analysing the impact of the choice of the $\beta$ parameter of the geometric distribution used in the BRLH algorithm and its effect on the average reward of the solutions generated by that algorithm. Figure 3 shows that low values of $\beta$ have the effect of increasing the diversity of the solutions generated by the BRLH algorithm, whilst higher values decrease diversity. Intermediate values of $\beta$, such as 0.3, achieve an ideal trade-off between exploration and exploitation, yielding the highest average-reward solution as well as a diverse array of other solutions.
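The log-normal travel-time parameters recovered from the equations above, together with the representation $T_{ij} = e^{\mu_{ij} + \sigma_{ij} Z}$ from Section 3, can be sketched in code as follows. This is an illustrative fragment with our own naming, not the authors' implementation:

```java
import java.util.Random;

public class StochasticTravelTime {

    // Recovers mu of the log-transformed travel time from its mean and variance.
    public static double mu(double mean, double variance) {
        return Math.log(mean) - 0.5 * Math.log(1.0 + variance / (mean * mean));
    }

    // Recovers sigma of the log-transformed travel time.
    public static double sigma(double mean, double variance) {
        return Math.sqrt(Math.log(1.0 + variance / (mean * mean)));
    }

    // Samples a log-normal travel time T = exp(mu + sigma * Z), Z standard normal.
    public static double sample(double mean, double variance, Random rng) {
        return Math.exp(mu(mean, variance) + sigma(mean, variance) * rng.nextGaussian());
    }
}
```

By construction, $e^{\mu_{ij} + \sigma^2_{ij}/2}$ recovers the deterministic travel time of the original Chao et al. (1996) instance, so the stochastic instances stay comparable to their deterministic counterparts.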
We now consider the trade-off between average reward and reliability. A solution is considered to be non-dominated if no other solution has both a higher reliability and a higher average reward. Then, for instance p.2.2.b, the set of non-dominated solutions generated from the BRLH algorithm has been computed using $\beta = 0.1$ and $\beta = 1$. Figure 4 shows that the $\beta$ randomisation parameter of the proposed BRLH algorithm is useful for generating solutions with a wide variety of characteristics. It was found that for a low $\beta$ value it was possible to generate solutions capturing all aspects of the trade-off between reward and reliability, including solutions with very high reliability but low average reward and solutions with high average reward but low reliability. Using a value of $\beta = 1$ leads to a non-dominated set of solutions that lie within a very narrow range of average reward and reliability values.
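The non-dominated filtering used to build the reward and reliability frontier is a standard Pareto scan. A minimal O(n^2) sketch, with our own illustrative names, is:

```java
import java.util.ArrayList;
import java.util.List;

public class ParetoFilter {

    // A candidate solution summarised by its simulated average reward and reliability.
    public record Solution(double avgReward, double reliability) {}

    // Returns the solutions not dominated in both average reward and reliability.
    public static List<Solution> nonDominated(List<Solution> all) {
        List<Solution> front = new ArrayList<>();
        for (Solution s : all) {
            boolean dominated = false;
            for (Solution t : all) {
                // t dominates s if it is at least as good in both objectives
                // and strictly better in at least one of them
                if (t != s && t.avgReward() >= s.avgReward()
                        && t.reliability() >= s.reliability()
                        && (t.avgReward() > s.avgReward()
                            || t.reliability() > s.reliability())) {
                    dominated = true;
                    break;
                }
            }
            if (!dominated) front.add(s);
        }
        return front;
    }
}
```

Applied to the pool of solutions produced across BRLH iterations, this yields the candidate set presented to the decision maker.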
Figure 3: The effect of β on average reward and solution diversity.

Figure 4: Non-dominated solutions generated by BRLH using β = 0.1 and β = 1.

Table 1: Comparison of the BRLH with and without learning for standard TOP instances.

                        With learning                         Without learning
Instance    avg. reward   reliability   iter./10 s    avg. reward   reliability   iter./10 s
p1.4.j          98.36         0.19         2550           90.62         0.15         4648
p2.2.c         142.98         0.56         5937          131.11         0.41         6281
p2.2.g         224.63         0.37         3349          210.38         0.88         3961
p2.2.i         252.32         0.47         3368          238.73         0.63         3674
p2.3.e         146.76         0.40         4469          139.24         0.48         4878
p3.4.h         264.36         0.20         1765          236.84         0.32         3490
p3.4.e         170.12         0.33         3184          156.64         0.47         4746
p3.4.f         226.32         0.38         2226          200.66         0.58         3807
average        190.73         0.36         3356.00       175.53         0.49         4435.63

Table 1 shows the performance of our proposed BRLH algorithm, with and without learning, in terms of average reward, reliability, and solution speed, expressed as the number of iterations of the BRLH that can be performed within 10 seconds of computing time. For a sample of 8 instances converted into stochastic instances with dynamic rewards, our proposed BRLH algorithm is able to successfully learn and exploit information about the bonuses and penalties that apply to nodes in routes that are joined by an edge to a depot node. In both cases, the best average-reward solutions require a compromise with regard
to the reliability of the solutions. The results for the number of iterations that can be performed within 10
seconds reflect the fact that marginally more computing time is required when learning is used. This makes
sense, since the savings lists have to be recalculated after each merge operation, when estimated bonus and penalty values are taken into account within the BRLH.

Figure 5: Average reward and error over the course of the BRLH algorithm.

Finally, we analyse the simultaneous learning and optimisation which occurs in our BRLH algorithm. Figure 5 shows how the accuracy of the predicted values
for the bonus and penalty values associated with each node improves as more iterations of the proposed
BRLH algorithm are performed. The error values reported are root mean squared errors calculated by
comparing the estimated parameter values with the actual mean values of the bonuses and penalties, which
are never visible to the BRLH algorithm. The BRLH algorithm also displays a weak positive correlation
between iteration number and average reward. This suggests that the information learned by the simple
learning mechanism is being used effectively.
7 CONCLUSIONS AND FUTURE WORK
In this paper, we have considered a stochastic team orienteering problem with dynamic rewards. We have
demonstrated how a simple learning mechanism can be incorporated in a biased-randomisation algorithm
to exploit information previously learned about the unknown bonus and penalty process from simulation
tests of candidate solutions. Our experimental results demonstrated how the randomisation parameter of
a biased randomisation algorithm is useful for ensuring that dynamic reward information is learned about
all nodes as well as being useful for generating a wide variety of solutions which capture the trade off
between average reward and reliability.
In future work, we plan to increase the complexity of the process used to generate dynamic rewards,
thus motivating the consideration of more complex learning algorithms, such as neural networks. For
instance, the bonuses and penalties that apply to a visited node could be made dependent upon the exact time at which that node is visited, or upon which node was visited immediately prior to the given node.
REFERENCES
Arnau, Q., A. A. Juan, and I. Serra. 2018. “On the use of learnheuristics in vehicle routing optimization
problems with dynamic inputs”. Algorithms 11 (12): 208.
Calvet, L., J. de Armas, D. Masip, and A. A. Juan. 2017. “Learnheuristics: hybridizing metaheuristics with
machine learning for optimization with dynamic inputs”. Open Mathematics 15 (1): 261–280.
Calvet, L., A. Ferrer, M. I. Gomes, A. A. Juan, and D. Masip. 2016. “Combining statistical learning with
metaheuristics for the multi-depot vehicle routing problem with market segmentation”. Computers &
Industrial Engineering 94:93–104.
Chao, I.-M., B. L. Golden, and E. A. Wasil. 1996. “The team orienteering problem”. European Journal of
Operational Research 88 (3): 464–474.
Dang, D.-C., R. N. Guibadj, and A. Moukrim. 2013. “An effective PSO-inspired algorithm for the team
orienteering problem”. European Journal of Operational Research 229 (2): 332–344.
De Armas, J., A. A. Juan, J. M. Marqués, and J. P. Pedroso. 2017. “Solving the Deterministic and Stochastic Uncapacitated Facility Location Problem: from a Heuristic to a Simheuristic”. Journal of the Operational Research Society 68 (10): 1161–1176.
Dominguez, O., D. Guimarans, A. A. Juan, and I. de la Nuez. 2016. “A biased-randomised large neigh-
bourhood search for the two-dimensional vehicle routing problem with backhauls”. European Journal
of Operational Research 255 (2): 442–462.
Faulin, J., and A. A. Juan. 2008. “The ALGACEA-1 method for the capacitated vehicle routing problem”.
International Transactions in Operational Research 15 (5): 599–621.
Ferreira, J., A. Quintas, and J. Oliveira. 2014. “Solving the team orienteering problem: developing a solution
tool using a genetic algorithm approach”. In Soft computing in industrial applications. Advances in
Intelligent Systems and Computing: 223, 365–375. Springer.
Golden, B., L. Levy, and R. Vohra. 1987. “The orienteering problem”. Naval Research Logistics 34:307–318.
Gonzalez-Neira, E. M., D. Ferone, S. Hatami, and A. A. Juan. 2017. “A biased-randomized simheuristic for
the distributed assembly permutation flowshop problem with stochastic processing times”. Simulation
Modelling Practice and Theory 79:23–36.
Grasas, A., A. A. Juan, J. Faulin, J. de Armas, and H. Ramalhinho. 2017. “Biased Randomization of
Heuristics Using Skewed Probability Distributions: a Survey and Some Applications”. Computers &
Industrial Engineering 110:216–228.
Gunawan, A., H. C. Lau, and P. Vansteenwegen. 2016. “Orienteering problem: A survey of recent variants,
solution approaches and applications”. European Journal of Operational Research 255 (2): 315–332.
Ilhan, T., S. Iravani, and M. Daskin. 2008. “The orienteering problem with stochastic profits”. IIE Trans-
actions 40:406–421.
Juan, A. A., W. D. Kelton, C. S. Currie, and J. Faulin. 2018. “Simheuristics applications: dealing with
uncertainty in logistics, transportation, and other supply chain areas”. In Proceedings of the 2018 Winter
Simulation Conference, 3048–3059. IEEE.
Ke, L., L. Zhai, J. Li, and F. Chan. 2015. “Pareto mimic algorithm: an approach to the team orienteering
problem”. Omega 61:155–166.
Keshtkaran, M., K. Ziarati, A. Bettinelli, and D. Vigo. 2015. “Enhanced exact solution methods for the
team orienteering problem”. International Journal of Production Research 54:591–601.
Lin, S. 2013. “Solving the team orienteering problem using effective multi-start simulated annealing”.
Applied Soft Computing 13:1064–1073.
Lin, S.-W., and V. F. Yu. 2012. “A simulated annealing heuristic for the team orienteering problem with time windows”. European Journal of Operational Research 217 (1): 94–107.
Panadero, J., J. de Armas, C. S. Currie, and A. A. Juan. 2017. “A simheuristic approach for the stochastic team orienteering problem”. In Proceedings of the 2017 Winter Simulation Conference, 3208–3217. IEEE.
Reyes-Rubiano, L. S., C. F. Ospina-Trujillo, J. Faulin, J. M. Mozos, J. Panadero, and A. A. Juan. 2018. “The
team orienteering problem with stochastic service times and driving-range limitations: a simheuristic
approach”. In Proceedings of the 2018 Winter Simulation Conference, 3025–3035. IEEE.
Souffriau, W., P. Vansteenwegen, J. Vertommen, G. V. Berghe, and D. V. Oudheusden. 2008. “A Personalized Tourist Trip Design Algorithm For Mobile Tourist Guides”. Applied Artificial Intelligence 22 (10): 964–985.
Verbeeck, C., K. Sörensen, E.-H. Aghezzaf, and P. Vansteenwegen. 2014. “A fast solution method for the time-dependent orienteering problem”. European Journal of Operational Research 236 (2): 419–432.
Yu, V. F., P. Jewpanya, C.-J. Ting, and A. P. Redi. 2017. “Two-level particle swarm optimization for the multi-modal team orienteering problem with time windows”. Applied Soft Computing 61:1022–1040.
AUTHOR BIOGRAPHIES
CHRISTOPHER BAYLISS is a post-doctoral researcher in the ICSO group at the IN3 Universitat
Oberta de Catalunya. His main research interests include metaheuristics, simulation optimisation, rev-
enue management, packing problems, airline scheduling, and logistics optimisation. His email address is
cbayliss@uoc.edu.
PEDRO COPADO-MENDEZ is a post-doctoral researcher in the ICSO group at the IN3 Universi-
tat Oberta de Catalunya. He completed his PhD at the University Rovira i Virgili (Spain). His main
research interests include metaheuristics, hybrid heuristics and their applications. His email address is
pcopadom@uoc.edu.
JAVIER PANADERO is an assistant professor of Simulation and High Performance Computing in the
Computer Science Department at the Universitat Oberta de Catalunya (Spain). He holds a PhD in Computer
Science. His major research areas are: high-performance computing, modelling and analysis of parallel
applications, and simheuristics. His website address is http://www.javierpanadero.com and his email address
is jpanaderom@uoc.edu.
ANGEL A. JUAN is a full professor of Operations Research & Industrial Engineering in the Computer
Science Dept. at the Universitat Oberta de Catalunya (Barcelona, Spain). His main research interests
include applications of simheuristics and learnheuristics in computational logistics and finance. His website
address is http://ajuanp.wordpress.com and his email address is ajuanp@uoc.edu.
Conference Paper
p>Optimization problems arising in real-life transportation and logistics need to consider uncertainty conditions (e.g., stochastic travel times, etc.). Simulation is employed in the analysis of complex systems under such non-deterministic environments. However, simulation is not an optimization tool, so it needs to be combined with optimization methods whenever the goal is to: (i) maximize the system performance using limited resources; or (ii) minimize its operations cost while guaranteeing a given quality of service. When the underlying optimization problem is NP-hard, metaheuristics are required to solve large-scale instances in reasonable computing times. Simheuristics extend metaheuristics by adding a simulation layer that allows the optimization component to deal with scenarios under uncertainty. This paper reviews both initial as well as recent applications of simheuristics, mainly in the area of logistics and transportation. The paper also discusses current trends and open research lines in this field.</p
Article
Modern manufacturing systems are composed of several stages. We consider a manufacturing environment in which different parts of a product are completed in a first stage by a set of distributed flowshop lines, and then assembled in a second stage. This is known as the distributed assembly permutation flowshop problem (DAPFSP). This paper studies the stochastic version of the DAPFSP, in which processing and assembly times are random variables. Besides minimizing the expected makespan, we also discuss the need for considering other measures of statistical dispersion in order to account for risk. A hybrid algorithm is proposed for solving this N P-hard and stochastic problem. Our approach integrates biased randomization and simulation techniques inside a metaheuristic framework. A series of computational experiments contribute to illustrate the effectiveness of our approach.
Article
This study presents a new variant of the team orienteering problem with time windows (TOPTW), called the multi-modal team orienteering problem with time windows (MM-TOPTW). The problem is motivated by the development of a tourist trip design application when there are several transportation modes available for tourists to choose during their trip. We develop a mixed integer programming model for MM-TOPTW based on the standard TOPTW model with additional considerations of transportation mode choices, including transportation cost and transportation time. Because MM-TOPTW is NP-hard, we design a two-level particle swarm optimization with multiple social learning terms (2L-GLNPSO) to solve the problem. To demonstrate the applicability and effectiveness of the proposed model and algorithm, we employ the proposed 2L-GLNPSO to solve 56 MM-TOPTW instances that are generated based on VRPTW benchmark instances. The computational results demonstrate that the proposed 2L-GLNPSO can obtain optimal solutions to small and medium-scale instances. For large-scale instances, 2L-GLNPSO is capable of producing high-quality solutions. Moreover, we test the proposed algorithm on standard TOPTW benchmark instances and obtains competitive results with the state-of-art algorithms.
Article
Randomized heuristics are widely used to solve large scale combinatorial optimization problems. Among the plethora of randomized heuristics, this paper reviews those that contain biased-randomized procedures (BRPs). A BRP is a procedure to select the next constructive ‘movement’ from a list of candidates in which their elements have different probabilities based on some criteria (e.g., ranking, priority rule, heuristic value, etc.). The main idea behind biased randomization is the introduction of a slight modification in the greedy constructive behavior that provides a certain degree of randomness while maintaining the logic behind the heuristic. BRPs can be categorized into two main groups according to how choice probabilities are computed: (i) BRPs using an empirical bias function; and (ii) BRPs using a skewed theoretical probability distribution. This paper analyzes the second group and illustrates, throughout a series of numerical experiments, how these BRPs can benefit from parallel computing in order to significantly outperform heuristics and even simple metaheuristic approaches, thus providing reasonably good solutions in ‘real time’ to different problems in the areas of transportation, logistics, and scheduling.
Article
The uncapacitated facility location problem (UFLP) is a popular combinatorial optimization problem with practical applications in different areas, from logistics to telecommunication networks. While most of the existing work in the literature focuses on minimizing total cost for the deterministic version of the problem, some degree of uncertainty (e.g., in the customers’ demands or in the service costs) should be expected in real-life applications. Accordingly, this paper proposes a simheuristic algorithm for solving the stochastic UFLP (SUFLP), where optimization goals other than the minimum expected cost can be considered. The development of this simheuristic is structured in three stages: (i) first, an extremely fast savings-based heuristic is introduced; (ii) next, the heuristic is integrated into a metaheuristic framework, and the resulting algorithm is tested against the optimal values for the UFLP; and (iii) finally, the algorithm is extended by integrating it with simulation techniques, and the resulting simheuristic is employed to solve the SUFLP. Some numerical experiments contribute to illustrate the potential uses of each of these solving methods, depending on the version of the problem (deterministic or stochastic) as well as on whether or not a real-time solution is required.