Annals of Operations Research
https://doi.org/10.1007/s10479-021-04489-z
ORIGINAL RESEARCH
Global synchromodal shipment matching problem with
dynamic and stochastic travel times: a reinforcement
learning approach
W. Guo1 · B. Atasoy2 · R. R. Negenborn2
Accepted: 7 December 2021
© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2021
Abstract
Global synchromodal transportation involves the movement of container shipments between
inland terminals located in different continents using ships, barges, trains, trucks, or any
combination of them, through integrated planning at a network level. One of the challenges
faced by global operators is the matching of accepted shipments with services in an integrated
global synchromodal transport network with dynamic and stochastic travel times. The travel
times of services are unknown and are revealed dynamically during the execution of transport
plans, but stochastic information about travel times is assumed to be available. Matching
decisions can be updated before shipments arrive at their destination terminals. The objective
of the problem is to maximize the total profit, expressed as a combination of
revenues, travel costs, transfer costs, storage costs, delay costs, and carbon tax, over
a given planning horizon. We propose a sequential decision process model to describe the
problem. To address the curse of dimensionality, we develop a reinforcement learning
approach that learns the value of matching a shipment with a service through simulations.
Specifically, we adopt the Q-learning algorithm to update value function estimates and
use the ε-greedy strategy to balance exploitation and exploration. Online decisions are
made based on the estimated value functions. The performance of the reinforcement learning
approach is evaluated in comparison to a myopic approach that does not consider
uncertainties and a stochastic approach that sets chance constraints on feasible transshipment under a
rolling horizon framework.
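As an illustration of the two ingredients named in the abstract, the Q-learning update and the ε-greedy rule can be sketched in tabular form as follows. This is a minimal sketch under stated assumptions, not the model used in the paper: the single-shipment toy instance, the state and service names, the profit figures in simulate_profit, and all parameter values (alpha, gamma, epsilon) are hypothetical and chosen only to make the example runnable.

```python
import random
from collections import defaultdict


def epsilon_greedy(q, state, actions, epsilon, rng):
    """Explore with probability epsilon, otherwise exploit the best-known action."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


def q_update(q, state, action, reward, next_state, next_actions, alpha, gamma):
    """One Q-learning step: move Q(s, a) toward reward + gamma * max_a' Q(s', a')."""
    # A terminal next state has no actions, so its future value is taken as 0.
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


def simulate_profit(service, rng):
    """Hypothetical profit of matching one shipment with a service.

    The barge is cheap, but its stochastic travel time sometimes causes a
    delay penalty; the truck is more expensive but reliable.
    """
    if service == "barge":
        return 10.0 if rng.random() < 0.7 else -5.0  # expected profit 5.5
    return 4.0  # truck: fixed profit


def train(episodes=20000, alpha=0.01, gamma=1.0, epsilon=0.1, seed=0):
    """Learn match values by simulation for a single-decision toy problem."""
    rng = random.Random(seed)
    q = defaultdict(float)  # Q-table keyed by (state, action), initialised to 0
    state, actions = "shipment_at_origin", ["barge", "truck"]
    for _ in range(episodes):
        action = epsilon_greedy(q, state, actions, epsilon, rng)
        reward = simulate_profit(action, rng)
        # The episode ends after one match, so there are no next actions.
        q_update(q, state, action, reward, "delivered", [], alpha, gamma)
    return q


if __name__ == "__main__":
    for key, value in sorted(train().items()):
        print(key, round(value, 2))
```

In this sketch each learned entry estimates the expected profit of matching the shipment with a service, and an online decision would pick the service with the largest estimate. A realistic instance would have many shipments, many services, and multi-stage decisions, which is where the curse of dimensionality mentioned above arises.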
Keywords Global synchromodal shipment matching · Dynamic and stochastic travel
times · Sequential decision process · Reinforcement learning · Q-learning
W. Guo
guo.wenjing@courrier.uqam.ca
1 CIRRELT and Department of Analytics, Operations and Information Technologies, School of
Management Sciences, University of Quebec at Montreal, Montreal, Canada
2 Department of Maritime and Transport Technology, Delft University of Technology, Delft, The
Netherlands