Book | PDF Available

Data Science for Supply Chain Forecasting

Authors: Nicolas Vandeput

Abstract and Figures

Using data science to solve a problem requires a scientific mindset more than coding skills. Data Science for Supply Chain Forecasting, Second Edition contends that a true scientific method, including experimentation, observation, and constant questioning, must be applied to supply chains to achieve excellence in demand forecasting. This second edition adds more than 45 percent extra content, with four new chapters including an introduction to neural networks and the forecast value added framework. Part I focuses on "traditional" statistical models, Part II on machine learning, and the all-new Part III on demand forecasting process management. The chapters cover both forecast models and concepts such as metrics, underfitting, overfitting, outliers, feature optimization, and external demand drivers. The book is replete with do-it-yourself sections whose implementations are provided in Python (and in Excel for the statistical models) to show readers how to apply these models themselves. This hands-on book, covering the entire range of forecasting--from the basics all the way to leading-edge models--will benefit supply chain practitioners, forecasters, and analysts looking to go the extra mile with demand forecasting.
... The precise procedure for estimating the trend, seasonal factor, and cycle terms is described by O'Connell et al. (1993). Vandeput (2021) presented the basic idea of simple exponential smoothing, which rests on the assumption that the future will be more or less the same as the past. Among the existing forecasting methods that use exponential smoothing, the best known are Holt's linear method (Holt, 1957), the multiplicative Holt-Winters' method (Winters, 1960), and Brown's model (Brown, 1959). ...
... Exponential smoothing is one of the most successful classical forecasting methods. It smooths a time series by processing all available observations while discounting information as it ages, i.e., as it recedes from the forecast period (Vandeput, 2021). The principle of the method is that previous values are taken into account with exponentially decreasing weights. ...
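The weighting principle is easy to see in code. Below is a minimal sketch of simple exponential smoothing in Python; the demand values and the smoothing parameter `alpha` are illustrative assumptions, not data from the cited texts.

```python
import numpy as np

def simple_exp_smoothing(demand, alpha=0.3):
    """Return one-step-ahead forecasts: each forecast is a weighted
    average in which older observations carry exponentially
    decreasing weights."""
    forecast = np.empty(len(demand) + 1)
    forecast[0] = demand[0]  # initialize with the first observation
    for t, d in enumerate(demand):
        # new forecast = alpha * latest observation + (1 - alpha) * previous forecast
        forecast[t + 1] = alpha * d + (1 - alpha) * forecast[t]
    return forecast

demand = np.array([28, 19, 18, 13, 19, 16, 19, 18, 13, 16])
print(simple_exp_smoothing(demand, alpha=0.4))
```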
Article
Purpose. The purpose of our article is to research and forecast prices for agricultural products, using potato prices as an example, based on the most effective models and data science techniques. Methodology / approach. Various forecasting models are explored, from baseline models such as decomposition and exponential smoothing to more advanced techniques such as ARIMA and SARIMA, as well as deep learning models including neural networks. The data is split into training and testing sets, and models are validated using cross-validation techniques and optimised through hyperparameter tuning. Model performance is evaluated using metrics such as MAE, MSE, RMSE, and MAPE. The selected model is then used to generate future price forecasts, with uncertainty quantified through confidence intervals. Results. The study successfully applied advanced data science techniques to forecast potato prices, leveraging a range of effective models. By analysing historical price data and using various forecasting methods, the research identified the most accurate models for predicting future price trends. The results demonstrate that the selected models can provide reliable forecasts. In particular, the results showed that the models achieve good forecast accuracy when applied to real problems and can therefore be used effectively for forecasting tasks, especially where seasonality matters. The models are also more accurate at the time intervals closest to the original data. The obtained results support using both models simultaneously, which can compensate for the shortcomings of each of them; the models can also be used separately to predict values for a required period more precisely, or in combination. Originality / scientific novelty. The study's originality lies in the development of methods for effectively accounting for seasonality in agricultural price data, such as seasonal decomposition techniques or more advanced approaches that combine statistical and data science methods. The novelty lies in the implementation of a real-time data processing and forecasting system that allows for the timely prediction of price changes, enabling stakeholders to make more informed decisions. Practical value / implications. Forecasting potato prices holds significant practical value for various stakeholders. For farmers, accurate forecasts enable informed decisions on the optimal times to plant, harvest, and sell their crops, thereby optimising their profits. In the supply chain, distributors and retailers can use these forecasts to manage inventory more effectively and plan contracts, reducing waste and avoiding shortages. Policymakers benefit from forecasts by anticipating market fluctuations and stabilising prices, which supports both consumers and producers. For consumers, stable pricing ensures better budgeting and helps avoid sudden price spikes, making essential foods more affordable. Overall, accurate price forecasting enhances market efficiency by reducing uncertainty and aiding investors in managing risk.
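The accuracy metrics listed above (MAE, MSE, RMSE, MAPE) are standard; as a reference point, here is a minimal Python sketch of how they could be computed. The actual and forecast values are placeholders.

```python
import numpy as np

def forecast_metrics(actual, forecast):
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    error = actual - forecast
    mae = np.mean(np.abs(error))                  # Mean Absolute Error
    mse = np.mean(error ** 2)                     # Mean Squared Error
    rmse = np.sqrt(mse)                           # Root Mean Squared Error
    mape = np.mean(np.abs(error / actual)) * 100  # Mean Absolute Percentage Error (%)
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape}

print(forecast_metrics([120, 135, 150], [110, 140, 155]))
```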
... Nicolas Vandeput (2021) demonstrates how, with minimal effort and open-source tools, one can produce forecasts that come close to the most modern market solutions. This opens the way for small and medium-sized enterprises to use their own data for forecasting and supply chain optimization. ...
Article
Abstract: Managing supply chains in the modern era requires advanced tools and methodologies to meet dynamic market challenges. The aim of the article is to present two reference models in supply chain management: SCOR (Supply Chain Operations Reference) and GSCF (Global Supply Chain Forum). The article focuses on how the use of big data and data science tools can strengthen these models, enabling better monitoring, process optimization, and response to market changes. The application of these methods in real business environments is illustrated by the implementation of data analytics technology at the company LOKAD. The discussion shows that real-time data analysis allows for precise demand forecasting, inventory optimization, and risk identification. At the operational and tactical levels, big data tools can be used for vehicle route optimization, fleet management, improved customer service, and product recommendations. At the strategic level, big data supports product design, network planning, and business strategy. Keywords: big data, data science, supply chain management, SCOR, GSCF, Supply Chain Scientist. JEL code: C8
... In the preface to his book "Principes de la théorie des richesses" (1863), Cournot describes the genesis of his life as an author and confides that his volume "Recherches sur les principes mathématiques de la théorie des richesses" was a pivotal moment in his life, for it was from this point that he decided to move from "being a critic, editor, or translator" to "being an author". Vandeput, N. (2021). Data Science for Supply Chain Forecasting. ...
Article
Full-text available
Findings. (i) We shall provide a price-demand sensitivity model (fare in C$) for ridership demand inside the Toronto city area from 1969 to 2019 as well as the derivation of short- and long-term elasticities. We shall also provide, through modelling and forecasting, insight into future demand trends considering pre- and post-pandemic scenarios. It is clear that one of the most remarkable points to emerge from the transit data in recent times is that the effects of the pandemic crisis have slowed ridership demand by almost half (225 million riders in 2020) compared to the previous trend (525.5 million riders in 2019) in the City of Toronto (TTC operating service area only), causing a structural break in the time series data, also observed in the TTC's latest report (only 197.8 million riders in 2021). This is not an isolated case for the city of Toronto, as Montreal (only 200 million users in 2020 compared to 426 million in 2019) and most of the major metropolitan cities in the world that were affected also experienced the same slowdown in demand due to reduced activity during the pandemic. (ii) Using our estimated market demand model for transit in the city of Toronto, we will attempt to estimate the optimal price and the price at which consumers are no longer willing to pay. And since this study concerns a public transit company in the form of a natural monopoly commonly referred to as a state monopoly, we shall draw on the seminal work and contributions of Frank Ramsey and Marcel Boiteux (Ramsey-Boiteux pricing rule) to understand the optimal pricing behaviour of this type of network infrastructure. The aim of this research will therefore also be to understand, beyond the elements of microeconomic optimization, that the nature of the company is at the center of its interests, which has a considerable influence on certain aspects that determine its business model and in particular its pricing behaviour (Baumol, 1959). Keywords: market demand, sensitivity analysis, time series modeling and forecasting, pricing behaviour. JEL Classification: C22; C63; C87; D12; D22; D42
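Price elasticities such as those discussed here are conventionally estimated as the slope of a log-log regression. The sketch below illustrates that idea with statsmodels; the fare and ridership figures are invented for illustration and are not TTC data.

```python
import numpy as np
import statsmodels.api as sm

# Illustrative fare (C$) and annual ridership (millions); not actual TTC data.
fare = np.array([1.10, 1.30, 1.50, 1.75, 2.00, 2.25, 2.65, 3.00])
riders = np.array([480, 470, 462, 450, 445, 438, 430, 420])

# In a log-log specification the slope is the (constant) price elasticity.
X = sm.add_constant(np.log(fare))
model = sm.OLS(np.log(riders), X).fit()
print(f"estimated price elasticity: {model.params[1]:.3f}")
```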
... This paper utilizes Key Performance Indicators (KPIs) for numerical prediction, referencing the algorithm presented in [60]. Formulas (17) to (24) are employed as indicators for model predictions. ...
... $\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \hat{x}_i\right)^2}$, where $n$ is the number of data points, $x_i$ is measured data point $i$, and $\hat{x}_i$ is its corresponding estimate. For more information on the RMSE, the interested reader is referred to [49]. ...
Article
Full-text available
The efficient integration of distributed energy resources (DERs) in buildings is a challenge that can be addressed through the deployment of multienergy microgrids (MGs). In this context, the Interreg SUDOE project IMPROVEMENT was launched at the end of the year 2019 with the aim of developing efficient solutions allowing public buildings with critical loads to be turned into net-zero-energy buildings (nZEBs). The work presented in this paper deals with the development of a predictive energy management system (PEMS) for the management of thermal resources and users’ thermal comfort in public buildings. Optimization-based/optimization-free model predictive control (MPC) algorithms are presented and validated in simulations using data collected in a public building equipped with a multienergy MG. Models of the thermal MG components were developed. The strategy currently used in the building relies on proportional–integral–derivative (PID) and rule-based (RB) controllers. The interconnection between the thermal part and the electrical part of the building-integrated MG is managed by taking advantage of the solar photovoltaic (PV) power generation surplus. The optimization-based MPC EMS has the best performance but is rather computationally expensive. The optimization-free MPC EMS is slightly less efficient but has a significantly reduced computational cost, making it the best solution for in situ implementation.
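To give a flavour of the optimization-based MPC idea described here, below is a minimal receding-horizon sketch for a single-zone thermal model. The model coefficients, comfort setpoint, horizon, and bounds are all illustrative assumptions, not parameters from the IMPROVEMENT project.

```python
import numpy as np
from scipy.optimize import minimize

# First-order thermal model: T[k+1] = a*T[k] + b*u[k] + (1 - a)*T_out.
# All coefficients below are assumptions for illustration only.
a, b, T_out = 0.9, 0.5, 5.0
horizon, T_set = 12, 21.0

def rollout(u, T0):
    """Simulate indoor temperature over the horizon for control sequence u."""
    T, temps = T0, []
    for uk in u:
        T = a * T + b * uk + (1 - a) * T_out
        temps.append(T)
    return np.array(temps)

def cost(u, T0):
    # Trade off heating effort against tracking of the comfort setpoint.
    temps = rollout(u, T0)
    return 0.1 * np.sum(u ** 2) + np.sum((temps - T_set) ** 2)

T_current = 18.0
res = minimize(cost, x0=np.zeros(horizon), args=(T_current,),
               bounds=[(0.0, 10.0)] * horizon)
# Receding horizon: apply only the first move, then re-optimize next step.
print(f"apply first control move: u0 = {res.x[0]:.2f} kW")
```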
... An essential element in the supply chain is logistics. This activity consists of planning and executing transport, storing goods efficiently, and distributing products correctly between customers and suppliers [7,8]. In this case, transport is the central problem for the design of the chain; this is where routes, loads, transport typology, operators, travel times, transport costs and time windows have to be defined. ...
Article
Full-text available
In this research, we develop an extension of the stochastic routing model with a fixed capacity for the distribution of perishable products with a time window. We use theoretical probability distributions to model the life of transported products and travel times in the network. Our main objective is to maximize the probability of delivering products within the established deadline at a given level of customer service. Our project is justified from the perspective of reducing the pollution caused by the greenhouse gases generated in the process. To optimize the proposed model, we use a Generic Random Search Algorithm. Finally, we apply the idea to a real problem: designing strategies for the optimal management of perishable food distribution routes with a time window, where the objective is to maximize the probability of meeting the time limit assigned to the route and, in this way, to reduce the pollution generated by refrigerated transport.
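The central probabilistic quantity here, the chance that a route is completed within its time window when travel times are random, can be estimated with a short Monte Carlo simulation. The sketch below assumes lognormal leg times; all parameters are invented for illustration and are not from the cited study.

```python
import numpy as np

rng = np.random.default_rng(42)

def prob_on_time(mean_times, cv, deadline, n_sims=100_000):
    """Estimate P(total route time <= deadline) when each leg's travel
    time is lognormal with the given mean and coefficient of variation."""
    mean_times = np.asarray(mean_times, float)
    sigma = np.sqrt(np.log(1 + cv ** 2))        # lognormal shape from CV
    mu = np.log(mean_times) - sigma ** 2 / 2    # match the desired means
    legs = rng.lognormal(mu, sigma, size=(n_sims, len(mean_times)))
    return np.mean(legs.sum(axis=1) <= deadline)

# Four legs with mean travel times in minutes and a 240-minute window.
print(prob_on_time([45, 60, 50, 40], cv=0.25, deadline=240))
```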
... Thus, the smaller the MAPE value, the better the forecast. The MAPE equation is as follows (Vandeput, 2021): ...
Article
Forecasting is the process of making predictions about the future based on past and present data. Double Exponential Smoothing Brown (DES Brown) is a linear model developed by Robert Goodell Brown in 1956. A good alpha parameter value is one that produces a small error. The Evolutionary Algorithm (EA) is a metaheuristic that can be applied to various optimization problems, including optimizing the alpha parameter of DES Brown. Solver is an Excel add-in for solving optimization problems. Given EA's ability to optimize the alpha parameter of DES Brown, and Solver's ability to run an EA, it follows that the alpha parameter of DES Brown can be determined using Solver. The case of "Forecasting Indonesian Central Government Tax Revenue" serves as a test of DES Brown, EA, and Solver. For this case, based on the results obtained with DES Brown, EA, and Solver, the best alpha value is 0.365904469725856 with a MAPE of 4.41881238122180%.
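The same experiment can be sketched in Python with SciPy's differential evolution, one member of the evolutionary-algorithm family, standing in for Excel's Solver. The revenue series below is invented for illustration; only the method mirrors the abstract.

```python
import numpy as np
from scipy.optimize import differential_evolution

def des_brown_forecast(y, alpha):
    """One-step-ahead forecasts from Brown's double exponential smoothing."""
    s1 = s2 = y[0]                               # initialize both smoothed series
    forecasts = [y[0]]                           # naive forecast for the first period
    for x in y[:-1]:
        s1 = alpha * x + (1 - alpha) * s1        # first smoothing
        s2 = alpha * s1 + (1 - alpha) * s2       # second smoothing
        level = 2 * s1 - s2
        trend = alpha / (1 - alpha) * (s1 - s2)
        forecasts.append(level + trend)          # forecast for the next period
    return np.array(forecasts)

def mape(y, f):
    return np.mean(np.abs((y - f) / y)) * 100

# Illustrative revenue series, not the Indonesian tax data.
y = np.array([980., 1077., 1205., 1310., 1285., 1343., 1518., 1566.])

result = differential_evolution(
    lambda a: mape(y, des_brown_forecast(y, a[0])),
    bounds=[(0.01, 0.99)], seed=1)
print(f"best alpha = {result.x[0]:.4f}, MAPE = {result.fun:.4f}%")
```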
Article
Full-text available
The COVID-19 disease has wreaked havoc on communities all around the world. The capacity to forecast the pandemic's growth has been crucial in making decisions on how to combat it and control its spread in current and future waves of infections. Different mathematical models have recently been employed to anticipate the COVID-19 outbreak globally. We analyzed the COVID-19 confirmed, recovered, and deceased cases in India through time series analysis to comprehend the growth patterns of these cases. When the COVID-19 data transitions from one stage to the next, a structural break occurs. We analyzed COVID-19 incidence from Jan 31, 2020, to Jan 30, 2022, spanning three waves driven by the Alpha, Delta, and Omicron variants of SARS-CoV-2, using time series regression and machine learning models for all 28 states (provinces) and 8 union territories (directly federally governed territories) of India. People should follow the suppressive measures suggested by the government to break the infection chain. The analysis could aid in the development of more effective intervention programs and the mitigation of the pandemic's current and future waves.
Article
Full-text available
The present study is aimed at investigating the relationship between two variables, temperature and precipitation, and vegetation dynamics in one of the arid and semi-arid regions of the world, i.e., Baluchistan in Southwestern Asia, which is shared by the three countries of Iran, Pakistan and Afghanistan. To achieve the objectives, two different databases were used: 1. MODIS NDVI 16-day composite products (MOD13A3) of the Terra satellite, with 1*1 km spatial resolution, obtained for a 17-year period (2000-2016) from the Earth Observing System (EOS) Data Gateway of the National Aeronautics and Space Administration (NASA); 2. Gridded monthly temperature and precipitation data obtained for the same 17-year period from the Climate Research Unit (CRU) of the University of East Anglia. The Pearson product-moment correlation coefficient was used to examine the relationship between vegetation dynamics and the two climate variables, temperature and precipitation, both simultaneously and at three time lags: one month, two months and three months. The results of the correlation analysis between mean temperature and monthly NDVI at different time lags indicated that in the humid and semi-humid regions in the northern half of Baluchistan, NDVI reacted to temperature variations simultaneously, while in the arid and semi-arid regions in the southern half of Baluchistan, NDVI lagged temperature by one month. However, the results of the correlation analysis between precipitation and monthly NDVI at different time lags indicated that NDVI reacted to precipitation variations simultaneously, that is, the precipitation of each month had the greatest effect on the NDVI of the same month.
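The lagged-correlation analysis described here boils down to computing Pearson coefficients between one series and time-shifted copies of another. A minimal sketch on synthetic data (the series below are invented, not the MODIS/CRU data):

```python
import numpy as np

rng = np.random.default_rng(11)

# Illustrative monthly series: NDVI responding to precipitation with a lag.
precip = rng.gamma(2.0, 10.0, 204)
ndvi = 0.2 + 0.004 * np.roll(precip, 1) + rng.normal(0, 0.02, 204)

# Pearson correlation between NDVI and precipitation at 0-3 month lags:
# correlate ndvi[t] with precip[t - lag].
for lag in range(4):
    r = np.corrcoef(ndvi[lag:], precip[:len(precip) - lag])[0, 1]
    print(f"lag {lag} month(s): r = {r:.3f}")
```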
Book
Full-text available
Inventory Optimization argues that mathematical inventory models can only take us so far with supply chain management. In order to optimize inventory policies, we have to use probabilistic simulations. The book explains how to implement these models and simulations step by step, starting from simple deterministic ones and moving to complex multi-echelon optimization. The first two parts of the book discuss classical mathematical models, their limitations and assumptions; a quick but effective introduction to Python is also provided. Part 3 contains more advanced models that will allow you to optimize your profits, estimate your lost sales and use advanced demand distributions. It also explains how you can optimize a multi-echelon supply chain based on a simple--yet powerful--framework. Part 4 discusses inventory optimization through simulations under custom discrete demand probability functions. Inventory managers, demand planners and academics interested in cost-effective solutions will benefit from the do-it-yourself examples and Python programs included in each chapter.
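A taste of the simulation-based approach the book advocates: a minimal Monte Carlo sketch of a periodic-review, order-up-to-S policy under Poisson demand with lost sales. All parameters are illustrative assumptions, not examples from the book.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_order_up_to(S, demand_mean, periods=10_000):
    """Estimate the fill rate of an order-up-to-S policy with Poisson
    demand, lost sales, and immediate replenishment each period."""
    filled = total = 0
    for _ in range(periods):
        on_hand = S                  # replenished up to S at the period start
        d = rng.poisson(demand_mean)
        filled += min(d, on_hand)    # demand beyond available stock is lost
        total += d
    return filled / total

for S in (8, 10, 12):
    print(f"S={S}: fill rate = {simulate_order_up_to(S, demand_mean=7):.3f}")
```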
Chapter
We continue the study of density estimation, but now from the nonparametric point of view.
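As a minimal concrete instance of nonparametric density estimation, the sketch below applies a Gaussian kernel density estimator to a synthetic sample; the data and evaluation grid are illustrative.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(7)
sample = rng.normal(loc=0.0, scale=1.0, size=500)

# Kernel density estimation: a classic nonparametric density estimator.
kde = gaussian_kde(sample)       # bandwidth set by Scott's rule by default
grid = np.linspace(-4, 4, 9)
print(np.round(kde(grid), 3))    # estimated density on a coarse grid
```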
Article
Recurrent Neural Networks (RNNs) have become competitive forecasting methods, as most notably shown in the winning method of the recent M4 competition. However, established statistical models such as exponential smoothing (ETS) and the autoregressive integrated moving average (ARIMA) gain their popularity not only from their high accuracy, but also because they are suitable for non-expert users in that they are robust, efficient, and automatic. In these areas, RNNs still have a long way to go. We present an extensive empirical study and an open-source software framework of existing RNN architectures for forecasting, and we develop guidelines and best practices for their use. For example, we conclude that RNNs are capable of modelling seasonality directly if the series in the dataset possess homogeneous seasonal patterns; otherwise, we recommend a deseasonalisation step. Comparisons against ETS and ARIMA demonstrate that (semi-)automatic RNN models are not silver bullets, but they are nevertheless competitive alternatives in many situations.
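The deseasonalisation recommendation can be illustrated with a short sketch: strip the seasonal component with STL, fit a small LSTM on the deseasonalised series, then add the seasonality back to the forecast. The series, network size, and training settings below are illustrative assumptions, not the paper's setup.

```python
import numpy as np
import tensorflow as tf
from statsmodels.tsa.seasonal import STL

# Synthetic monthly series with trend and seasonality (illustrative only).
rng = np.random.default_rng(3)
t = np.arange(144, dtype=float)
y = 0.5 * t + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 1, len(t))

# Deseasonalisation step, as recommended for heterogeneous seasonal patterns.
stl = STL(y, period=12).fit()
deseason = y - stl.seasonal

# Windowed supervised samples: 12 lags -> next value.
X = np.stack([deseason[i:i + 12] for i in range(len(deseason) - 12)])[..., None]
target = deseason[12:]

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(12, 1)),
    tf.keras.layers.LSTM(16),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, target, epochs=5, verbose=0)

# Forecast one step ahead, then add the seasonal component back
# (seasonal value from one full period earlier).
next_deseason = model.predict(X[-1:], verbose=0)[0, 0]
print(next_deseason + stl.seasonal[-12])
```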
Conference Paper
We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0%, which is considerably better than the previous state of the art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called dropout that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
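A scaled-down sketch of the ingredients described here (stacked convolutions, max-pooling, dropout, a final softmax), written for a toy input size rather than ImageNet; the layer sizes are illustrative and far smaller than the actual network.

```python
import tensorflow as tf

# Toy CNN echoing the architectural ingredients above; not AlexNet itself.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(32, 32, 3)),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dropout(0.5),                     # the regularizer named above
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),  # 10-way instead of 1000-way
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```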
Conference Paper
Tree boosting is a highly effective and widely used machine learning method. In this paper, we describe a scalable end-to-end tree boosting system called XGBoost, which is used widely by data scientists to achieve state-of-the-art results on many machine learning challenges. We propose a novel sparsity-aware algorithm for sparse data and weighted quantile sketch for approximate tree learning. More importantly, we provide insights on cache access patterns, data compression and sharding to build a scalable tree boosting system. By combining these insights, XGBoost scales beyond billions of examples using far fewer resources than existing systems.
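As a minimal usage sketch (not the paper's experiments), XGBoost's scikit-learn interface can be applied to a toy lag-feature regression as follows; the series and hyperparameters are illustrative.

```python
import numpy as np
from xgboost import XGBRegressor

# Illustrative lag-feature regression: predict a random walk from its lags.
rng = np.random.default_rng(5)
y = np.cumsum(rng.normal(0, 1, 500))
X = np.stack([y[i:i + 4] for i in range(len(y) - 4)])  # 4 lags per sample
target = y[4:]

model = XGBRegressor(n_estimators=200, max_depth=4, learning_rate=0.1)
model.fit(X[:-50], target[:-50])          # train on all but the last 50 points
pred = model.predict(X[-50:])             # evaluate on the held-out tail
print(f"test RMSE: {np.sqrt(np.mean((pred - target[-50:]) ** 2)):.3f}")
```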