Article

Why Didn’t Experts Pick M4-Competition Winner?


Abstract

Purpose: To comment on the M4-Competition and its findings in order to assess the contribution of data models, such as those from machine learning methods, to improving forecast accuracy. Methods: (1) Use prior knowledge on the relative accuracy of forecasts from validated forecasting methods to assess the M4 findings. (2) Use prior knowledge on forecasting principles and the scientific method to assess whether data models can be expected to improve accuracy relative to forecasts from previously validated methods under any conditions. Findings: Prior knowledge from experimental research is supported by the M4 findings: simple validated methods provided forecasts that are (1) typically more accurate than those from complex and costly methods, and (2) considerably more accurate than those from data models. Limitations: Conclusions were limited because prior knowledge did not provide hypotheses complete enough to permit experimental tests of which methods, and which individual models, would be most accurate under which conditions. Implications: Data models should not be used for forecasting under any conditions. Forecasters interested in situations where much relevant data are available should use knowledge models.


... However, the practical implications of forecasting competitions have been widely criticized by the forecasting community, and especially by its practitioners, who claim that the findings reported may depend on the particularities of the data set used for conducting the competition and are thus difficult to generalize and exploit in real-life business applications (Ord, 2001; Clements & Hendry, 2001; Armstrong & Green, 2019; Darin & Stellwagen, 2019; Fry & Brundage, 2019; Bojer & Meldgaard, 2021). For example, M5 focused on the sales of ten indicative US stores of a global retail firm located in three states (California, Wisconsin, and Texas), covering the period from 2011 to 2016 and three product categories ("Foods", "Household", and "Hobbies"). ...
Article
Full-text available
The main objective of the M5 competition, which focused on forecasting the hierarchical unit sales of Walmart, was to evaluate the accuracy and uncertainty of forecasting methods in the field to identify best practices and highlight their practical implications. However, can the findings of the M5 competition be generalized and exploited by retail firms to better support their decisions and operation? This depends on the extent to which M5 data is sufficiently similar to unit sales data of retailers operating in different regions selling different product types and considering different marketing strategies. To answer this question, we analyze the characteristics of the M5 time series and compare them with those of two grocery retailers, namely Corporación Favorita and a major Greek supermarket chain, using feature spaces. Our results suggest only minor discrepancies between the examined data sets, supporting the representativeness of the M5 data.
Article
Full-text available
In marketing, consumer behavior is a crucial factor in the placement of products in the market and is often the subject of study and research by large companies to identify the needs of citizens and their behavior as consumers in the buying decision process. Consumer buying behavior refers to the buying behavior of final consumers—individuals and households that buy goods and services for personal consumption (Kotler et al., 1999). A company that truly understands how consumers will respond to different product features, pricing, and advertising appeals has a significant advantage over its competitors. The factors that influence consumer behavior are the key elements that companies analyze and aim to break down in order to "attract" customers. This paper will examine the factors of consumer behavior and their impact on increasing/decreasing imports in trade with several countries with which Kosovo has international trade relations. The phenomenon of ethnocentrism will also be examined, a phenomenon that has emerged in every nation in recent years and is more pronounced in the Republic of Kosovo. Finally, an empirical analysis will be presented, highlighting the relationship between imports and import prices.
Chapter
The proliferation of business data and on-demand computing have propelled the use of artificial intelligence methods in quantitative forecasting. Machine learning has a prominent role in solving clustering and classification problems as well as dimensionality reduction. Nevertheless, traditional statistical methods of forecasting continue to perform well in competitions and many practical applications. The chapter considers critically the successes of machine learning in forecasting, using some case studies as well as theoretical considerations, including limitations on machine learning and other techniques for forecasting. It also discusses weaknesses of the Vapnik–Chervonenkis theory. The main aim of the chapter is to stimulate scholarly dialogue on the role of machine learning in forecasting.
Article
Full-text available
The authors discuss their development of JalTantra, a web system that aids government engineers in India in designing piped water systems that provide an adequate quality of service to meet citizens’ needs for drinking water at a cost below that mandated by strict government norms.
Article
Full-text available
The success of the United States Air Force Academy (USAFA) in graduating effective operations research (OR) practitioners, as supported by its award of the 2017 INFORMS UPS George D. Smith Prize, rests upon a close relationship with the primary customer that graduates serve. The USAFA mission is “To educate, train and inspire men and women to become officers of character motivated to lead the United States Air Force in service to our Nation.” Although the Air Force is USAFA’s primary customer, the Air Force Analytic Community (AFAC), which oversees a corps of approximately 1,100 military and civilian analysts stationed across the globe, is the USAFA OR program’s primary customer. Although not all USAFA OR graduates become OR analysts in the Air Force and not all OR analysts in the Air Force are USAFA OR graduates, the focus of the program is to produce high-performing OR analysts for the AFAC. This paper describes the four practical components of the USAFA OR program, which explain how the program has been tailored to meet the needs of the AFAC by producing good practitioners of OR for the Air Force. We conclude by explaining how these components may be generalized for a typical OR program at a college or university.
Article
Full-text available
Problem Do conservative econometric models that comply with the Golden Rule of Forecasting provide more accurate forecasts? Methods To test the effects on forecast accuracy, we applied three evidence-based guidelines to 19 published regression models used for forecasting 154 elections in Australia, Canada, Italy, Japan, the Netherlands, Portugal, Spain, Turkey, the U.K., and the U.S. The guidelines direct forecasters using causal models to be conservative to account for uncertainty by (I) modifying effect estimates to reflect uncertainty either by damping coefficients towards no effect or equalizing coefficients, (II) combining forecasts from diverse models, and (III) incorporating more knowledge by including more variables with known important effects. Findings Modifying the econometric models to make them more conservative reduced forecast errors compared to forecasts from the original models: (I) Damping coefficients by 10% reduced error by 2% on average, although further damping generally harmed accuracy; equalizing coefficients consistently reduced errors, with average error reductions between 2% and 8% depending on the level of equalizing. Averaging the original regression model forecast with an equal-weights model forecast reduced error by 7%. (II) Combining forecasts from two Australian models and from eight U.S. models reduced error by 14% and 36%, respectively. (III) Using more knowledge by including all six unique variables from the Australian models and all 24 unique variables from the U.S. models in equal-weight “knowledge models” reduced error by 10% and 43%, respectively. Originality This paper provides the first test of applying guidelines for conservative forecasting to established election forecasting models. Usefulness Election forecasters can substantially improve the accuracy of forecasts from econometric models by following simple guidelines for conservative forecasting. Decision-makers can make better decisions when they are provided with models that are more realistic and forecasts that are more accurate.
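Two of the guidelines above, damping coefficients toward no effect and combining forecasts with equal weights, can be sketched in a few lines. This is a minimal illustration, not the authors' code; the model coefficients and forecast values are invented, and the 10% shrink factor mirrors the level tested in the paper.

```python
def damp(coefficients, shrink=0.10):
    """Shrink regression slopes toward zero (no effect) by a fixed
    fraction; the intercept is left untouched in this sketch."""
    intercept, *slopes = coefficients
    return [intercept] + [b * (1 - shrink) for b in slopes]

def combine(forecasts):
    """Equal-weights combination of forecasts from diverse models."""
    return sum(forecasts) / len(forecasts)

# Hypothetical two-variable vote-share model: [intercept, b1, b2].
model = [45.0, 0.8, -0.5]
damped = damp(model)              # slopes pulled 10% toward zero
combined = combine([52.1, 50.3])  # average two models' forecasts
```

Equalizing coefficients (the paper's other modification) would instead move the standardized slopes toward a common value; only the simpler damping step is shown here.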
Article
Full-text available
The M4 competition is the continuation of three previous competitions, started more than 45 years ago, whose purpose was to learn how to improve forecasting accuracy and how such learning can be applied to advance the theory and practice of forecasting. The purpose of M4 was to replicate the results of the previous ones and extend them in three directions: first, significantly increase the number of series; second, include machine learning (ML) forecasting methods; and third, evaluate both point forecasts and prediction intervals. The five major findings of the M4 Competition are: 1. Of the 17 most accurate methods, 12 were “combinations” of mostly statistical approaches. 2. The biggest surprise was a “hybrid” approach that utilized both statistical and ML features. This method's average sMAPE was close to 10% more accurate than the combination benchmark used to compare the submitted methods. 3. The second most accurate method was a combination of seven statistical methods and one ML one, with the weights for the averaging calculated by an ML algorithm trained to minimize forecasting error. 4. The two most accurate methods also achieved amazing success in specifying the 95% prediction intervals correctly. 5. The six pure ML methods performed poorly, with none of them being more accurate than the combination benchmark and only one being more accurate than Naïve2. This paper presents some initial results of M4, its major findings, and a logical conclusion. Finally, it outlines what the authors consider to be the way forward for the field of forecasting.
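For readers unfamiliar with the error measure and benchmark named in these findings, here is a minimal sketch of sMAPE and a naive benchmark. The series values are invented, and the deseasonalization step that distinguishes Naïve2 from the plain naive method is omitted.

```python
def smape(actual, forecast):
    """Symmetric mean absolute percentage error, in percent, as used
    to score point forecasts in the M-competitions."""
    return 100 * sum(2 * abs(f - a) / (abs(a) + abs(f))
                     for a, f in zip(actual, forecast)) / len(actual)

def naive(history, horizon):
    """Naive benchmark: repeat the last observation. (Naive2 first
    deseasonalizes the series; that step is omitted here.)"""
    return [history[-1]] * horizon

history = [112, 118, 132, 129, 121]  # invented in-sample series
actual = [135, 148, 148]             # invented out-of-sample values
benchmark = naive(history, 3)
score = smape(actual, benchmark)     # lower is better
```

A submitted method would be judged by how much its sMAPE falls below that of benchmarks like this one.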
Article
Full-text available
Problem How to help practitioners, academics, and decision makers use experimental research findings to substantially reduce forecast errors for all types of forecasting problems. Methods Findings from our review of forecasting experiments were used to identify methods and principles that lead to accurate forecasts. Cited authors were contacted to verify that summaries of their research were correct. Checklists to help forecasters and their clients undertake and commission studies that adhere to principles and use valid methods were developed. Leading researchers were asked to identify errors of omission or commission in the analyses and summaries of research findings. Findings Forecast accuracy can be improved by using one of 15 relatively simple evidence-based forecasting methods. One of those methods, knowledge models, provides substantial improvements in accuracy when causal knowledge is good. On the other hand, data models – developed using multiple regression, data mining, neural nets, and “big data analytics” – are unsuited for forecasting. Originality Three new checklists for choosing validated methods, developing knowledge models, and assessing uncertainty are presented. A fourth checklist, based on the Golden Rule of Forecasting, was improved. Usefulness Combining forecasts within individual methods and across different methods can reduce forecast errors by as much as 50%. Forecast errors from currently used methods can be reduced by increasing their compliance with the principles of conservatism (Golden Rule of Forecasting) and simplicity (Occam’s Razor). Clients and other interested parties can use the checklists to determine whether forecasts were derived using evidence-based procedures and can, therefore, be trusted for making decisions. Scientists can use the checklists to devise tests of the predictive validity of their findings.
Article
Full-text available
Many accuracy measures have been proposed in the past for time series forecasting comparisons. However, many of these measures suffer from one or more issues such as poor resistance to outliers and scale dependence. In this paper, while summarising commonly used accuracy measures, a special review is made on the symmetric mean absolute percentage error. Moreover, a new accuracy measure called the Unscaled Mean Bounded Relative Absolute Error (UMBRAE), which combines the best features of various alternative measures, is proposed to address the common issues of existing measures. A comparative evaluation on the proposed and related measures has been made with both synthetic and real-world data. The results indicate that the proposed measure, with user selectable benchmark, performs as well as or better than other measures on selected criteria. Though it has been commonly accepted that there is no single best accuracy measure, we suggest that UMBRAE could be a good choice to evaluate forecasting methods, especially for cases where measures based on geometric mean of relative errors, such as the geometric mean relative absolute error, are preferred.
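As I understand the published definition, UMBRAE can be computed as below. The data are invented, and the edge case where both the forecast and the benchmark are exactly right in the same period (a zero denominator) is ignored in this sketch.

```python
def umbrae(actual, forecast, benchmark):
    """Unscaled Mean Bounded Relative Absolute Error. Each period's
    error is bounded to [0, 1] relative to the benchmark forecast's
    error, the bounded errors are averaged, and the mean is then
    unscaled; values below 1 indicate the forecast beat the
    benchmark on average."""
    bounded = []
    for a, f, b in zip(actual, forecast, benchmark):
        err, bench_err = abs(a - f), abs(a - b)
        bounded.append(err / (err + bench_err))  # assumes not both zero
    mbrae = sum(bounded) / len(bounded)
    return mbrae / (1 - mbrae)

# Invented data: the benchmark predicts 10 every period.
u = umbrae([10, 12, 14], [11, 12, 13], [10, 10, 10])
```

Bounding each relative error before averaging is what gives the measure its resistance to outliers relative to ratio-based alternatives.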
Article
Full-text available
Problem: The scientific method is unrivaled for generating useful knowledge, yet papers published in scientific journals frequently violate the scientific method. Methods: A definition of the scientific method was developed from the writings of pioneers of the scientific method including Aristotle, Newton, and Franklin. The definition was used as the basis of a checklist of eight criteria necessary for compliance with the scientific method. The extent to which research papers follow the scientific method was assessed by reviewing the literature on the practices of researchers whose papers are published in scientific journals. Findings of the review were used to develop an evidence-based checklist of 20 operational guidelines to help researchers comply with the scientific method. Findings: The natural desire to have one’s beliefs and hypotheses confirmed can tempt funders to pay for supportive research and researchers to violate scientific principles. As a result, advocacy has come to dominate publications in scientific journals, and has led funders, universities, and journals to evaluate researchers’ work using criteria that are unrelated to the discovery of useful scientific findings. The current procedure for mandatory journal review has led to censorship of useful scientific findings. We suggest alternatives, such as accepting all papers that conform with the eight criteria of the scientific method. Originality: This paper provides the first comprehensive and operational evidence-based checklists for assessing compliance with the scientific method and for guiding researchers on how to comply. Usefulness: The “Criteria for Compliance with the Scientific Method” checklist could be used by journals to certify papers. Funders could insist that research projects comply with the scientific method. Universities and research institutes could hire and promote researchers whose research complies. Courts could use it to assess the quality of evidence.
Governments could base policies on evidence from papers that comply, and citizens could use the checklist to evaluate evidence on public policy. Finally, scientists could ensure that their own research complies with science by designing their projects using the “Guidelines for Scientists” checklist. Keywords: advocacy; checklists; data models; experiment; incentives; knowledge models; multiple reasonable hypotheses; objectivity; regression analysis; regulation; replication; statistical significance
Article
Full-text available
This article examines whether decomposing time series data into two parts - level and change - produces forecasts that are more accurate than those from forecasting the aggregate directly. Prior research found that, in general, decomposition reduced forecasting errors by 35%. An earlier study on decomposition into level and change found a forecast error reduction of 23%. The current study found that nowcasts, consisting of a simple average of estimates from preliminary surveys and econometric models of the U.S. lodging market, improved the accuracy of final estimates of levels. Forecasts of change from an econometric model and the improved nowcasts reduced forecast errors by 29% when compared to direct forecasts of the aggregate. Forecasts of change from an extrapolation model and the improved nowcasts reduced forecast errors by 45%. On average, then, the error reduction for this study was 37%.
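The level-and-change procedure described above can be sketched as follows. The figures are invented, and the models that would actually produce the nowcast and the change forecasts are not shown.

```python
def decomposition_forecast(nowcast_level, change_forecasts):
    """Forecast an aggregate by adding forecast changes to a nowcast
    of the current level, rather than forecasting the aggregate
    directly."""
    path, level = [], nowcast_level
    for change in change_forecasts:
        level += change
        path.append(level)
    return path

# Invented numbers: average two independent nowcasts of the level,
# then apply change forecasts from a separate model.
nowcast = (102.0 + 104.0) / 2
forecasts = decomposition_forecast(nowcast, [1.5, 2.0, -0.5])
```

The accuracy gain in the study came from improving each part separately: averaging estimates to sharpen the level nowcast, and forecasting changes with a dedicated model.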
Article
Full-text available
This article introduces this JBR Special Issue on simple versus complex methods in forecasting. Simplicity in forecasting requires that (1) method, (2) representation of cumulative knowledge, (3) relationships in models, and (4) relationships among models, forecasts, and decisions are all sufficiently uncomplicated as to be easily understood by decision-makers. Our review of studies comparing simple and complex methods - including those in this special issue - found 97 comparisons in 32 papers. None of the papers provide a balance of evidence that complexity improves forecast accuracy. Complexity increases forecast error by 27 percent on average in the 25 papers with quantitative comparisons. The finding is consistent with prior research to identify valid forecasting methods: all 22 previously identified evidence-based forecasting procedures are simple. Nevertheless, complexity remains popular among researchers, forecasters, and clients. Some evidence suggests that the popularity of complexity may be due to incentives: (1) researchers are rewarded for publishing in highly ranked journals, which favor complexity; (2) forecasters can use complex methods to provide forecasts that support decision-makers’ plans; and (3) forecasters’ clients may be reassured by incomprehensibility. Clients who prefer accuracy should accept forecasts only from simple evidence-based procedures. They can rate the simplicity of forecasters’ procedures using the questionnaire at simple-forecasting.com.
Article
Full-text available
This article proposes a unifying theory, or the Golden Rule, of forecasting. The Golden Rule of Forecasting is to be conservative. A conservative forecast is consistent with cumulative knowledge about the present and the past. To be conservative, forecasters must seek out and use all knowledge relevant to the problem, including knowledge of methods validated for the situation. Twenty-eight guidelines are logically deduced from the Golden Rule. A review of evidence identified 105 papers with experimental comparisons; 102 support the guidelines. Ignoring a single guideline increased forecast error by more than two-fifths on average. Ignoring the Golden Rule is likely to harm accuracy most when the situation is uncertain and complex, and when bias is likely. Non-experts who use the Golden Rule can identify dubious forecasts quickly and inexpensively. To date, ignorance of research findings, bias, sophisticated statistical procedures, and the proliferation of big data, have led forecasters to violate the Golden Rule. As a result, despite major advances in evidence-based forecasting methods, forecasting practice in many fields has failed to improve over the past half-century.
Article
Full-text available
How many fire companies does New York City need and where should they be located? Given a fire alarm of unknown severity, how many companies should be dispatched to it? These two questions are fundamental issues in the deployment of the City's fire-fighting resources. Since 1968, the New York City Fire Department and The New York City-Rand Institute have carried out a joint project to improve the delivery of Fire Department services in the face of skyrocketing demand. In November 1972, two historical deployment changes were implemented: (a) six of the 375 fire companies in the City were disbanded and seven other companies were permanently relocated; and (b) in high fire incidence areas of the City, an adaptive response policy was implemented. Under adaptive response, fewer companies are initially dispatched to potentially less serious alarms. This is in contrast to the traditional dispatching policy where the same number of companies are dispatched to each alarm. The joint Fire Department-Rand Institute project and the analyses which led to these and other improvements and the wide range of mathematical models used are described. The changes have resulted in savings to the Fire Department of over $5 million per year, a reduction in the workload of fire companies and a more equitable distribution of fire companies throughout the City.
Article
Full-text available
Arising from research in the computer science community, constraint programming is a fairly new technique for solving optimization problems. For those familiar with mathematical programming, a number of language barriers make it difficult to understand the concepts of constraint programming. In this short tutorial on constraint programming, we explain how it relates to familiar mathematical programming concepts and how constraint programming and mathematical programming technologies are complementary. We assume a minimal background in linear and integer programming.
Article
Full-text available
Linear programming is a fundamental planning tool. It is often difficult to precisely estimate or forecast certain critical data elements of the linear program. In such cases, it is necessary to address the impact of uncertainty during the planning process. We discuss a variety of LP-based models that can be used for planning under uncertainty. In all cases, we begin with a deterministic LP model and show how it can be adapted to include the impact of uncertainty. We present models that range from simple recourse policies to more general two-stage and multistage SLP formulations. We also include a discussion of probabilistic constraints. We illustrate the various models using examples taken from the literature. The examples involve models developed for airline yield management, telecommunications, flood control, and production planning.
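To make the simple-recourse idea concrete, here is a toy newsvendor-style example solved by enumeration. It is not from the article, which formulates such problems as stochastic linear programs; all the numbers are invented.

```python
def expected_cost(order, scenarios, unit_cost, shortage_penalty):
    """Expected cost of a first-stage order under demand scenarios,
    where unmet demand incurs a second-stage (recourse) penalty."""
    return sum(prob * (unit_cost * order
                       + shortage_penalty * max(demand - order, 0))
               for demand, prob in scenarios)

# Invented scenarios (demand, probability); probabilities sum to 1.
scenarios = [(80, 0.3), (100, 0.5), (120, 0.2)]

# Pick the best order by enumeration; a real model would solve the
# deterministic-equivalent LP over all scenarios instead.
best_order = min(range(80, 121, 10),
                 key=lambda q: expected_cost(q, scenarios, 1.0, 3.0))
```

The key contrast with a deterministic LP is that the decision is scored against every scenario's recourse cost, not against a single point forecast of demand.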
Article
Full-text available
Soyer and Hogarth’s article, 'The Illusion of Predictability,' shows that diagnostic statistics that are commonly provided with regression analysis lead to confusion, reduced accuracy, and overconfidence. Even highly competent researchers are subject to these problems. This overview examines the Soyer-Hogarth findings in light of prior research on illusions associated with regression analysis. It also summarizes solutions that have been proposed over the past century. These solutions would enhance the value of regression analysis.
Article
Full-text available
This paper examines the feasibility of rule-based forecasting, a procedure that applies forecasting expertise and domain knowledge to produce forecasts according to features of the data. We developed a rule base to make annual extrapolation forecasts for economic and demographic time series. The development of the rule base drew upon protocol analyses of five experts on forecasting methods. This rule base, consisting of 99 rules, combined forecasts from four extrapolation methods (the random walk, regression, Brown's linear exponential smoothing, and Holt's exponential smoothing) according to rules using 18 features of time series. For one-year ahead ex ante forecasts of 90 annual series, the median absolute percentage error (MdAPE) for rule-based forecasting was 13% less than that from equally-weighted combined forecasts. For six-year ahead ex ante forecasts, rule-based forecasting had a MdAPE that was 42% less. The improvement in accuracy of the rule-based forecasts over equally-weighted combined forecasts was statistically significant. Rule-based forecasting was more accurate than equal-weights combining in situations involving significant trends, low uncertainty, stability, and good domain expertise.
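A drastically simplified sketch of the rule-based idea appears below: forecasts from simple extrapolation methods are combined with weights set by a feature of the series. The single rule and its weights are invented for illustration and are not taken from the paper's 99-rule base, which also used two further methods and 18 series features.

```python
def random_walk(series, horizon):
    """Random walk: repeat the last observation."""
    return [series[-1]] * horizon

def linear_trend(series, horizon):
    """Ordinary least-squares trend line, extrapolated ahead."""
    n = len(series)
    xbar, ybar = (n - 1) / 2, sum(series) / n
    slope = (sum((x - xbar) * (y - ybar) for x, y in enumerate(series))
             / sum((x - xbar) ** 2 for x in range(n)))
    return [ybar + slope * (n - 1 + h - xbar) for h in range(1, horizon + 1)]

def rule_based(series, horizon):
    """Toy rule: weight the trend method more heavily when all recent
    changes share the same sign (a crude 'consistent trend' feature)."""
    diffs = [b - a for a, b in zip(series, series[1:])]
    trending = all(d > 0 for d in diffs) or all(d < 0 for d in diffs)
    w = 0.7 if trending else 0.3  # illustrative weights only
    return [w * t + (1 - w) * r
            for r, t in zip(random_walk(series, horizon),
                            linear_trend(series, horizon))]
```

Equal-weights combining, the comparison method in the study, corresponds to fixing w at 0.5 regardless of the series features.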
Article
Full-text available
I briefly summarize prior research showing that tests of statistical significance are improperly used even in leading scholarly journals. Attempts to educate researchers to avoid pitfalls have had little success. Even when done properly, however, statistical significance tests are of no value. Other researchers have discussed reasons for these failures. I was unable to find empirical evidence to support the use of significance tests under any conditions. I then show that tests of statistical significance are harmful to the development of scientific knowledge because they distract the researcher from the use of proper methods. I illustrate the dangers of significance tests by examining a re-analysis of the M3-Competition. Although the authors of the re-analysis conducted a proper series of statistical tests, they suggested that the original M3-Competition was not justified in concluding that combined forecasts reduce errors, and that the selection of the best method is dependent on the selection of a proper error measure. I show that the original conclusions were correct. Authors should avoid tests of statistical significance; instead, they should report on effect sizes, confidence intervals, replications/extensions, and meta-analyses. Practitioners should ignore significance tests and journals should discourage them.
Article
Full-text available
Observed declines in the Arctic sea ice have resulted in a variety of negative effects on polar bears (Ursus maritimus). Projections for additional future declines in sea ice resulted in a proposal to list polar bears as a threatened species under the United States Endangered Species Act. To provide information for the Department of the Interior's listing-decision process, the US Geological Survey (USGS) produced a series of nine research reports evaluating the present and future status of polar bears throughout their range. In response, Armstrong et al. [Armstrong, J. S., K. C. Green, W. Soon. 2008. Polar bear population forecasts: A public-policy forecasting audit. Interfaces 38(5) 382-405], which we will refer to as AGS, performed an audit of two of these nine reports. AGS claimed that the general circulation models upon which the USGS reports relied were not valid forecasting tools, that USGS researchers were not objective or lacked independence from policy decisions, that they did not utilize all available information in constructing their forecasts, and that they violated numerous principles of forecasting espoused by AGS. AGS (p. 382) concluded that the two USGS reports were “unscientific and inconsequential to decision makers.” We evaluate the AGS audit and show how AGS are mistaken or misleading on every claim. We provide evidence that general circulation models are useful in forecasting future climate conditions and that corporate and government leaders are relying on these models to do so. We clarify the strict independence of the USGS from the listing decision. We show that the allegations of failure to follow the principles of forecasting espoused by AGS are either incorrect or are based on misconceptions about the Arctic environment, polar bear biology, or statistical and mathematical methods.
We conclude by showing that the AGS principles of forecasting are too ambiguous and subjective to be used as a reliable basis for auditing scientific investigations. In summary, we show that the AGS audit offers no valid criticism of the USGS conclusion that global warming poses a serious threat to the future welfare of polar bears and that it only serves to distract from reasoned public-policy debate.
Article
Full-text available
I obtained quality ratings and rankings of 39 journals in operations management and related disciplines through surveys of faculty members at top-25 US business schools in 2000 and in 2002. I also computed five-year impact factors for 29 of these journals and developed a ranking based on these impact factors. I found evidence of some change in journal quality ratings over the two-year period. Ratings also differed by research area but not by professorial level. In addition, I ranked the journals based on the number of academics who rated their quality, calling this a visibility measure. Finally, I compared my ratings to ratings in earlier survey and citation studies. The quality ratings were more consistent than the citation ratings.
Article
Full-text available
Policymakers need to know whether prediction is possible and, if so, whether any proposed forecasting method will provide forecasts that are substantially more accurate than those from the relevant benchmark method. An inspection of global temperature data suggests that temperature is subject to irregular variations on all relevant time scales, and that variations during the late 1900s were not unusual. In such a situation, a "no change" extrapolation is an appropriate benchmark forecasting method. We used the UK Met Office Hadley Centre's annual average thermometer data from 1850 through 2007 to examine the performance of the benchmark method. The accuracy of forecasts from the benchmark is such that even perfect forecasts would be unlikely to help policymakers. For example, mean absolute errors for the 20- and 50-year horizons were 0.18 °C and 0.24 °C respectively. We nevertheless demonstrate the use of benchmarking with the example of the Intergovernmental Panel on Climate Change's 1992 linear projection of long-term warming at a rate of 0.03 °C per year. The small sample of errors from ex ante projections at 0.03 °C per year for 1992 through 2008 was practically indistinguishable from the benchmark errors. Validation for long-term forecasting, however, requires a much longer horizon. Again using the IPCC warming rate for our demonstration, we projected the rate successively over a period analogous to that envisaged in their scenario of exponential CO2 growth, the years 1851 to 1975. The errors from the projections were more than seven times greater than the errors from the benchmark method. Relative errors were larger for longer forecast horizons. Our validation exercise illustrates the importance of determining whether it is possible to obtain forecasts that are more useful than those from a simple benchmark before making expensive policy decisions.
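The benchmarking procedure described above can be sketched as follows. The anomaly values are invented; the actual study used the Hadley Centre series and projections from successive origins.

```python
def mae_no_change(series, horizon):
    """Mean absolute error of 'no change' forecasts made from every
    possible origin in the series, at a fixed horizon."""
    errs = [abs(series[t + horizon] - series[t])
            for t in range(len(series) - horizon)]
    return sum(errs) / len(errs)

def mae_fixed_trend(series, horizon, rate):
    """MAE of projecting a fixed per-period rate from each origin."""
    errs = [abs(series[t + horizon] - (series[t] + rate * horizon))
            for t in range(len(series) - horizon)]
    return sum(errs) / len(errs)

# Invented annual anomaly series; compare a fixed-trend projection
# against the no-change benchmark before trusting the trend method.
anomalies = [0.02, -0.05, 0.10, 0.04, 0.12, 0.08]
benchmark_mae = mae_no_change(anomalies, 2)
trend_mae = mae_fixed_trend(anomalies, 2, 0.03)
```

A proposed method earns consideration only if its MAE is substantially below the benchmark's across horizons.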
Article
This issue of the INFORMS Journal on Applied Analytics (formerly Interfaces) includes a special section of full-length papers from four of the finalists of the 2018 Innovative Applications in Analytics Award, which recognizes organizations for novel and impactful analytics applications. The finalists’ applications, four of which are described herein, cover a wide range of application areas and analytics techniques and point to the universal applicability and significant impact of analytics. The INFORMS website includes additional information on this award; see https://www.informs.org/Recognizing-Excellence/Community-Prizes/Analytics-Society/Innovative-Applications-in-Analytics-Award .
Article
The authors describe NeatWork, an optimization and simulation tool to facilitate the design of gravity-driven water distribution systems. Its objective is to deliver clean water to poor rural communities.
Article
The authors describe the development of a set of three supervised machine-learning models, which the New York City Police Department uses to help identify related crimes, including burglaries, robberies, and grand larcenies.
Article
Questions of trust in machine-learning models are becoming increasingly important as these tools are starting to be used widely for high-stakes decisions in medicine and criminal justice. Transparency of models is a key aspect affecting trust. This paper reveals that there is new technology to build transparent machine-learning models that are often as accurate as black-box machine-learning models. These methods have already had an impact in medicine and criminal justice. This work calls into question the overall need for black-box models in these applications.
Article
In 2016, the Heinz College of Information Systems and Public Policy at Carnegie Mellon University was awarded the INFORMS UPS George D. Smith Prize for excellence in preparing students to be effective practitioners of operations research (OR) and analytics. In this article, we describe the philosophy underlying the college's approach to analytics education and the specific ways in which students in both the information systems and public policy programs combine coursework and experiential learning to apply a multidisciplinary problem-solving toolkit on their way to becoming men and women for intelligent action.
Article
The Optimization for Networked Data in Environmental Urban Waste Collection (ONDE-UWC) project is, to the best of our knowledge, the first attempt to apply the Internet of Things (IoT) paradigm to the field of waste collection. Sensors installed in dumpsters and garbage trucks share data, such as weight measurements and the number of user accesses. In this study, we schedule the weekly waste-collection activities for multiple types of waste without imposing periodic routes. An important characteristic of this project is that we consider heterogeneous stakeholders with different backgrounds and knowledge. In this context, we apply the GUEST OR methodology, highlighting how it can support the decision-making process to reduce the effects of these differences. As a result, we reduced the time required to implement the solution, increased operational efficiency, and achieved cost savings.
Article
In 2014, the MIT Leaders for Global Operations Program (LGO) was awarded the third annual UPS George D. Smith Prize for effective and innovative preparation of students to be good practitioners of operations research. In this paper, we describe the innovative, interdisciplinary education of the two-year dual-degree LGO program, which trains students for careers in operations and manufacturing through a three-way partnership between the MIT School of Engineering, the MIT School of Management, and industry. The first-of-its-kind program includes a six-month research internship that results in a master's thesis. This paper will discuss the LGO approach to preparing students for practicing operations research and the impact that the program has made on operations in manufacturing and services companies.
Article
I am honored to follow a long line of successful EiCs who have made Interfaces the premier journal in the practice of OR/MS. Most recently, it has benefited from the stewardship of its first nonacademic, industrial researcher and practitioner, Srinivas Bollapragada of General Electric. Under the leadership of Srinivas and his excellent editorial board, Interfaces is stronger than ever. As I take over the role of EiC, I am glad to see that Interfaces has an excellent pipeline of papers. Srinivas has offered to continue to serve on the board as a contributing editor, and will assist me throughout 2017 as we manage the transition between EiCs with minimal changes to the Interfaces board. As EiC, I plan to continue to further the great efforts of my predecessors, with a focus on continuing to increase the reputation, readership, and impact of Interfaces. For example, I plan to continue Srinivas’ desire for more practitioner submissions (through the Practice Summary outlet), and to encourage more international (non-U.S.) submissions. I also plan to continue the great practice of offering one special issue in Interfaces each year. In recent years, Srinivas has had special issues in Mining, Freight Transportation, and Energy, among others. I am excited to note that Srinivas has a special issue planned on Big Data, under the direction of Deepak Turaga, slated for 2017. I am considering a special issue on Nontraditional Analytics for 2018; this issue could include unusual or novel applications, or non-OR approaches, that might lead to improved analytics-based solutions. It is important that I mention that I am open to proposals on other topics for a special issue; please feel free to provide your thoughts to me. Despite being widely subscribed to by INFORMS members and libraries, I find that Interfaces is still not well known in many circles where it could add value. 
I plan to work with the INFORMS staff to heavily promote Interfaces to both researchers and the general public to raise awareness of its viability as an outlet and the great researchers and projects highlighted in the articles. There are many practitioners doing analytical work who do not realize the value that Interfaces can bring to them by introducing novel approaches to problem solving. Similarly, there are many researchers doing very good applied work who are not considering Interfaces as an outlet for their work. I hope to close those gaps and get that valuable applied research into the hands of those who can benefit from it. Finally, I would like to continue to encourage increased international participation of authors. To that end, I hope to fill the position of Deputy Editor of Interfaces with someone with a passion for applied analytics who wants to help me promote the importance of Interfaces to researchers and practitioners. I am also open to nominations for Associate Editors from Europe, Africa, South America, and Asia to expand the international presence of Interfaces. Feel free to contact me with nominations, or if interested in the position.
Article
—“Using Pert to Manage High Technology Projects”; —“European Management Book Award”; —“A Management Science Team Develops a Production System”; —“Litter Research at Anheuser-Busch”; —“Ghetto Life Simulated at Oklahoma State”.
Article
Editorial about the first issue of the new journal.
Article
We conducted a survey of academically affiliated members of INFORMS to better understand the extent of the current usage of Interfaces. We asked respondents if and how they use Interfaces as a resource in teaching and research, and we also asked a series of questions about their careers and the institutions at which they are employed. Our results show that Interfaces is used mostly in teaching MBA core and elective courses and MS and PhD courses in operations research and operations management. Edelman papers are generally used mostly for teaching and reference in research. In this paper, we describe some differences based on type of academic institution and on whether the respondent is an Interfaces subscriber; we also identify opportunities for Interfaces.
Article
An auction designer faces many questions. Among the questions are what precisely is to be sold, who may bid, will there be multiple rounds of bidding, what kinds of payments will be made and which are biddable, how will the bids be compared, how will it be decided if a transaction occurs, how will the price be set, how will sales of multiple items be handled, and what information will be revealed and when. Different academic disciplines have contributed to answering these questions. In addition to OR/MS, these include engineering, economics, sociology, and computer science.
Article
Access to powerful new computers has encouraged routine use of highly complex analytic techniques, often in the absence of any theory, hypotheses, or model to guide the researcher's expectations of results. The author examines the potential of such techniques for generating spurious results, and urges that in exploratory work the outcome be subjected to a more rigorous criterion than the usual tests of statistical significance.
Article
Interfaces is going to start using double-blind refereeing. Sort of. My purpose in this editorial is to explain what this means, give some history and context, and mostly to explain why we are doing it.
Article
Contemplations about statements the author considers to be self-evident truths: The management sciences are applied sciences, gaining their importance from their usefulness in policy and decision making. The value of research in the management sciences depends upon its ultimate usefulness for decision and policy making. The value of practice in the management sciences depends upon the existence of a base of relevant knowledge. Thus, researchers and practitioners in the management sciences are of necessity dependent upon each other.
Article
Organizations and professions get what they measure and reward. If we want educational programs to contribute to OR/MS practice, we have to figure out how to measure such contributions. The INFORMS Academic/Practitioner Interface Committee has discussed the possibility of ranking or rating academic programs on their contributions to OR/MS practice. Many difficulties stand in its way—difficulties in implementation and, especially, in defining the rating scale. How should a scale measure such factors as training students for practice, contributing to the practice literature, contributing to theory useful for practice, and conducting cooperative projects with industry? How should it combine such measures? Such overall rankings would necessarily be highly subjective and difficult to implement well. As editor of a publication focused on practice, I am interested in a more limited goal. I want a scale that measures one aspect of a university's contribution to practice: its contribution to the practice literature. I want deans who promulgate goals for their programs to have available a measure of such contributions. To encourage and reward such contributions, I want recognition for the programs that make them. To that end, I developed an index—the Interfaces index of contributions to the practice literature. Like all measurement schemes, it has limitations. Even so, when I applied it to the literature from 1988 to 1994, I obtained some interesting results and a ranking of universities. I plan to update the index and the rankings periodically.
Article
Since the early days of MS/OR, the forest products industry has been a fertile area for research and applications. In the introduction to this special issue, we trace the major trends MS/OR theory and practice have taken over the past 30 years, assess the current state of the art, and identify promising new areas of research and practice.
Article
This paper describes the design, implementation and evaluation of a system for the management of pension funds by a large institutional investor. As a direct result of this management science based system, which had been in operation for over three years prior to this study, the jobs of security analysts, portfolio managers and the management of the investment function were significantly changed. The institution publicly acknowledged the success of this system in its annual reports and internally acknowledged the additional business revenues generated by this new approach to money management. Three major aspects of this research, which spanned a period of over six years, are discussed. Initially, the behavioral science analysis which helped to determine why classic portfolio selection or equity valuation models had not been accepted by the organization is reviewed. This phase also contains an analysis of the necessary changes to information flows and job responsibilities before the organization could reasonably be expected to accept any normative portfolio management tools. The focus is then directed toward the key management science concepts incorporated in the Management Information System, which was designed to remove the behavioral barriers to change. The major tools and concepts employed and presented in this phase are the use of triangular distributions to collect conditional, subjective forecasts and to provide subjective distribution feedback, loss functions which are based on the impact of outcomes to the institution and not on squared error variability, and the construction of feedback measures which are consistent with the organizational responsibilities of the individual. The third phase of this study is an evaluation of whether investment performance was actually improved by the new system. Testing this and related hypotheses is complicated by the fact that it was difficult enough to convince management to install one new money management system, let alone several systems simultaneously to permit controlled tests. Some aspects of this are discussed in the final section.
Article
In the 1960's a controversy arose regarding the safety of navigation standards of jet aircraft over the North Atlantic Ocean, which led to a confrontation between airline owners and pilots. A systems analysis led to a redesigned and improved system which resolved the controversy by giving each side at least as much as it originally requested, in terms of minimizing cost on the one hand and maximizing safety on the other.
Article
In 1972 Federal Reserve Banks were ordered to provide overnight clearing services for all intra-district checks. Each District's new system required specification of —number, location and capacity of processing centers, and —transportation network including time tables. Check flow was simulated by linking sub-models of —transportation —check volume by individual commercial bank —commercial bank behavior as to choice among Fed, correspondent bank, or local group clearing —facility sizing to handle anticipated workload. Performance measures include total costs, service provided, and checks late. It was concluded that an expansion of the existing Philadelphia facility was feasible and optimal. The new system began operation in February 1973. A more important study benefit was the demonstration to the Fed of the advantages of taking a comprehensive and systematic view of their decision-making. Five in-house operations research groups have since been formed within the Federal Reserve System.
Article
The now famous “Peter Principle” asserts that every manager has a unique executive ability potential and that this ability ceiling or upper limit leads to an inevitable growth in the gap between an organization's demands and the ability of any executive to meet them. In fact, this assertion is shown to be misleading after a formal analysis of two deceptively simple questions. First, how does executive ability develop over time as a manager ascends the leadership ladder; and second, how does this ability compare with an organization's demand for ability at any given time? The result of this analysis is a demonstration that executive ability does increase with the growth in organizational demands, but the increase is cyclical rather than uniform. Additional results are noted concerning the value of executive development programs and the relationship of personal or self problems to organizational problems.
Article
Investments totaling billions of dollars annually are now required of the nation's energy companies in order to keep up with the nation's needs. Getty Oil Company itself is now investing $200 to $300 million annually. This requires many significant investment decisions each year, and the fortunes of the corporation are tied to the wisdom of the choices. The incentive to excel is a strong one—avoiding even a few bad investments can save millions of dollars. This incentive has fostered a dynamic evolution in our financial analysis. Decisions once necessarily based on limited information and short-cut discounted cash flow calculations are now supported by much more thorough and rapid analysis. Financial analysts who seldom or never used computers are now directly and effectively exploiting this resource to extend their capabilities. A key factor in this progress is PAMS, the Plan Analysis and Modeling System. PAMS is designed for a more complete approach to financial investigation than most other systems. It is a computer system that supports all phases of analysis work. It has features for: extensive data management, complete yet flexible calculations, and tabular and graphic management reports. The PAMS approach is used today for hundreds of applications in Getty, from project analysis to a corporate model, from price forecasting to acquisition and divestment analysis. PAMS has had a major impact by helping to achieve: better investment decisions, improved long-range planning, new more effective approaches to financial analysis, and company-wide introduction of new concepts such as risk analysis.
Article
The paper describes an investment analysis for a satellite communications system for the United States. The satellite communications business is a complex one and possesses several key variables which have profound implications on the future profitability of a participant. Uncertainty in these key variables adds yet another dimension of complexity to the characteristics of this business. The model used in the investment analysis is a probabilistic simulation model which attempts to capture the interactions and relationships existing among all the variables. The size and mix of the potential market, competition from terrestrial as well as satellite carriers, communication satellite-launch vehicle combinations, and specific financial arrangements were some of the key variables that were explicitly recognized and accounted for in the model. The paper describes the course of an actual study from preliminary evaluation to final management decision. It develops the model and management understanding of the business over time. It shows the impact of the management decisions on the model and the impact of model results on management perceptions and decisions. It demonstrates how management and the management scientist, working together, can speed up the decision process as well as lead to a clarification of the underlying issues of the business.
Article
A policy setting decision in 1967 on the pricing of Canadian natural gas exports to the Pacific Northwest offers a good case history about successful integration of management science in national policy formulation and decision making. The opportunity cost of exported gas was determined using dynamic programming and showed that the minimum cost alternative to supply the U.S. Pacific Northwest was markedly higher than the "in line price" of Canadian gas exports imposed by the U.S. Federal Power Commission. System simulations confirmed this finding and led to rejection of gas exports on FPC's terms and the substitution of opportunity cost pricing for "in line" pricing. The abandoned policy decreed that U.S. imports of Canadian gas be priced no higher than the price at the point of export in Canada, in this case 31.6¢/Mcf in the Vancouver region. The gas exporting company accepted the FPC ruling because it had no other choice, having already invested about fifty million dollars in facilities in anticipation of this export. Canada's National Energy Board objected to "in line" pricing, particularly since the same gas exporter was already selling a far larger amount of gas to the Pacific Northwest at 23.3¢/Mcf, which was far below the 31.6¢/Mcf price in Vancouver. Yet the NEB was under extreme pressure to grant the application considering the disastrous effect of a denial of the export on the applicant, which was one of its regulated public utilities. The management science contribution was to show through optimal pipeline expansion studies that the minimum cost of alternate U.S. sources of gas supply was far above 31.6¢/Mcf, and, hence, that rejection of the application would force abandonment of FPC's policy and lift the ceiling on gas export prices. The application was rejected, but the FPC backed down and allowed this gas and subsequent imports at higher prices.
Article
The prevalence of faulty citations impedes the growth of scientific knowledge. Faulty citations include omissions of relevant papers, incorrect references, and quotation errors that misreport findings. We discuss key studies in these areas. We then examine citations to "Estimating nonresponse bias in mail surveys," one of the most frequently cited papers from the Journal of Marketing Research, to illustrate these issues. This paper is especially useful in testing for quotation errors because it provides specific operational recommendations on adjusting for nonresponse bias; therefore, it allows us to determine whether the citing papers properly used the findings. By any number of measures, those doing survey research fail to cite this paper and, presumably, make inadequate adjustments for nonresponse bias. Furthermore, even when the paper was cited, 49 of the 50 studies that we examined reported its findings improperly. The inappropriate use of statistical-significance testing led researchers to conclude that nonresponse bias was not present in 76 percent of the studies in our sample. Only one of the studies in the sample made any adjustment for it. Judging from the original paper, we estimate that the study researchers should have predicted nonresponse bias and adjusted for 148 variables. In this case, the faulty citations seem to have arisen either because the authors did not read the original paper or because they did not fully understand its implications. To address the problem of omissions, we recommend that journals include a section on their websites to list all relevant papers that have been overlooked and show how the omitted paper relates to the published paper. In general, authors should routinely verify the accuracy of their sources by reading the cited papers. For substantive findings, they should attempt to contact the authors for confirmation or clarification of the results and methods. This would also provide them with the opportunity to enquire about other relevant references. Journal editors should require that authors sign statements.
Article
In the last decade there has been an explosion of interest in mining time series data. Literally hundreds of papers have introduced new algorithms to index, classify, cluster and segment time series. In this work we make the following claim. Much of this work has very little utility because the contribution made (speed in the case of indexing, accuracy in the case of classification and clustering, model accuracy in the case of segmentation) offers an amount of improvement that would have been completely dwarfed by the variance that would have been observed by testing on many real world datasets, or the variance that would have been observed by changing minor (unstated) implementation details. To illustrate our point, we have undertaken the most exhaustive set of time series experiments ever attempted, re-implementing the contribution of more than two dozen papers, and testing them on 50 real world, highly diverse datasets. Our empirical results strongly support our assertion, and suggest the need for a set of time series benchmarks and more careful empirical evaluation in the data mining community.
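The core claim here, that a reported improvement can be dwarfed by cross-dataset variance, is easy to illustrate numerically. The sketch below uses fabricated per-dataset accuracies purely for illustration; the variable names and numbers are assumptions, not results from the paper.

```python
# Toy illustration: an algorithm's average gain over a baseline can be
# much smaller than the spread of that gain across datasets, making the
# "improvement" indistinguishable from dataset-to-dataset noise.
from statistics import mean, stdev

baseline_acc = [0.80, 0.62, 0.91, 0.55, 0.73, 0.84]  # per-dataset accuracy
proposed_acc = [0.82, 0.60, 0.93, 0.54, 0.75, 0.85]  # fabricated numbers

gains = [p - b for p, b in zip(proposed_acc, baseline_acc)]
print(f"mean gain: {mean(gains):+.3f}")
print(f"spread across datasets (stdev): {stdev(gains):.3f}")
```

When the standard deviation of the gain exceeds its mean, as here, a single-dataset comparison says little about the method's general superiority, which is the paper's argument for broad benchmarks.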
Article
This paper describes the M3-Competition, the latest of the M-Competitions. It explains the reasons for conducting the competition and summarizes its results and conclusions. In addition, the paper compares such results/conclusions with those of the previous two M-Competitions as well as with those of other major empirical studies. Finally, the implications of these results and conclusions are considered, their consequences for both the theory and practice of forecasting are explored and directions for future research are contemplated.
Article
The main conclusions of the M3 competition were derived from analyses of descriptive statistics with no formal statistical testing. One of the commentaries noted that the results had not been tested for statistical significance. This paper undertakes such an analysis by examining the primary findings of that competition. We introduce a methodology that has not previously been used to evaluate economic forecasts: multiple comparisons. We use this technique to compare each method against the best and against the mean. We conclude that the accuracy of the various methods does differ significantly, and that some methods are significantly better than others. We confirm that there is no relationship between complexity and accuracy, but also show that there is a significant relationship among the various measures of accuracy. Finally, we find that the M3 conclusion that a combination of methods is more accurate than the individual methods being combined was not proven.
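A common starting point for multiple-comparison analyses of forecast accuracy, in the spirit of the comparison against the best described above, is to rank the methods on each series and average the ranks. The sketch below is an illustrative simplification with made-up error values, not the authors' exact procedure (which adds a critical distance for significance).

```python
# Rank competing forecasting methods on each series by absolute error,
# then average the ranks; the method with the lowest mean rank is "best".
# Error values are fabricated placeholders for illustration.

def average_ranks(errors_by_method):
    """errors_by_method: {name: [error per series]} -> {name: mean rank}."""
    names = list(errors_by_method)
    n_series = len(next(iter(errors_by_method.values())))
    totals = {name: 0.0 for name in names}
    for i in range(n_series):
        ordered = sorted(names, key=lambda m: errors_by_method[m][i])
        for rank, name in enumerate(ordered, start=1):
            totals[name] += rank
    return {name: totals[name] / n_series for name in names}

errors = {
    "combination": [0.9, 1.1, 0.8, 1.0],
    "single_best": [1.0, 1.2, 0.9, 1.1],
    "naive":       [1.3, 1.5, 1.2, 1.4],
}
ranks = average_ranks(errors)
best = min(ranks, key=ranks.get)
print(ranks, best)
```

A full multiple-comparisons-with-the-best test would then ask whether each method's mean rank differs from the best method's by more than a critical value, rather than relying on the raw ordering alone.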
Article
Railroad companies face a difficult problem in assigning empty freight cars based on customer demand because these assignments depend on a variety of factors; these include the location of available empty cars, the urgency of the demand, and the possibilities ...