Preface
Introduction
Transportation is integral to developed societies. It is responsible for personal mobility, which includes access to services, goods, and leisure. It is also a key element in the delivery of consumer goods. Regional, state, national, and world economies rely upon the efficient and safe functioning of transportation facilities.
In addition to the sweeping influence transportation has on economic and social aspects of modern society, transportation issues pose challenges to professionals across a wide range of disciplines including transportation engineers, urban and regional planners, economists, logisticians, systems and safety engineers, social scientists, law enforcement and security professionals, and consumer theorists. Where to place and expand transportation infrastructure, how to safely and efficiently operate and maintain infrastructure, and how to spend valuable resources to improve mobility and access to goods, services, and healthcare are among the decisions made routinely by transportation-related professionals.
Many transportation-related problems and challenges involve stochastic processes that are influenced by observed and unobserved factors in unknown ways. The stochastic nature of these problems is largely a result of the role that people play in transportation. Transportation-system users are routinely faced with decisions in contexts such as what transportation mode to use, which vehicle to purchase, whether or not to participate in a vanpool or telecommute, where to relocate a business, whether or not to support a proposed light-rail project and whether to utilize traveler information before or during a trip. These decisions involve various degrees of uncertainty. Transportation-system managers and governmental agencies face similar stochastic problems in determining how to measure and compare system measures of performance, where to invest in safety improvements, how to efficiently operate transportation systems and how to estimate transportation demand. As a result of the complexity, diversity, and stochastic nature of transportation problems, the methodological toolbox required of the transportation analyst must be broad.
Approach
The third edition of Statistical and Econometric Methods expands on the first and second editions. It responds to recent methodological advancements in the fields of econometrics and statistics, addresses reader and reviewer comments on the first and second editions, and provides a wider range of examples and corresponding data sets.
This book describes and illustrates some of the statistical and econometric tools commonly used in transportation data analysis. Every book must strike an appropriate balance between depth and breadth of theory and applications, given the intended audience. This book targets two general audiences. First, it can serve as a textbook for advanced undergraduate, Master's, and Ph.D. students in transportation-related disciplines including engineering, economics, urban and regional planning, and sociology. There is sufficient material to cover two 3-unit semester courses in statistical and econometric methods. Alternatively, a one-semester course could consist of a subset of topics covered in this book. The publisher’s website contains the data sets used to develop the examples in this book so that readers can use them to reinforce the modeling techniques discussed throughout the text. Second, the book serves as a technical reference for researchers and practitioners wishing to examine and understand a broad range of statistical and econometric tools required to study transportation problems. It provides a wide breadth of examples and case studies, covering applications in various aspects of transportation planning, engineering, safety, and economics. Sufficient analytical rigor is provided in each chapter so that fundamental concepts and principles are clear, and numerous references are provided for those seeking additional technical details and applications.
Data-Driven Methods vs. Statistical and Econometric Methods
In the analysis of transportation data, four general methodological approaches have become widely applied: data-driven methods, traditional statistical methods, heterogeneity models, and causal-inference models (the latter three of which fall into the category of statistical and econometric methods and are covered in this text). Each of these approaches has an implicit trade-off between practical prediction accuracy and the ability to uncover underlying causality. Data-driven methods include a wide range of techniques including those relating to data mining, artificial intelligence, machine learning, neural networks, support vector machines, and others. Such methods have the potential to handle extremely large amounts of data and provide a high level of prediction accuracy. On the downside, such methods may not necessarily provide insights into underlying causality (for example, truly understanding the effects of specific factors on accident likelihoods and their resulting injury probabilities). Traditional statistical methods provide reasonable predictive capability and some insight into causality, but other approaches surpass them in both prediction and causal insight. Heterogeneity models extend traditional statistical and econometric methods to account for potential unobserved heterogeneity (unobserved factors that may be influencing the process of interest). Causal-inference models use statistical and econometric methods to focus on underlying causality, often sacrificing predictive capability to do so.
Data-driven methods are often a viable alternative for the analysis of transportation data when one is interested solely in prediction and not in uncovering causal effects. Because the focus of this book is on uncovering causality using statistical and econometric methods, data-driven methods are not covered.
Chapter Topics and Organization
Part I of the book provides statistical fundamentals (Chapters 1 and 2). This portion of the book is useful for refreshing fundamentals and preparing students for the sections that follow. It is aimed at students who have taken a basic statistics course but have since forgotten many of the fundamentals and need a review.
Part II of the book presents continuous dependent variable models. The chapter on linear regression (Chapter 3) devotes additional pages to introducing common modeling practice (examining residuals, creating indicator variables, and building statistical models) and thus serves as a logical starting chapter for readers new to statistical modeling. The subsection on Tobit and censored regressions is new to the second edition. Chapter 4 discusses the impacts of failing to meet linear regression assumptions and presents corresponding solutions. Chapter 5 deals with simultaneous equation models and presents modeling methods appropriate when studying two or more interrelated dependent variables. Chapter 6 presents methods for analyzing panel data, that is, data obtained from repeated observations on sampling units over time, such as household surveys administered several times to a sample of households. When data are collected continuously over time, such as hourly, daily, weekly, or yearly, time series methods and models are often needed and are discussed in Chapters 7 and 8. New to the second edition is explicit treatment of frequency domain time series analysis, including Fourier and wavelet analysis methods. Latent variable models, discussed in Chapter 9, are used when the dependent variable is not directly observable and is approximated with one or more surrogate variables. The final chapter in this section, Chapter 10, presents duration models, which are used to model time-until-event data as survival, hazard, and decay processes.
Part III of the book presents count and discrete dependent variable models. Count models (Chapter 11) arise when the data of interest are non-negative integers. Examples of such data include the number of vehicles in a queue and the number of vehicle crashes per unit time. Zero inflation, a phenomenon observed frequently with count data, is discussed in detail, and a new example and corresponding data set have been added in the second edition. Logistic regression, which is commonly used to model probabilities of binary outcomes, is presented in Chapter 12 and is new to the second edition. Discrete outcome models are extremely useful in many study applications and are described in detail in Chapter 13. A unique feature of the book is that discrete outcome models are first considered statistically and then later related to economic theories of consumer choice. Ordered probability models (a new chapter for the second edition) are presented in Chapter 14. Discrete-continuous models are presented in Chapter 15, which demonstrates that interrelated discrete and continuous data, such as the choice of which vehicle to drive and how far it will be driven, need to be modeled as a system rather than individually.
Finally, Part IV of the book contains a substantially expanded chapter on random parameters models (Chapter 16), a new chapter on latent class models (Chapter 17), a new chapter on bivariate and multivariate dependent variable models (Chapter 18), and an expanded chapter on Bayesian statistical modeling (Chapter 19). Models that deal with unobserved heterogeneity (random parameters models and latent class models) have become the standard statistical approach in many transportation sub-disciplines, and Chapters 16 and 17 provide an important introduction to these methods. Bivariate and multivariate dependent variable models are encountered in many transportation data analyses. Although the interrelation among dependent variables has often been ignored in transportation research, the methodologies presented in Chapter 18 show how such interdependencies can be accurately modeled. The chapter on Bayesian statistical models (Chapter 19) arises as a result of the increasing prevalence of Bayesian inference and Markov chain Monte Carlo methods (an analytically convenient approach for estimating complex Bayesian models). This chapter presents the basic theory of Bayesian models and of Markov chain Monte Carlo sampling methods, along with two separate examples of Bayesian models.
The appendices are complementary to the remainder of the book. Appendix A presents fundamental concepts in statistics which support analytical methods discussed. Appendix B provides tables of probability distributions used in the book, while Appendix C describes typical uses of data transformations common to many statistical methods.
While the book covers a wide variety of analytical tools for improving the quality of research, it does not attempt to teach all elements of the research process. Specifically, the development and selection of research hypotheses, alternative experimental design methodologies, the virtues and drawbacks of experimental versus observational studies, and issues involved with the collection of data are not discussed. These issues are critical elements in the conduct of research, and can drastically impact the overall results and quality of the research endeavor. It is considered a prerequisite that readers of this book are educated and informed on these critical research elements in order to appropriately apply the analytical tools presented herein.
Simon P. Washington
Matthew G. Karlaftis
Fred L. Mannering
Panagiotis Ch. Anastasopoulos