Fig 1 - uploaded by Arash Khoda Bakhshi
Nested Factors versus Crossed Factors

Source publication
Article
Traditional hierarchical modeling has been proposed to account for unobserved heterogeneity in crash analysis. Previous studies investigated the grouping of individual observations between different clusters by considering a single random factor at level-2 of the data structure. This approach, however, hinders exploring the possible crossed effects...

Contexts in source publication

Context 1
... a data hierarchy, individual observations, at level-1, might be clustered between different groups of factor(s) at higher levels (O'Dwyer and Parker, 2014). The terminology of crossed-random intercept in CCREM is presented for the case of crossed random factors as opposed to nested random factors (Quené and Van den Bergh, 2008). According to Fig. 1, suppose there are two factors (V and U) within which the individuals are clustered. From Fig. 1A, nested factors can be encoded when any group of the first factor (V) at level-2, appears only in one particular group of the second factor (U) at level-3. Thus, V itself is nested within U. In this situation, the number of levels in ...
Context 2
... to Fig. 1, suppose there are two factors (V and U) within which the individuals are clustered. From Fig. 1A, nested factors can be encoded when any group of the first factor (V) at level-2, appears only in one particular group of the second factor (U) at level-3. Thus, V itself is nested within U. In this situation, the number of levels in hierarchical modeling should be increased (e.g., from two-level modeling to three-level modeling in this example) (Baayen et al., 2008; Cho and Rabe-Hesketh, 2011; Quené and Van den Bergh, 2008). ...
Context 3
... the other hand, Fig. 1C presents crossed factors where any given group of V can arise in more than one group of U (Quené and Van den Bergh, 2008). Equivalently, individual observations are not only clustered between different groups of U and V but also are grouped between different combinations of U and V. Using crossed factors at level-2 in two-level ...
Context 4
... V but also are grouped between different combinations of U and V. Using crossed factors at level-2 in two-level modeling under the concept of CCREM can account for this type of grouping (Baayen et al., 2008; Cho and Rabe-Hesketh, 2011; Fielding and Goldstein, 2006; Garson, 2013; Goldstein, 2011; Quené and Van den Bergh, 2008; Shi et al., 2010). Also, Fig. 1B and D indicate the cross-classification of individual observations for nested factors and crossed factors, respectively. The white cells show there is no observation, and the number of hypothetical observations in the black cells is greater than in the gray cells. Hence, in the case of nested factors, the cross-classification of observations ...
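The distinction drawn in these contexts can be checked directly on data. The following is a minimal sketch, not the source publication's method: it assumes each observation carries a pair of group labels (V, U), and the function names `classify_factors` and `cross_classification` as well as the toy labels are hypothetical. It treats V as nested within U when every V group appears in exactly one U group, and labels the factors as crossed as soon as any V group appears in more than one U group (so partially crossed designs are also reported as crossed).

```python
from collections import defaultdict


def classify_factors(pairs):
    """Return "nested" if every V group occurs in exactly one U group, else "crossed".

    pairs: iterable of (v, u) group labels, one pair per observation (assumed layout).
    """
    u_groups_by_v = defaultdict(set)
    for v, u in pairs:
        u_groups_by_v[v].add(u)
    if all(len(us) == 1 for us in u_groups_by_v.values()):
        return "nested"
    return "crossed"


def cross_classification(pairs):
    """Count observations in each observed (V, U) cell; unobserved cells are absent."""
    counts = defaultdict(int)
    for v, u in pairs:
        counts[(v, u)] += 1
    return dict(counts)


# Hypothetical toy data: each V group sits inside a single U group.
nested_obs = [("v1", "u1"), ("v1", "u1"), ("v2", "u1"), ("v3", "u2"), ("v4", "u2")]
# Hypothetical toy data: V groups appear under more than one U group.
crossed_obs = [("v1", "u1"), ("v1", "u2"), ("v2", "u1"), ("v2", "u2"), ("v2", "u2")]

print(classify_factors(nested_obs))
print(classify_factors(crossed_obs))
print(cross_classification(crossed_obs))
```

The dictionary returned by `cross_classification` plays the role of the cell counts described for Fig. 1B and D: missing keys correspond to cells with no observation, and larger counts to more heavily populated cells.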

Citations

... However, these models assume the effects of parameters are fixed over space and time and ignore the potentially unobserved heterogeneity. While the generalized linear mixed model and Bayesian models with covariance components capture the unobserved heterogeneity in the error terms or latent variables (Bakhshi & Ahmed, 2021; Wu, Song, & Meng, 2021; Ling, Murray-Tuite, Lee, Ge, & Ukkusuri, 2021; Mannering & Bhat, 2014; Jin, Chowdhury, Khan, & Gerard, 2021), they do not capture the potential variation of parameters across different locations. Then, the geographically weighted regression (GWR) model (Brunsdon, Fotheringham, & Charlton, 1998; Ling, Qian, Guo, & Ukkusuri, 2022) was developed to explore spatial nonstationarity using physical distance to calibrate a multiple regression model. ...
... As part of a project sponsored by FDOT, this paper addresses practitioners' concerns and challenges with respect to the implementation of an innovative context-based classification methodology into the current version of the SPFs recommended in the HSM for Florida roadway segments. Previous studies have demonstrated that utilizing more advanced techniques, such as random parameter zero-inflated negative binomial models and multivariate Poisson-lognormal models, to tackle unobserved heterogeneity can enhance the accuracy of crash prediction models (55–57). In view of this, a worthwhile direction for future research would be to examine the outperformance of context-based modeling in accordance with a more comprehensive zero-inflated crash prediction model rather than the current HSM-SPFs. ...
Article
Negative binomial-based safety performance functions (SPFs) have been extensively used by United States Department of Transportation professionals for predictive crash analysis. Recently, the Florida Department of Transportation (FDOT) has developed a context classification approach and incorporated it into crash prediction models, which has the potential to significantly enhance their accuracy and reliability. The additional modeling contexts and parameters make it more challenging to diagnose and remedy modeling problems, however. Particularly for roadway segments with low annual average daily traffic (AADT), short lengths, or low counts of severe crashes, the SPF models significantly underestimate the actual number of crashes. This uncertainty in SPF predictions can lead FDOT practitioners to reach misleading conclusions, such as failing to detect sites with genuinely high crash rates. This project intends to establish thresholds for certain SPF parameters to ensure reliable crash predictions are obtained across various context classes. For this purpose, we (a) developed a functional statistical model that quantifies economic loss relative prediction errors as a function of AADT volume and (b) calculated the minimum context-specific AADT threshold for each segment length group, roadway category, context classification, and crash severity combination. Employing the developed AADT thresholds confirmed up to 89% reduction in SPF prediction errors for the most represented context class. In light of the results obtained, we are able to conclude that context-specific AADT thresholds perform well in significantly reducing prediction errors for the thresholded segments and contexts on Florida roadways.
... A mixed logit (random parameter logit) model is developed for each, using South Carolina statewide work zone crash data from 2014 to 2020. Mixed logit models have been used more frequently in recent years due to their ability to account for unobserved heterogeneity [13–22]. In addition, mixed models require less detailed crash-specific data compared to fixed-parameter logit models [23]. ...
Article
This study investigates factors contributing to the injury severity of truck-involved work zone crashes in South Carolina (SC). The outcome of interest is injury or property damage only crashes, and the explanatory factors examined include the occupant, vehicle, collision, roadway, temporal, and environmental characteristics. Two mixed (random parameter) logit models are developed, one for non-interstates with speed limits less than 60 miles per hour (mph) and one for interstates with speed limits greater than or equal to 60 mph, using South Carolina statewide truck-involved work zone crash data from 2014 to 2020. Results of log-likelihood ratio tests indicate that separate speed models are warranted. The factors that were found to contribute to injury at the 90% confidence level in both models (interstate and non-interstate) are (1) dark lighting conditions, (2) female (at-fault) drivers, and (3) driving too fast for roadway conditions. Significant factors that apply only to non-interstates are SC or US primary roadways, activity area of the work zone, at-fault drivers under 35, sideswipe collision, presence of workers in the work zone, and collision with fixed objects. Significant factors that apply only to interstates are three or more vehicles, rear-end collision, location before the first work zone sign, and weekdays.
... WAIC is a Bayesian approach that utilizes the posterior distribution for the estimation of out-of-sample expectation (Li et al., 2021; Bakhshi and Ahmed, 2021). In this study, WAIC would be estimated using the following (Watanabe and Opper, 2010); ...
Article
Hong Kong is a compact city with high activity and travel intensity. In the past decades, many footbridges and underpasses were installed to reduce the pedestrian-vehicle conflicts on urban roads. However, the effects of the configuration of the pedestrian network on pedestrian crashes are rarely investigated. In Hong Kong, many footbridges and underpasses are connected to major transport hubs and commercial building development and become parts of giant elevated and underground walkway systems. It is challenging to characterize such a complicated pedestrian network. In this study, a three-dimensional digital map is applied to estimate the connectivity and accessibility of the pedestrian network, and measure the relationship between pedestrian network characteristics and pedestrian safety at the macroscopic level. Hence, the effects of footbridges and underpasses on pedestrian safety are examined. For example, comprehensive built environment, pedestrian network, traffic, and crash data are aggregated to 379 grids (0.5 km × 0.5 km). Then, a multivariate Poisson lognormal regression approach is applied to model fatal and severe injury (FSI) and slight injury pedestrian crashes, with which the effects of unobserved heterogeneity, spatial correlation, and correlation between crash counts are accounted for. Results indicate that population density, traffic volume, walking trip, footpath density, node density, number of vertices per footpath segment, bus stop, metro exit, residential area, commercial area, and government and utility area are positively associated with pedestrian crashes. In contrast, average gradient, accessibility of footbridge, accessibility of underpass, and number of crossings per road segment are negatively associated with pedestrian crashes. In other words, pedestrian safety would be improved when footbridges and underpasses are more accessible. Findings have implications for the design and planning of pedestrian networks to promote walkability and improve pedestrian safety.
... The findings of studies show that crash frequency increases with the longitudinal grade (Ma et al. 2017; Ahmed et al. 2011; Fu et al. 2011). Khoda Bakhshi and Ahmed (2021a) showed that the risk of the occurrence of critical crashes is the highest in downgrades with sharp grades. Regarding the relationship between horizontal curvature and crash frequency, the results of the studies are different. ...
Article
A significant portion of rural crashes in Iran occur in the marginal areas around cities. Exclusive crash frequency models should be developed to identify the factors affecting the occurrence of crashes in these areas. In this study, the rural crash data of highways leading to the city of Isfahan were employed to develop a negative binomial (NB) model as a constant-parameter model, along with random effects negative binomial (RENB) and random parameters negative binomial (RPNB) models as the models based on the random parameter approach. The modeling results indicated the superiority of the RPNB model over the two other models. According to the findings of this study, the traffic volume of roads and the proportion of local and non-local vehicles, the proportion of longitudinal distance-limit violation, the way of controlling access points, and the access to rest areas affected the crash frequency in the marginal areas around cities.
... Also, this study demonstrated the concept of applying driving volatility measures as effective measures of driving instability at intersections. Similar frameworks can be developed to calculate these measures in real-time (Khoda Bakhshi and Ahmed, 2021) for proactive monitoring of intersection networks and detect intersections where there is a high probability of a crash. Future research can include volatility measures in the development of SPFs for different crash types, such as single-vehicle crashes, multiple-vehicle crashes, and pedestrian crashes. ...
Article
About 40 percent of motor vehicle crashes in the US are related to intersections. To deal with such crashes, Safety Performance Functions (SPFs) are vital elements of the predictive methods used in the Highway Safety Manual. The predictions of crash frequencies and potential reductions due to countermeasures are based on exposure and geometric variables. However, the role of driving behavior factors, e.g., hard accelerations and decelerations at intersections, which can lead to crashes, is not explicitly treated in SPFs. One way to capture driving behavior is to harness connected vehicle data and quantify performance at intersections in terms of driving volatility measures, i.e., rapid changes in speed and acceleration. According to recent studies, driving volatility is typically associated with higher risk and safety-critical events and can serve as a surrogate for driving behavior. This study incorporates driving volatility measures in the development of SPFs for four-leg signalized intersections. The Safety Pilot Model Deployment (SPMD) data containing over 125 million Basic Safety Messages generated by over 2,800 connected vehicles are harnessed and linked with the crash, traffic, and geometric data belonging to 102 signalized intersections in Ann Arbor, Michigan. The results show that including driving volatility measures in SPFs can reduce model bias and significantly enhance the models' goodness-of-fit and predictive performance. Technically, the best results were obtained by applying Bayesian hierarchical Negative Binomial Models, which account for spatial correlation between signalized intersections. The results of this study have implications for practitioners and transportation agencies about incorporating driving behavior factors in the development of SPFs for greater accuracy and measures that can potentially reduce volatile driving.
... To analyze the influencing factors of crash severity, previous studies have used many methods, such as binomial or multinomial logistic regression [20], ordered logit or probit models [21], classification trees [22], linear genetic programming [16], etc. Due to the traffic data collection and clustering process, multilevel data structures widely exist. The traditional crash severity model cannot take multilevel data into account, disregarding the possible within-group correlation, which may lead to unreliable parameter estimation and statistical inference of the model [23]. The hierarchical Bayesian approach can address the problems caused by within-group correlations [24]. ...
... The hierarchical Bayesian approach can address the problems caused by within-group correlations [24]. In addition, the Bayesian inference can update the model using any engineering experiences or justified previous findings as prior knowledge [23,25]. This method can effectively handle missing data, which commonly appear in crash records, by considering the information present in other observed data [26,27]. ...
Article
With the growth of traffic demand, the number of newly built and renovated super multi-lane freeways (i.e., equal to or more than a two-way ten-lane) is increasing. Compared with traditional multi-lane freeways (i.e., a two-way six-lane or eight-lane), super multi-lane freeways have higher design speeds and more vehicle interweaving movements, which may lead to higher traffic risks. However, current studies mostly focus on the factors that affect crash severity on traditional multi-lane freeways, while little attention is paid to those on super multi-lane freeways. Therefore, this study aims to explore the impacting factors of crash severity on super multi-lane freeways and make a comparison with traditional multi-lane freeways. The crash data of the Guangzhou-Shenzhen freeway in China from 2016 to 2019 is used in the study. This freeway contains both super multi-lane and traditional multi-lane road sections, and data on 2455 crashes on two-way ten-lane sections and 13,367 crashes on two-way six-lane sections were obtained for further analysis. Considering the effects of unobserved spatial heterogeneity, a hierarchical Bayesian approach is applied. The results show significant differences in the factors that influence serious crashes between these two kinds of freeways. On both types of freeways, heavy-vehicle, two-vehicle, and multi-vehicle involvements are more likely to lead to serious crashes. Still, their impact on super multi-lane freeways is much stronger. In addition, for super multi-lane freeways, vehicle-to-facility collisions and rainy weather can result in a high possibility of serious crashes, but their impact on traditional multi-lane freeways is not significant. This study will contribute to understanding the impacting factors of crash severity on super multi-lane freeways and help the future design and safety management of super multi-lane freeways.
... The hierarchical Bayesian approach can solve such problems. In addition, since AV crash data are difficult to collect but gradually available, the hierarchical Bayesian approach can use any engineering experiences or justified previous findings as prior knowledge to update the model [30,31]. This approach could also well handle missing data that occur commonly in crash records by considering the information contained in other observed data [32,33]. ...
Article
Factors influencing the severity of crashes involving autonomous vehicles (AVs) have received increasing attention. However, there is a lack of comparative analyses of those factors between AVs and human-driven vehicles. To fill this research gap, the study aims to explore the divergent effects of factors on crash severity under autonomous and conventional (i.e., human-driven) driving modes. This study obtained 180 publicly available autonomous vehicle crash records, and 39 explanatory variables were extracted from three categories, including environment, roads, and vehicles. Then, a hierarchical Bayesian approach was applied to analyze the impacting factors on crash severity (i.e., injury or no injury) under both driving modes while considering unobserved heterogeneities. The results showed that some influencing factors affected both driving modes, but their degrees were different. For example, daily visitors’ flowrate had a greater impact on the crash severity under the conventional driving mode. More influencing factors only had significant impacts on one of the driving modes. For example, in the autonomous driving mode, mixed land use increased the severity of crashes, while daytime had the opposite effect. This study could contribute to specifying more appropriate policies to reduce the crash severity of both autonomous and human-driven vehicles, especially in mixed traffic conditions.
... This rural highway mostly operates at level of service A or B, accommodating 30% to 55% heavy truck traffic volume (11). Although the total traffic volume on I-80 has increased by 65% during the last three decades, heavy truck traffic volume has increased by 150% (11). This traffic composition, coupled with adverse weather conditions, mountainous topography, and challenging roadway geometry, caused I-80 to rank first in heavy vehicle crashes in the U.S. in 2014 by 0.52 million vehicle miles traveled (12–14). ...
... The results revealed that drivers' compliance is most apparent in their longitudinal and speed adaptation behavior, which affirms the conclusion drawn by Zhao et al. and Yang et al. about the significant and smooth speed reduction because of CV advisory messages (18, 39). Besides, these differences in central tendencies coincided with reductions of these K-SMoS for CVs compared with non-CVs, promoting speed harmonization and traffic safety level (7, 14, 63–65). Although statistical indications of safety enhancement have been observed in Section-2, this improvement is more evident in Section-3. ...
... In relation to the longitudinal speed adaptation, the central tendencies of the SSD of speed and coefficient of variation in speed have been reduced by almost 55% and 48% in the CV scenario, respectively. According to the literature, these two variables are among the most influential real-time crash-contributing factors, directly increasing the risk of crashes (7,14). Besides, SSD of speed is associated with spatial dispersion of longitudinal driving behavior. ...
Article
Foggy weather increases crash likelihood when coinciding with roadway geometry changes inconsistent with drivers’ expectations. The situation might be exacerbated for heavy trucks having to evade critical safety events because of the vehicles’ maneuverability limitations, imposing prime safety challenges on major freight corridors like Interstate-80 (I-80) in the U.S. Aligned with the connected vehicle (CV) pilot program on I-80 in Wyoming, this study intends to unveil how CV technology alleviates safety concerns in this regard. To this aim, a with/without analysis approach was performed utilizing a high-fidelity truck driving simulator. Twenty-three professional truck drivers were recruited to drive the simulator in CV scenario with traveler information messages, including foggy weather ahead and an advisory speed of 45 mph, and in a non-CV counterpart without notifications. Longitudinal and lateral drivers’ behaviors were quantified by kinematic-based surrogate measures of safety (K-SMoS) characterized on vehicles’ trajectory, including longitudinal speed, lateral speed, steering, their corresponding spatial standard deviations, and the coefficient of variation of longitudinal speed. The central tendency and dispersion of K-SMoS distributions were compared between CVs and non-CVs throughout the simulated roadway. Results showed immediate truck drivers’ compliance to CV notifications, which was more apparent in their longitudinal driving behaviors. On a horizontal curve with poor visibility, statistically significant reductions in central tendency and dispersion of K-SMoS distributions up to 67% in CVs were observed, minimizing the crash risk in CV environments. Besides, findings revealed that exposure to the CV notifications minimized drivers’ behavior uncertainty, manifesting in their improved situational awareness and enhancing the safety performance of the traffic stream.
... This paper intends to answer the questions above with the main focus on the safety performance assessment of a major freight corridor, I-80 in Wyoming, by extending the previous study by Khoda Bakhshi and Ahmed (2021f), where the concept of Cross-Classified Random Effects Modeling (CCREM) was introduced to the safety domain. To this aim, the current study particularly analyzes the crash dataset associated with a 402-mile I-80 corridor in Wyoming. ...
Article
Traffic crashes impose tremendous socio-economic losses on societies. To alleviate these concerns, numerous traffic safety studies have shed light on observable factors contributing to crashes and crash severity. Nonetheless, some influential factors might not be observable or measurable, referred to as unobserved heterogeneity, which could be accounted for by structuring random intercepts and slopes in hierarchical models. In this respect, although it is known that random slopes can capture more unobserved heterogeneity, most previous studies utilized random intercepts to simplify result interpretations, indicating an inconsistency in the literature regarding the hierarchical modeling specification. This study examines this inconsistency through an empirical real-time clustering of critical crashes, involving fatal or incapacitating injuries, versus non-critical crashes along 402 miles of Interstate-80 in Wyoming. The crash dataset was conflated with real-time traffic-related and environmental contributing factors. Regarding the inclusion of random intercepts and slopes, eleven logistic regressions were conducted. As a data-dependent matter, results showed that random slopes, compared to random intercepts, do not necessarily enhance models’ out-of-sample predictive performance because they impose much more complexity on the models’ structure. Besides, considering the type of unobserved heterogeneity, if random slopes are required, random intercepts should accompany them so that the data can show their true patterns.