IAES International Journal of Artificial Intelligence (IJ-AI)
Vol. 10, No. 2, June 2021, pp. 365~373
ISSN: 2252-8938, DOI: 10.11591/ijai.v10.i2.pp365-373
Journal homepage: http://ijai.iaescore.com
Spatial analysis model for traffic accident-prone roads
classification: a proposed framework
Anik Vega Vitianingsih1, Nanna Suryana2, Zahriah Othman3
1Department of Informatics, Universitas Dr. Soetomo, Surabaya, Indonesia
1,2,3Faculty of Information and Communication Technology, Universiti Teknikal Malaysia Melaka, Melaka, Malaysia
Article Info
ABSTRACT
Article history:
Received Feb 25, 2020
Revised Dec 10, 2020
Accepted Apr 2, 2021
The classification method in spatial analysis modeling based on
multi-criteria parameters is currently widely used to manage geographic
information systems (GIS) software engineering. The accuracy of the
proposed model plays an essential role in the successful software
development of GIS. This is related to the nature of GIS, which is used for
mapping through spatial analysis. This paper aims to propose a framework of
spatial analysis using a hybrid estimation model based on a combination of
multi-criteria decision-making (MCDM) and artificial neural networks (ANNs)
(MCDM-ANNs) classification. The proposed framework is based on a
comparison of existing frameworks through a literature review. The model in
the proposed framework will be used in future work on traffic
accident-prone road classification through testing with a private or public
spatial dataset. Model validation testing on the proposed framework uses
metaheuristic optimization techniques. Policymakers can use the results of
the model in the proposed framework for initial planning when developing GIS
software engineering through spatial analysis models.
Keywords:
GIS software engineering
Hybrid estimation model-based
MCDM-ANNs
Proposed framework
Spatial analysis model
Traffic accident-prone roads
This is an open access article under the CC BY-SA license.
Corresponding Author:
Anik Vega Vitianingsih
Department of Informatics
Universitas Dr. Soetomo
Jalan Semolowaru 84 Surabaya, 60118, Surabaya, Indonesia
Email: vega@unitomo.ac.id
1. INTRODUCTION
Model accuracy prediction in the development of frameworks for GIS software is the first step in
improving the quality of the GIS software developed and is part of quality control and quality assurance
[1]. Quality control determines the method of spatial analysis used to test quality standards [1]. Spatial
analysis modeling is a process of building an artificial intelligence (AI) model that is combined with trials on
spatial datasets [2], gathering spatial knowledge through spatial datasets and providing knowledge of models
in the framework through AI methods from various sources. The purpose of the spatial analysis model is to
describe the GIS software that will be developed and to conduct simulations that test spatial datasets
through the models of the AI method used in the proposed framework. Spatial
datasets in GIS relate to how primary and secondary data are obtained through the collection process, and
then how the data are processed through spatial analysis to become information in the decision support system [3].
Visualization of spatial data can be done with cloud-terminal integration GIS, which provides convenience in the
process of spatial analysis on a large number of spatial datasets [4], or with an aggregation-based information
retrieval system for spatial datasets [5]. Spatial datasets are the key to the value of big data in spatial data mining
(SDM), which refers to the description of attribute data requirements, how the data is obtained, and what AI
method is used to perform spatial analysis of the data [4], [6]. Spatial datasets form the basic structure in
GIS for developing spatial analysis algorithms, analyzing algorithm principles, or adapting existing
algorithms [7]. Classification models in machine learning are prevalent [8] in research on spatial analysis
in GIS. However, there is no definitive statement on which classification algorithm
is best, because the accuracy, precision, and recall (APR) tests in each study use different
sample data. The results also depend on the field of study, which always differs with the object of research.
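To make the APR comparison concrete, the following minimal sketch computes accuracy and per-class precision and recall from a multi-class confusion matrix. The function name `apr_from_confusion` and the three-class example matrix are illustrative assumptions, not values taken from any of the cited studies.

```python
from typing import Dict, List


def apr_from_confusion(matrix: List[List[int]]) -> Dict[str, object]:
    """Compute accuracy plus per-class precision and recall.

    matrix[i][j] is the number of samples whose true class is i and
    whose predicted class is j.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    precision, recall = [], []
    for k in range(n):
        predicted_k = sum(matrix[i][k] for i in range(n))  # column sum
        actual_k = sum(matrix[k])                          # row sum
        precision.append(matrix[k][k] / predicted_k if predicted_k else 0.0)
        recall.append(matrix[k][k] / actual_k if actual_k else 0.0)
    return {"accuracy": correct / total, "precision": precision, "recall": recall}


# Hypothetical 3-class example: low, medium, and high accident-proneness.
confusion = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 2, 8],
]
scores = apr_from_confusion(confusion)
```

Because each study builds its confusion matrix from a different sample, APR values reported by different studies are not directly comparable, which is the point made above.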
Previous research proposed a framework using the classification and regression trees (CART)
model, which reported a 10-fold increase in the best value for crash severity prediction [9]. However, the
CART model is sensitive to the number of training data samples, because changes in the training and testing
data samples affect the results of spatial analysis [10]. A spatial analysis model using data mining decision trees
(J48, ID3, and CART) and naïve Bayes classifiers [11] states that the accuracy of 96.30% for the J48
method is higher than that of ID3, CART, and naïve Bayes, where naïve Bayes has better performance even
though its accuracy value is small. A different study suggests that the prediction accuracy of classification
models with the decision tree approach reaches 84.1% [12]. Another study indicates that the enhanced empirical
Bayesian (EB) method is the preferred spatial analysis approach for predicting the number of
accidents in road segments [13]. The adaptive k-nearest neighbor (kNN) classifier maximizes the accuracy of
the model for geo-spatial data by dynamically selecting k for each instance being classified,
reaching a ROC AUC score of 0.9. The fuzzy deep-learning approach is used to reduce
data uncertainty in the prediction of traffic flows that affect road traffic accident rates [14].
A study of the convolutional long short-term memory (ConvLSTM) neural network model [15] states that the proposed
framework is sufficiently accurate and significantly improves accuracy in traffic accident prediction for
heterogeneous data. The road accident classification models using random forests and boosted trees work
equally well, with an average accuracy of 80% and a sensitivity of 50% [16].
The discussion in this paper emphasizes the comparison of classification methods for hybrid models
in spatial analysis modeling through the proposed framework. The proposed framework will be used in
future work, integrated through a GIS platform for the safe management and risk assessment [17], [18]
of traffic accident-prone roads classification. Its general contributions are to analyze the multi-criteria
parameters that influence the results of traffic accident-prone road classification, to propose new
parameters for spatial datasets, to enhance a framework of spatial analysis using a hybrid estimation model
based on a combination of MCDM-ANNs, and to evaluate the enhancement of the new model through the
hybrid. Model evaluation is needed to provide best practices for the resulting model [19]. Model
performance assessment is influenced by whether the data are balanced, which must be considered to describe
the quality of the resulting model without leading to misleading conclusions [16]. The proposed framework of
classification models with the MCDM-ANNs hybrid for accident-prone road classification and its differences
from existing frameworks of classification models are presented. The selection of a hybrid estimation model
based on a combination of MCDM-ANNs classification in this proposed framework is based on a literature
review. The paper also describes the collection of multi-criteria parameter datasets for accident-prone road
classification used in the reviewed articles to evaluate the proposed framework of classification models,
and explains the validation and evaluation techniques of the proposed model. The group analytic
hierarchy process (GAHP) technique is modeled to develop a weighting technique for the multi-criteria
parameters applied to MCDM methods, which still rely on human assumptions in weighting; this is to be
demonstrated through sensitivity and stability tests of the GAHP technique against MCDM methods by
comparison with weights assigned manually by human assumption.
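As an illustration of group weighting, the sketch below aggregates the pairwise comparison matrices of several experts with an element-wise geometric mean and then derives criteria weights from the normalized geometric mean of each row. This is one common way to perform group AHP aggregation and is an assumption here; the paper does not state which aggregation the GAHP technique will use. The two expert matrices and the three criteria are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def aggregate_judgments(matrices: List[Matrix]) -> Matrix:
    """Element-wise geometric mean of the experts' pairwise comparison matrices."""
    n = len(matrices[0])
    m = len(matrices)
    return [
        [math.prod(mat[i][j] for mat in matrices) ** (1.0 / m) for j in range(n)]
        for i in range(n)
    ]


def priority_weights(matrix: Matrix) -> List[float]:
    """Approximate AHP weights by the normalized geometric mean of each row."""
    n = len(matrix)
    row_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(row_means)
    return [value / total for value in row_means]


# Hypothetical judgments over three criteria:
# road condition, traffic volume, accident rate.
expert_a = [[1, 3, 5], [1 / 3, 1, 2], [1 / 5, 1 / 2, 1]]
expert_b = [[1, 1, 3], [1, 1, 2], [1 / 3, 1 / 2, 1]]

group_matrix = aggregate_judgments([expert_a, expert_b])
weights = priority_weights(group_matrix)
```

The geometric mean keeps the aggregated matrix reciprocal, so the group matrix can be treated like a single expert's matrix and compared against manually assigned weights.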
Multi-criteria decision making (MCDM) methods are used in this study to process the determinant
parameter data in the classification of accident-prone areas, which include road conditions, traffic volume,
and accident rate [20], [21], and to assign weighting values to each factor based on the literature and surveys
of expert sources [22]. Based on the classification of accident-prone areas, it becomes crucial to provide
recommendations to the road auditor to conduct a traffic safety audit that obtains assessment criteria,
implementation expenses, the number of involved traffic participants, the effect on road safety, the protective
effect, and social factors presenting difficulties [23]. The traffic safety audit is carried out by the
road auditor administration through a feasibility study of the network of accident-prone road
categories [24]. MCDM methods, namely simple additive weighting (SAW), the analytical
hierarchy process (AHP), and fuzzy AHP, have been used for road safety analysis (RSA) that can support the
decision process in determining the priority of road management and provide mitigating actions for the roads
most vulnerable to accidents [25]. The MCDM technique for order preference by similarity to ideal
solution (TOPSIS) method is used in road safety management, where road safety is one of the factors in
reducing the number of traffic accidents, to determine the road safety position of the
Bushehr-Borazjan and Borazjan-Genaveh roads in Bushehr province based on various quantitative and
qualitative criteria [26]. The MCDM model is one of the right approaches to deal with the problem of accident-prone road
section (APRS) because it uses several road and environmental criteria, both quantitative and qualitative;
MCDM is related to decision-making results for planning that involves stakeholders [27]. The framework
is proposed through a literature review of several previous studies.
This process evaluates the benefits of the research that has been done, identifies the limitations of the methods
used and the research gaps, and advises on development for further research to
arrive at the right framework for the new research [28]. The research questions are intended to focus
on the subject area of the study by identifying and classifying the spatial analysis frameworks for
accident-prone traffic roads [29].
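To make the weighted-sum idea behind the MCDM methods named above concrete, the following is a minimal sketch of simple additive weighting (SAW) applied to road segments. The segment names, criteria, weights, and the choice of road width as a cost-type criterion are hypothetical and serve only as an illustration.

```python
from typing import Dict, List


def saw_scores(
    alternatives: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
    cost_criteria: List[str],
) -> Dict[str, float]:
    """Score each alternative as the weighted sum of its normalized criteria."""
    scores = {name: 0.0 for name in alternatives}
    for criterion, weight in weights.items():
        column = [values[criterion] for values in alternatives.values()]
        for name, values in alternatives.items():
            if criterion in cost_criteria:
                # Cost-type criterion: a smaller raw value gives a larger score.
                normalized = min(column) / values[criterion]
            else:
                # Benefit-type criterion: a larger raw value gives a larger score.
                normalized = values[criterion] / max(column)
            scores[name] += weight * normalized
    return scores


# Hypothetical road segments and criteria values.
segments = {
    "A": {"accident_rate": 12, "traffic_volume": 900, "road_width": 7},
    "B": {"accident_rate": 6, "traffic_volume": 1200, "road_width": 9},
    "C": {"accident_rate": 3, "traffic_volume": 600, "road_width": 6},
}
criteria_weights = {"accident_rate": 0.5, "traffic_volume": 0.3, "road_width": 0.2}

proneness = saw_scores(segments, criteria_weights, cost_criteria=["road_width"])
ranking = sorted(proneness, key=proneness.get, reverse=True)
```

Here a higher score means a more accident-prone segment, so a narrower road raises the score. The weights are exactly the part that AHP-style methods aim to derive instead of fixing by hand.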
2. RESEARCH METHOD
The spatial analysis model using MCDM is a multi-criteria spatial decision support system (MC-SDSS)
developed in GIS technology by integrating MCDM as a method to determine the best alternative
from the many choices available based on the spatial datasets described [30]. ANNs classification is a data
mining technique in machine learning that maps various attributes to nodes in an input layer and adds a hidden
layer, which is then used to obtain the threshold for the non-linear output layer [31]. The steps of the
proposed framework are shown in Figure 1.
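The layered structure of an ANN classifier described above can be sketched as a single forward pass through one hidden layer. All weights, biases, and input features below are hypothetical placeholders; a real model would learn them from a spatial dataset, and the sigmoid and softmax activations are a common choice assumed here rather than one prescribed by the paper.

```python
import math
from typing import List, Tuple


def dense(inputs: List[float], weights: List[List[float]], biases: List[float]) -> List[float]:
    """Weighted sum of the inputs plus a bias for every node of a layer."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]


def sigmoid(values: List[float]) -> List[float]:
    """Non-linear activation applied in the hidden layer."""
    return [1.0 / (1.0 + math.exp(-v)) for v in values]


def softmax(values: List[float]) -> List[float]:
    """Turn output-layer sums into class probabilities."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical parameters: 3 input attributes, 2 hidden nodes, 3 output classes.
W_HIDDEN = [[0.8, -0.4, 0.3], [-0.5, 0.9, 0.2]]
B_HIDDEN = [0.1, -0.1]
W_OUT = [[1.2, -0.7], [-0.3, 0.8], [0.5, 0.5]]
B_OUT = [0.0, 0.0, 0.0]


def classify(features: List[float]) -> Tuple[int, List[float]]:
    """Return the index of the most probable class and all class probabilities."""
    hidden = sigmoid(dense(features, W_HIDDEN, B_HIDDEN))
    probabilities = softmax(dense(hidden, W_OUT, B_OUT))
    return probabilities.index(max(probabilities)), probabilities


# Hypothetical normalized attributes of one road segment.
label, probs = classify([0.6, 0.2, 0.9])
```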
The initial stage of the proposed framework in Figure 1 is to plan research topics and trends by
identifying the research needs for the literature review process through state-of-the-art frameworks, methods,
dataset requirements, and a gap analysis of existing methods and frameworks. The action stage covers adapting,
improving, and hybrid implementation for model accuracy prediction in the development of frameworks. The
state of the art from the literature review of the primary studies is displayed in Table 1.
[Figure 1: flowchart. Planning: research topics and trends; state-of-the-art frameworks; state-of-the-art
methods; state-of-the-art datasets; gap analysis of the existing methods and frameworks. Action: adapting,
improving, and hybrid implementation to model accuracy prediction in the development of frameworks.]
Figure 1. Research method steps
The literature review is in Table 1. The research in [32] does not show a comparison of the accuracy and
consistency of each method used with the confusion matrix, so the claim that empirical Bayes has the best
accuracy and consistency is not clearly demonstrated. The standard deviation of the data distribution
in the sample data is only used to calculate the disaster-prone traffic accident rate, and there is no proof of the
validity of the model used [33]. The discussion in [34] is still limited to the use of an existing method, and
knowledge combination has not been done as a hybrid model approach. The results of the comparison of the two
methods are stated to be more accurate, but no precise accuracy value is given based on the
confusion matrix [35]. The research in [36] has not considered the type of road design, for example,
arterial roads, collector roads, or roads based on their nature (geometric road). There are no studies on
adaptive models that can expand machine learning through a combination of online learning and deep
learning [37]. The discussion in [38] is still limited to the use of an existing method; knowledge combination
has not been done as a hybrid model approach. The prediction accuracy of the decision tree regression (DTR)
model in this study is still limited to macro-level crash counts [39]. The model has weaknesses in terms of data
simulation because it requires accident data at the beginning of the calculation [27]. There is no mathematical
modeling in the comparison of algorithms, so the results are difficult to compare [40]; there is no evaluation of
the models offered because the test data collected do not cover a long time span [16]. The multilayer perceptron
(MLP) is more accurate for the available spatial datasets but becomes very vulnerable when there is data noise
that can cause errors in predictions [34]. The probabilistic neural network (PNN) has probabilistic outputs with
multilayer perceptron networks, producing fairly
accurate predictions [34]. The radial basis function (RBF) is very weak in making predictions [34]. The vehicle
kilometer traveled (VKT) parameter proved to be the most influential in road traffic accidents, followed by the
V/C variable and driver speed, based on the RReliefF algorithm [34]. The evaluation techniques used are the
site consistency test (SCT), the method consistency test (MCT), the total rank differences test (TRDT), and
the total score test (TST) [38].
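The consistency tests named above compare the sites a method flags as high-risk in two consecutive periods. The sketch below follows the definitions commonly given in the hotspot identification literature: SCT sums the later-period crashes at the sites flagged earlier, MCT counts the sites flagged in both periods, and TRDT sums the rank shifts of the flagged sites. The exact variants used in [38] are not given here, so these formulas, the site names, and the crash counts are assumptions for illustration.

```python
from typing import Dict, List


def rank_sites(scores: Dict[str, float]) -> List[str]:
    """Order sites from highest to lowest risk score."""
    return sorted(scores, key=scores.get, reverse=True)


def site_consistency(scores_p1: Dict[str, float], crashes_p2: Dict[str, float], n: int) -> float:
    """SCT: crashes observed in period 2 at the top-n sites of period 1 (higher is better)."""
    return sum(crashes_p2[site] for site in rank_sites(scores_p1)[:n])


def method_consistency(scores_p1: Dict[str, float], scores_p2: Dict[str, float], n: int) -> int:
    """MCT: number of sites in the top n of both periods (higher is better)."""
    return len(set(rank_sites(scores_p1)[:n]) & set(rank_sites(scores_p2)[:n]))


def total_rank_difference(scores_p1: Dict[str, float], scores_p2: Dict[str, float], n: int) -> int:
    """TRDT: summed absolute rank shift of the top-n sites of period 1 (lower is better)."""
    position_p2 = {site: i for i, site in enumerate(rank_sites(scores_p2))}
    return sum(abs(i - position_p2[site]) for i, site in enumerate(rank_sites(scores_p1)[:n]))


# Hypothetical crash counts per site, used directly as the risk score
# (this corresponds to ranking by accident frequency).
period_1 = {"s1": 9, "s2": 7, "s3": 4, "s4": 2, "s5": 1}
period_2 = {"s1": 8, "s2": 3, "s3": 6, "s4": 1, "s5": 2}

sct = site_consistency(period_1, period_2, n=2)
mct = method_consistency(period_1, period_2, n=2)
trdt = total_rank_difference(period_1, period_2, n=2)
```

The total score test (TST) combines the other three into a single score and is left out of this sketch.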
Table 1. Literature review: a framework comparison

| Framework | Model and method | Spatial datasets | Results |
|---|---|---|---|
| [32] | Model-based spatial statistical methods: Poisson regression, negative binomial regression, empirical Bayesian | Accidents, injuries, and deaths by year | The study compares all methods used; empirical Bayes has the best accuracy and consistency and is recommended by the Highway Safety Manual (HSM) and the European Union Acquis |
| [33] | Model-based spatial statistical methods: kernel density analysis, nearest neighbor, K-function | Intercity accidents, accidents leading to injury, accidents leading to death, and accidents leading to damages | In the spatial analysis process, the observed value curve of the spatial datasets is above the 5% confidence interval |
| [36] | Spatial analysis techniques: nearest neighborhood hierarchical (NNH) clustering, spatial-temporal clustering analysis (STAC) | Road accidents involving all types of vehicles | The results of the spatial analysis vary according to the parameter values in the spatial datasets; STAC has a higher prediction accuracy index (PAI) of 461.57 compared to 163.69 for NNH |
| [34] | ANN techniques: extreme learning machine (ELM), probabilistic neural network (PNN), radial basis function (RBF), and multilayer perceptron (MLP) | V/C, speed, vehicle kilometer traveled (VKT), roadway width, the existence of a median, and allowable/not-allowable parking | Evaluated using Nash-Sutcliffe (NS), mean absolute error (MAE), and root mean square error (RMSE). ELM, a feed-forward neural network that randomly selects hidden nodes using random weights, has the best performance and the most accurate prediction results (RMSE = 3.576; NS = 0.81; MAE = 2.5062) |
| [35] | Hot spot analysis (Getis-Ord Gi*): network spatial weights, kernel density method | Traffic accidents | Hotspot analysis gives better results because it considers the weights of the spatial datasets |
| [37] | Combination of statistical learning, machine learning, and neural network techniques: support vector machine (SVM), deep neural network | Accident, person, vehicle, road, and environment data | A real-time online deep learning framework based on traffic accident black spots is proposed. The SVM algorithm has 63% precision and a 61% recall rate in analyzing the black spots of traffic accidents. If the training data period is extended, the SVM and deep neural network values increase by 95% and 89% in accuracy and by 69% and 79% in recall rate |
| [38] | Black spot identification (BSID) method and segmentation method: empirical Bayesian (EB), excess empirical Bayesian (EEB), accident frequency (AF), accident rate (AR) | Traffic accidents | The AF method has the best performance with a consistency of 93.1%, compared to 92.2% for EB and 77.4% for EEB. The EEB and AR methods perform weakest in most cases of segmentation |
| [39] | Machine learning techniques for the prediction model: decision tree regression (DTR) methods, regression tree framework, ensemble techniques. Model assessment: average squared error (ASE), standard deviation of error (SDE) | Statewide traffic analysis zone (STAZ) | The DTR model predicts more accurately than the spatial DTR model. Ensemble techniques (bagging, random forest, and gradient boosting) improve prediction accuracy with slightly better results, depending on the amount of training data |
| [27] | Multi-criteria decision making (MCDM) model: weighted linear combination (WLC) method | Traffic accident reports | The model was developed to determine the criteria weights, which are set by experts with an interest in subjective results |
| [40] | Prediction model: deep neural network (DNN) model, gene expression programming (GEP), random effect negative binomial (RENB) models, regular negative binomial model (FENB) | Road geometry, traffic, and road environment | The DNN model experienced an increase in road prediction with 0.914 (RMSE = 7.474) by GEP, and 0.891 (RMSE = 8.862). GEP works better than RENB to measure the ranking of variables that influence accidents |
| [16] | Random effects negative binomial model: hierarchical cluster method | Real-time frequency of accident data and contributing factors | The model developed can provide information on the main causes of accidents at road intersections |
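Several of the reviewed studies report RMSE, MAE, and the Nash-Sutcliffe (NS) coefficient. As a reminder of how these measures relate, the following minimal sketch computes all three from observed and predicted values. The sample values are hypothetical and are not taken from any study in Table 1.

```python
import math
from typing import List


def mean_absolute_error(observed: List[float], predicted: List[float]) -> float:
    """MAE: average magnitude of the prediction errors."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def root_mean_square_error(observed: List[float], predicted: List[float]) -> float:
    """RMSE: square root of the average squared prediction error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def nash_sutcliffe(observed: List[float], predicted: List[float]) -> float:
    """NS: 1 minus the ratio of squared error to the variance of the observations.

    A value of 1 is a perfect fit; 0 means the model is no better than
    predicting the observed mean.
    """
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / variance


# Hypothetical accident counts per road segment.
observed_counts = [2.0, 4.0, 6.0, 8.0]
predicted_counts = [3.0, 4.0, 5.0, 9.0]

mae = mean_absolute_error(observed_counts, predicted_counts)
rmse = root_mean_square_error(observed_counts, predicted_counts)
ns = nash_sutcliffe(observed_counts, predicted_counts)
```

RMSE and MAE are expressed in the units of the data and are better when lower, while NS is unitless and better when closer to 1, so values from studies using different datasets are not directly comparable.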
3. RESULTS AND DISCUSSION
The proposed framework is based on a literature study that compares the existing frameworks to
determine the performance of the spatial analysis models offered for traffic accident-prone roads in Table 1
and the MCDM-based frameworks [27], [41]-[44]. The comparison covers the method used
in each primary study (PS) of spatial analysis for accident-prone traffic roads, the spatial analysis model used,
the spatial datasets used to test the model through method selection, and the measurement results
obtained through the assessment.
The frameworks developed by previous researchers are described in this section.
The framework model in [44] was developed to create Maycock and Hall’s accident prediction model. This
model provides sensitivity analysis of the modeling results using multi-objective optimization (MOO) with
multi-criteria decision making through the analytical hierarchy process (AHP) model. The primary spatial
datasets needed are in the road geometry category; the secondary spatial datasets needed are the numbers and
types of traffic accidents, traffic and demand for structural flow, visual distance and vehicle speed, road signs
and equipment, lighting, and driver behavior. The multi-criteria parameter values obtained are used in
mathematical spatial data modeling to produce the sensitivity of the spatial analysis, with the multi-criteria
optimization results expressed as traffic efficiency (TE) and traffic safety (TS) for the predicted traffic accidents.
The MOO model is measured using a consistency index (CI) and a consistency ratio (CR); the model is shown to
have a good structure when CR ≤ 10%, and the CR value obtained is 0.00298, which shows that the MOO model
with MCDM on the AHP model is consistent.
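The consistency check mentioned above can be sketched as follows, using the standard AHP definitions CI = (λmax − n)/(n − 1) and CR = CI/RI, where RI is Saaty's random index for a matrix of size n. The power-iteration estimate of λmax and the example matrices are assumptions for illustration and are not taken from [44].

```python
from typing import List, Tuple

# Saaty's random index (RI) for matrix sizes 1 to 10.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}


def principal_eigen(matrix: List[List[float]], iterations: int = 100) -> Tuple[float, List[float]]:
    """Estimate the largest eigenvalue and its normalized eigenvector by power iteration."""
    n = len(matrix)
    weights = [1.0 / n] * n
    for _ in range(iterations):
        product = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
        total = sum(product)
        weights = [value / total for value in product]
    product = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(product[i] / weights[i] for i in range(n)) / n
    return lambda_max, weights


def consistency_ratio(matrix: List[List[float]]) -> float:
    """CR = CI / RI; matrices of size 1 or 2 are always consistent."""
    n = len(matrix)
    if n < 3:
        return 0.0
    lambda_max, _ = principal_eigen(matrix)
    consistency_index = (lambda_max - n) / (n - 1)
    return consistency_index / RANDOM_INDEX[n]


# Hypothetical pairwise comparison matrices over three criteria.
perfectly_consistent = [[1, 2, 4], [1 / 2, 1, 2], [1 / 4, 1 / 2, 1]]
slightly_inconsistent = [[1, 3, 5], [1 / 3, 1, 2], [1 / 5, 1 / 2, 1]]

cr_consistent = consistency_ratio(perfectly_consistent)
cr_inconsistent = consistency_ratio(slightly_inconsistent)
```

A pairwise matrix is usually accepted when its CR does not exceed 0.10, so a value as small as the 0.00298 reported above indicates nearly perfectly consistent judgments.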
The framework in [41] was developed with the PROMETHEE-RS MCDM model. MCDM is used
because it can use more than one parameter to get the best result from the alternatives produced. This model
was developed to evaluate the data envelopment analysis (DEA) and TOPSIS methods in road safety, to reduce
the risk and number of road accidents through the road safety index. The model is tested using the robustness
of the composite index. The average correlation value, the average rank value, and the cluster variation average
values are entered into the MCDM PROMETHEE-RS to test the resulting model. The multi-criteria
parameters tested in this model are the Police Department data, fatalities, serious injuries, number of
inhabitants, number of registered vehicles, traffic risk, and public risk. These parameters are used in
mathematical spatial data modeling through DEA and TOPSIS to produce the optimal composite index through
the value of final risk efficiency. DEA-WR provides the best ranking results compared to the DEA-based
composite indicator model (DEA-CI).
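As an illustration of the TOPSIS step referred to above, the sketch below ranks alternatives by their relative closeness to an ideal solution, using vector normalization and Euclidean distances. The three regions, their indicator values, and the weights are hypothetical; the sketch shows plain TOPSIS only, not the DEA or PROMETHEE-RS models of [41].

```python
import math
from typing import List


def topsis(matrix: List[List[float]], weights: List[float], benefit: List[bool]) -> List[float]:
    """Return the closeness coefficient (0 to 1) of every alternative.

    matrix[i][j] is the value of alternative i on criterion j;
    benefit[j] is True when a larger value of criterion j is better.
    """
    n_criteria = len(weights)
    # Vector normalization followed by weighting.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_criteria)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n_criteria)] for row in matrix]
    # Ideal best and ideal worst value per criterion.
    columns = list(zip(*weighted))
    best = [max(col) if benefit[j] else min(col) for j, col in enumerate(columns)]
    worst = [min(col) if benefit[j] else max(col) for j, col in enumerate(columns)]
    closeness = []
    for row in weighted:
        distance_best = math.dist(row, best)
        distance_worst = math.dist(row, worst)
        closeness.append(distance_worst / (distance_best + distance_worst))
    return closeness


# Hypothetical regions: fatalities and serious injuries per 100,000 inhabitants
# (lower is safer) and seat belt compliance rate (higher is safer).
regions = [
    [2.0, 10.0, 0.9],
    [5.0, 20.0, 0.7],
    [9.0, 40.0, 0.5],
]
safety_scores = topsis(regions, weights=[0.4, 0.4, 0.2], benefit=[False, False, True])
```

An alternative that is best on every criterion coincides with the ideal solution and scores 1, while one that is worst on every criterion scores 0.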
The framework in [42], [43] is a model built using MCDM. The purpose of this model is to create
data mining knowledge in the form of decision tree rules through FP-growth and the Apache Spark framework.
The model was trialed on road accident analysis, where the results have a high degree of accuracy and work
well to improve road safety. The multi-criteria parameters used are road accident data with death and injury
attributes. The relevant association rules are tested and validated by measuring quality measures.
The MCDM model involves many criteria, so it is suitable for the accident-prone road
section (APRS) problem with respect to the type of horizontal alignment, vertical alignment, intersections,
significant places, and shoulder widths, with an accuracy value of 0.8830 for a threshold value of 1 [27].
The frameworks proposed in previous research are used by the authors as a reference in
developing the framework proposed here. The research framework proposed
in Figure 2 has the following main differences from the existing frameworks. First, data requirements are
prepared for primary and secondary spatial datasets to determine the road categories to be studied (using a
private or public spatial dataset). Second, a literature study is performed on the multi-criteria parameters used
for each road category. Third, mathematical modeling for spatial analysis is applied in the proposed framework
for a hybrid estimation model based on a combination of MCDM-ANNs multi-class classification. In this step,
data pre-processing is run for the classification analysis process. The classification range is
determined through mathematical modeling using the Guttman method. The results of the multi-class
classification are validated with SCT, MCT, and ARC (precision, recall, and accuracy) validation. The framework
focuses on proposing a classification of accident-prone roads using multi-criteria parameters (data series), and
on modeling accident-prone roads by calculating the value of traffic accidents by type of event and the accident
index, the density of road traffic accidents in each zone and the amount of data in each year, risk factors
based on the severity of the accidents, the severity of road traffic accident events, crash prediction models using
data series, and the societal cost of each type of accident. The ANN approach ranks highest among the
techniques most frequently used in the primary studies of the literature review. The
empirical Bayes method and decision trees in data mining are also widely used in the clustering category
in spatial data modeling of accident-prone zones. This study proposes a classification framework
that uses a hybrid estimation model based on a combination of MCDM-ANN classification. The
consistency of the methods in the resulting model is tested with the MCT, the SCT, and the ARC model
evaluation values. ANNs classification methods are the most popular data mining techniques in the field of spatial
analysis of accident-prone roads and the factors that affect the accident rate; these techniques include, among
others, neural networks, extreme learning machines, k-nearest neighbor, naive Bayes, and decision trees [45], [31].
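The range classification step can be sketched as follows. The paper does not spell out the Guttman computation, so this sketch assumes an equal-width class interval I = R/K, where R is the range of the scores (maximum minus minimum) and K is the number of classes. The function names and the example scores are hypothetical.

```python
from typing import List


def class_interval(scores: List[float], n_classes: int) -> float:
    """Equal-width interval I = R / K (assumed reading of the range method)."""
    return (max(scores) - min(scores)) / n_classes


def assign_classes(scores: List[float], n_classes: int) -> List[int]:
    """Map every score to a class from 1 (lowest range) to n_classes (highest range)."""
    lowest = min(scores)
    interval = class_interval(scores, n_classes)
    if interval == 0:
        # All scores are identical, so everything falls into one class.
        return [1 for _ in scores]
    labels = []
    for score in scores:
        index = int((score - lowest) / interval) + 1
        labels.append(min(index, n_classes))  # the maximum belongs to the top class
    return labels


# Hypothetical accident-proneness scores for six road segments, three classes.
segment_scores = [10, 25, 45, 60, 85, 100]
interval = class_interval(segment_scores, 3)
labels = assign_classes(segment_scores, 3)
```

In a hybrid setting, labels produced this way from MCDM scores could serve as target classes for training the ANN classifier, although the paper leaves that coupling to future work.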
[Figure 2: flowchart with the stages Preparation Spatial Datasets, Method Classification for MCDM Model,
Mathematic Modelling, Framework System, and Testing & Validation. Box labels: primary and secondary road
network (arterial road, collector road, local road); secondary spatial datasets (the geometric road, the
pavement road, the environmental road); identify assessment to secondary spatial datasets; multi-criteria
parameter to spatial datasets; data pre-processing; classification analysis; multi-class classification (1 to n);
range classification method (true/false); MCDM hybrid to artificial neural network (ANN) classification;
determine the ranking value to classify; multi-class classification; testing and validation: precision, recall,
accuracy (ARC), method consistency test (MCT), site consistency test (SCT).]
Figure 2. Proposed hybrid of MCDM-ANNs classification framework to evaluate and rank spatial analysis
model traffic accident prone roads
4. CONCLUSION
The proposed framework in this study will act as a hybrid estimation model-based approach that
combines MCDM-ANNs classification to strengthen data mining techniques in spatial multi-criteria
analysis for multi-class classification decision making. In the literature review of the primary studies, no
research topics discuss traffic accident-prone roads classification for the arterial road, collector
road, and road type based on its nature (pavement, geometry, and local road) categories. The spatial
analysis models using MCDM include, among others, the analytic hierarchy process (AHP), analytical network
process (ANP), weighted sum model (WSM), weighted product (WP), weighted product model (WPM), simple
additive weighting (SAW), technique for order preference by similarity to ideal solution (TOPSIS), preference
ranking organization method for enrichment of evaluations (PROMETHEE), multi-attribute utility theory
(MAUT), elimination and choice expressing reality (ELECTRE), and višekriterijumska optimizacija i
kompromisno rešenje (VIKOR). The best methods identified through APR measurement will serve as a
reference for decision making in road management. Existing research is still limited to one type of road used
as an object (specific region), and 96% of it used private spatial datasets. This study uses an inductive
qualitative approach in modeling accident-prone roads to identify the scientific findings made
during the research process. The study proposes a classification of accident-prone roads using multi-criteria
parameters and a model of accident-prone roads calculated from the value of traffic accidents by type of
event and the accident index, the density of road traffic accidents in each
zone and the amount of data in each year, the risk factors based on the severity of the accidents, the
severity of road traffic accident events, the crash prediction models, and the
societal cost of each type of accident; the results are tested using the SCT, the MCT, and APR.
ACKNOWLEDGEMENTS
This research is supported by Universiti Teknikal Malaysia Melaka, Malaysia, and Universitas Dr.
Soetomo, Indonesia. It develops the results of a study funded by the Directorate General of
Strengthening Research and Development, Ministry of Research, Technology, and Higher Education,
Indonesia, in 2015-2016.
REFERENCES
[1] J. Albrecht, “GIS Project Management,” in Comprehensive Geographic Information Systems, Elsevier Inc., 2018,
pp. 446477.
[2] A. Banerjee and S. Ray, “Spatial models and geographic information systems,” Encyclopedia of Ecology, 2nd
Edition. Elsevier Inc., pp. 110, 2018.
[3] K. E. Brassel and R. Weibel, “A review and conceptual framework of automated map generalization,” Int. J.
Geogr. Inf. Syst., vol. 2, no. 3, pp. 229244, 1988.
[4] S. Wang, Y. Zhong, and E. Wang, “An integrated GIS platform architecture for spatiotemporal big data,” Futur.
Gener. Comput. Syst., vol. 94, no. May, pp. 160172, 2019.
[5] J. Lacasta, F. J. Lopez-Pellicer, B. Espejo-García, J. Nogueras-Iso, and F. J. Zarazaga-Soria, “Aggregation-based
information retrieval system for geospatial data catalogs,” Int. J. Geogr. Inf. Sci., vol. 31, no. 8, pp. 15831605,
2017.
[6] D. Li, S. Wang, H. Yuan, and D. Li, “Software and applications of spatial data mining,” Wiley Interdiscip. Rev.
Data Min. Knowl. Discov., vol. 6, no. 3, pp. 84114, 2016.
[7] L. Zhao, L. Chen, R. Ranjan, K. K. R. Choo, and J. He, “Geographical information system parallelization for
spatial big data processing: a review,” Cluster Comput., vol. 19, no. 1, pp. 139152, 2016.
[8] N. F. Hordri, A. Samar, S. S. Yuhaniz, and S. M. Shamsuddin, “A systematic literature review on features of deep
learning in big data analytics,” Int. J. Adv. Soft Comput. its Appl., vol. 9, no. 1, pp. 3249, 2017.
[9] M. Effati and A. Sadeghi-Niaraki, “A semantic-based classification and regression tree approach for modelling
complex spatial rules in motor vehicle crashes domain,” Wiley Interdiscip. Rev. Data Min. Knowl. Discov., vol. 5,
no. 4, pp. 181194, 2015.
[10] M. A. Raihan, M. Hossain, and T. Hasan, “Data mining in road crash analysis: the context of developing
countries,” Int. J. Inj. Contr. Saf. Promot., vol. 25, no. 1, pp. 4152, 2018.
[11] T. K. Bahiru, D. Kumar Singh, and E. A. Tessfaw, “Comparative Study on Data Mining Classification Algorithms
for Predicting Road Traffic Accident Severity,” in 2018 Second International Conference on Inventive
Communication and Computational Technologies (ICICCT), 2018, pp. 16551660.
[12] Z. Zheng, P. Lu, and D. Tolliver, “Decision Tree Approach to Accident Prediction for HighwayRail Grade
Crossings,” Transp. Res. Rec. J. Transp. Res. Board, vol. 2545, no. 1, pp. 115122, 2016.
[13] A. S. Lee, W. H. Lin, G. S. Gill, and W. Cheng, “An enhanced empirical bayesian method for identifying road
hotspots and predicting number of crashes,” Journal of Transportation Safety and Security, vol. 0, no. 0, Taylor &
Francis, pp. 117, 2018.
[14] W. Chen et al., “A novel fuzzy deep-learning approach to traffic flow prediction with uncertain spatialtemporal
data features,” Futur. Gener. Comput. Syst., vol. 89, no. June, pp. 7888, 2018.
[15] M. Kibanov, M. Becker, M. Atzmueller, and A. Hotho, “Adaptive kNN Using Expected Accuracy for
Classification of Geo-Spatial Data,” in Proceedings of the 33rd Annual ACM Symposium on Applied Computing
Pages 857-865, 2018, pp. 857865.
[16] M. Schlögl, “A multivariate analysis of environmental effects on road accident occurrence using a balanced
bagging approach,” Accid. Anal. Prev., vol. 136, no. March, pp. 112, 2020.
[17] W. Li and S. Wang, “PolarGlobe: A web-wide virtual globe system for visualizing multidimensional, time-varying,
big climate data,” Int. J. Geogr. Inf. Sci., vol. 31, no. 8, pp. 15621582, 2017.
[18] M. P. Repetto, M. Burlando, G. Solari, P. De Gaetano, M. Pizzo, and M. Tizzi, “A web-based GIS platform for the
safe management and risk assessment of complex structural and infrastructural systems exposed to wind,” Adv.
Eng. Softw., vol. 117, pp. 2945, 2018.
[19] H. Gong, F. Wang, B. Brenda, and S. Dent, “Application of random effects negative binomial model with clustered
dataset for vehicle crash frequency analysis,” Int. J. Transp. Sci. Technol., no. April, pp. 112, 2020.
[20] T. Sipos, “Spatial statistical analysis of the traffic accidents,” Period. Polytech. Transp. Eng., vol. 45, no. 2, pp.
101105, 2017.
[21] A. V. Vitianingsih and D. Cahyono, “Geographical Information System for Mapping Road Using Multi-Attribute
Utility Method,” in International Conference on Science and Technology-Computer (ICST), 2016, pp. 04.
[22] R. Al-Ruzouq, K. Hamad, S. Abu Dabous, W. Zeiada, M. A. Khalil, and T. Voigt, “Weighted Multi-attribute
Framework to Identify Freeway Incident Hot Spots in a Spatiotemporal Context,” Arab. J. Sci. Eng., vol. 44, no.
10, pp. 82058223, 2019.
[23] Á. Török, “Statistical Analysis of a Multi-Criteria Assessment of Intelligent Traffic Systems for the Improvement
of Road Safety,” J. Financ. Econ., vol. 4, no. 5, pp. 127135, 2016.
[24] Y. Huvarinen, E. Svatkova, E. Oleshchenko, and S. Pushchina, “Road Safety Audit,” in Transportation Research
Procedia, 2017, vol. 20, no. September 2016, pp. 236241.
[25] S. Kanuganti, R. Agarwala, B. Dutta, P. N. Bhanegaonkar, A. P. Singh, and A. K. Sarkar, “Road safety analysis
using multi criteria approach: A case study in India,” in Transportation Research Procedia, 2017, vol. 25, pp.
ISSN: 2252-8938
Int J Artif Intell, Vol. 10, No. 2, June 2021: 365 373
372
46535665.
[26] M. S. Fatemeh Haghighat, “Application of a Multi-Criteria Approach To Road Safety Evaluation in the Bushehr
Province , Iran,” Traffic Plan. Prelim. Commun., vol. 23, no. 5, pp. 341352, 2011.
[27] F. Yakar, “A multicriteria decision making–based methodology to identify accident-prone road sections,” J.
Transp. Saf. Secur., pp. 115, 2019.
[28] B. Kitchenham and S. Charters, “Guidelines for performing Systematic Literature Reviews in Software
Engineering,” Engineering, vol. 2, p. 1051, 2007.
[29] M. Vierhauser, R. Rabiser, and P. Grünbacher, “Requirements monitoring frameworks: A systematic review,” Inf.
Softw. Technol., vol. 80, pp. 89109, 2016.
[30] S. M. Ghavami, “Multi-criteria spatial decision support system for identifying strategic roads in disaster situations,”
Int. J. Crit. Infrastruct. Prot., vol. 24, pp. 2336, 2019.
[31] N. Sang and M. Aitkenhead, “Data Mining, Machine Learning and Spatial Data Infrastructures for Scenario
Modelling,” in Modelling Nature-based Solutions, 2020, pp. 276304.
[32] M. A. Dereli and S. Erdogan, “A new model for determining the traffic accident black spots using GIS-aided spatial
statistical methods,” Transp. Res. Part A Policy Pract., vol. 103, no. September, pp. 106117, 2017.
[33] G. A. Shafabakhsh, A. Famili, and M. S. Bahadori, “GIS-based spatial analysis of urban traffic accidents: Case
study in Mashhad, Iran,” J. Traffic Transp. Eng., vol. 4, no. 3, pp. 290299, 2017.
[34] H. Behbahani and A. Mohamadian, “Forecasting accident frequency of an urban road network : A comparison of
four artificial neural network techniques,” J. Forecast., vol. 37, no. 7, pp. 767780, 2018.
[35] H. E. Colak, T. Memisoglu, Y. S. Erbas, and S. Bediroglu, “Hot spot analysis based on network spatial weights to
determine spatial statistics of traffic accidents in Rize, Turkey,” Arab. J. Geosci., vol. 11, no. 151, pp. 111, 2018.
[36] S. S. R. Shariff, H. A. Maad, N. N. A. Halim, and Z. Derasit, “Determining hotspots of road accidents using spatial
analysis,” Indones. J. Electr. Eng. Comput. Sci., vol. 9, no. 1, pp. 146151, 2018.
[37] Z. Fan, C. Liu, D. Cai, and S. Yue, “Research on black spot identification of safety in urban traffic accidents based
on machine learning method,” Saf. Sci., vol. 118, no. April, pp. 607616, 2019.
[38] M. Ghadi and Á. Török, “A comparative analysis of black spot identification methods and road accident
segmentation methods,” Accid. Anal. Prev., vol. 128, no. February, pp. 17, 2019.
[39] M. S. Rahman, M. Abdel-Aty, S. Hasan, and Q. Cai, “Applying machine learning approaches to analyze the
vulnerable road-users’ crashes at statewide traffic analysis zones,” J. Safety Res., vol. 70, no. September, pp. 275
288, 2019.
[40] G. Singh, M. Pal, Y. Yadav, and T. Singla, “Deep neural network-based predictive modeling of road accidents,
Neural Comput. Appl., vol. 32, no. 16, pp. 1241712426, 2020.
[41] M. Rosić, D. Pešić, D. Kukić, B. Antić, and M. Božović, “Method for selection of optimal road safety composite
index with examples from DEA and TOPSIS method,” Accid. Anal. Prev., vol. 98, no. January, pp. 277286, 2017.
[42] A. Ait-Mlouk, F. Gharnati, and T. Agouti, “An improved approach for association rule mining using a multi-criteria
decision support system: a case study in road safety,” Eur. Transp. Res. Rev., vol. 9:40, no. September, pp. 113,
2017.
[43] A. Ait-Mlouk, T. Agouti, and F. Gharnati, “Mining and prioritization of association rules for big data: multi-criteria
decision analysis approach,” J. Big Data, vol. 4, no. 1, pp. 121, 2017.
[44] H. Pilko, S. Mandžuka, and D. Barić, “Urban single-lane roundabouts: A new analytical approach using multi-
criteria and simultaneous multi-objective optimization of geometry design, efficiency and safety,” Transp. Res.
Part C Emerg. Technol., vol. 80, no. July, pp. 257271, 2017.
[45] V. Rovšek, M. Batista, and B. Bogunović, “Identifying the key risk factors of traffic accident injury severity on
Slovenian roads using a non-parametric classification tree,” Transport, vol. 32, no. 3, pp. 272281, 2017.
BIOGRAPHIES OF AUTHORS
Anik Vega Vitianingsih. Obtained a bachelor's degree in Informatics Engineering in 2004 and a
master's degree in Game Tech in 2011. The author is a permanent lecturer in the Informatics
Department, editor in chief of the International Journal of Artificial Intelligence and Robotics,
Universitas Dr. Soetomo, and a Ph.D. student at the Faculty of Information and Communication
Technology (FTMK), Universiti Teknikal Malaysia Melaka, Malaysia. Research interests are spatial
analysis, spatial data modeling, and artificial intelligence in geographical information systems.
Publications in Scopus-indexed journals in these fields include the International Journal of
Intelligent Engineering and Systems (2019), Data in Brief (2019), the International Journal of
Engineering and Technology (UAE) (2018), and the Journal of Telecommunication, Electronic and
Computer Engineering (2018). The author has been a reviewer for Taylor and Francis Ltd journals,
including the Journal of Transportation Safety and Security, IETE Technical Review (Institution
of Electronics and Telecommunication Engineers, India), and the International Journal of Injury
Control and Safety Promotion, as well as for the International Social Science Journal, published
by Wiley-Blackwell Publishing Ltd.
Prof. Dr. Nanna Suryana Herman. Professor at the Faculty of Information and
Communication Technology (FTMK), Universiti Teknikal Malaysia Melaka, Malaysia.
Bachelor's degree in Soil and Water Engineering from Padjadjaran University, Bandung,
Indonesia. Master's degree in Computer Assisted Regional Planning from the International
Institute for Geoinformatics and Earth Observation (ITC), Enschede, The Netherlands. Doctoral
degree from the Department of Remote Sensing and GIS, Research University of Wageningen,
Holland. His research interests are spatial data analytics, image processing, spatial modeling,
and remote sensing. Active as an editorial board member of international journals and as a
member of the ASEAN European Academic University Network (ASEA-UNINET) and the Eurasian
Universities Union (EURAS).
Zahriah Othman. Lecturer at the Software Engineering department, Faculty of Information and
Communication Technology, Universiti Teknikal Malaysia Melaka. Bachelor's degree in
Information Technology from Universiti Utara Malaysia in 2001. In 2003 he received a Master
of Science degree in Software Engineering from the School of Informatics, Department of
Computing, University of Bradford, United Kingdom. Ph.D. in Computer Science from
Universiti Teknikal Malaysia Melaka, Malaysia. His research interests are software
engineering, artificial intelligence, and information retrieval, specifically terminology
disagreement in retrieving geospatial data.
... Nevertheless, generalizability to real-world scenarios was limited. Vitianingsih et al. (2021) presented a spatial analysis technique for traffic accident-prone road classification centered on a hybrid estimation system. A combination of Multi-Criteria Decision-Making (MCDM) and Artificial Neural Networks (ANNs) was deployed. ...
... These advancements render a more accurate understanding of accident-prone areas (96.55%) and enable better route selection (Table 7). Table 4 indicates the comparative analysis of the proposed work and the related works. The existing models like (Karamanlis et al. 2023a, b), (Fiorentini and Losa 2020b), (Tanprasert et al. 2020), (Santos et al. 2021), (Vitianingsih et al. 2021), (Manap et al. 2021), and (Mishra et al. 2023) developed diverse methodologies like Self-supervised ANN, LR, RF, K-Nearest Neighbor, NB, Distance-aware pixel accumulation, DT, MCDM-ANN, Getis-Ord Gi* statistic, and CNN. These existing approaches ensured effective accident BS identification based on BS identification on Greek road networks, Long-term-based road BS screening, Recognizing traffic BS, Traffic accident analysis for hotspot prediction, Traffic accident-prone roads classification for BS identification, Hotspot segments with a risk of heavy-vehicle accidents, and Proactive driving and accident prevention around accident hotspots. ...
... These existing approaches ensured effective accident BS identification based on BS identification on Greek road networks, Long-term-based road BS screening, Recognizing traffic BS, Traffic accident analysis for hotspot prediction, Traffic accident-prone roads classification for BS identification, Hotspot segments with a risk of heavy-vehicle accidents, and Proactive driving and accident prevention around accident hotspots. Yet, the existing (Karamanlis et al. 2023a, b), (Fiorentini and Losa 2020b), (Tanprasert et al. 2020), (Santos et al. 2021), (Vitianingsih et al. 2021), (Manap et al. 2021), and (Mishra et al. 2023) models suffered from interpretability issues and high computational complexity during BS analysis. However, the proposed system developed a BARS-RAGWG framework for significantly detecting the accident BSs. ...
Article
Vehicular traffic re-routing plays an essential role in improving mobility and mitigating accident Blackspots (BSs). In this paper, an effective technique for Blackspot Analysis and Re-route Selection utilizing a Radial Basis Levenberg–Marquardt Context Network and Good Point Weighted Beluga Whale Optimization with Geographic Information Systems (BARS-RAGWG) is proposed. The technique includes several key steps. Initially, the source and destination of vehicle users are determined, and routes between them are identified using Geographic Information Systems (GIS). Then, the data undergoes BS analysis, in which the accident BSs in the dataset are augmented using Zeroth Order Generative Adversarial Networks (Zo-GAN) to increase the size of the training dataset. Next, the data is clustered by road type using the K-Means Clustering (KMC) algorithm. Subsequently, a safety index is calculated with a Bayesian technique based on the number of accidents, types of highways, and assigned weight values. To evaluate locations as BSs, grades are assigned using fuzzy rules. The accident BSs are then classified using the Radial Basis Levenberg–Marquardt Context Network (RBLMCN). Lastly, optimal routes are selected using Good Point Weighted Beluga Whale Optimization (GPWBWO). According to the experimental results, the proposed approach achieves higher accuracy than prevailing approaches.
... This process involves converting manual or non-digital data into digital formats compatible with GIS software, such as shapefiles for spatial data and CSV for non-spatial data. In this study, spatial data in the form of an administrative map of Surakarta were combined with non-spatial data, such as the number of divorces by cause, using the join attribute feature in QGIS (Vitianingsih et al., 2021). ...
Article
This study aims to utilize QGIS as a spatial analysis tool to map the distribution of divorces based on economic factors and disputes in Surakarta City during the 2020–2023 period. The data used includes spatial data in the form of Surakarta City's administrative map in shapefile format and non-spatial data comprising the number of divorces obtained from BPS Surakarta. Non-spatial data were integrated into spatial data using the "join attribute" feature in QGIS. The analysis process was conducted using classification methods to identify areas with the highest divorce density. The findings reveal that divorces due to economic factors are concentrated in low-income areas, such as Banjarsari and Jebres, while divorces caused by disputes exhibit a more evenly distributed pattern. The thematic maps were then exported into GeoJSON format for implementation on an interactive website accessible to the public and policymakers. This study contributes to the utilization of GIS technology in supporting data-driven decision-making.
... In recent years, negative binomial regression has proved to be a suitable statistical method for analyzing count data and has been adopted by several safety researchers to establish correlations between road vehicle crashes and geometric properties of road segments, such as lane width, shoulder width, horizontal curvature, and other traffic-related variables (Haq et al., 2020; Li et al., 2020; Guo et al., 2021; Iqbal et al., 2021; Shaik et al., 2021). It is an extension of Poisson regression, sharing the same mean structure while introducing an additional parameter to account for over-dispersion, and it suits the inherent characteristics of crashes, which are random, discrete, and non-negative in nature (Rezapour et al., 2020; Zhang et al., 2020; Vitianingsih et al., 2021). When the conditional distribution of the outcome variable exhibits overdispersion, negative binomial regression yields narrower confidence intervals compared to those obtained from a Poisson regression model. ...
Article
ABSTRACT Highway safety is a critical concern for transportation authorities worldwide, with geometric characteristics playing a pivotal role in mitigating or exacerbating the risks. This study explores the impact of geometric features of a two-lane highway, the Akure-Ilesha road in South-West Nigeria, on traffic operation and safety. Accident data for the road under study was obtained for a period of 10 years (2013 to 2022) from the Federal Road Safety Corps (FRSC) of Nigeria. Quickbird satellite imagery and Civil 3D were used to extract the highway geometric alignment indices, which were compared to standards, and the traffic operations were determined on site using manual counting and a stopwatch. A Negative Binomial Regression Model (NBRM) was also used as the statistical model to analyse the relationship between geometric characteristics, traffic operation and highway safety. The results show that geometric properties, including pavement width, shoulder width and horizontal curve radius, have a significant effect on highway crash occurrence and traffic operations along the highway. In addition, about 80 to 90% of the geometric parameters, including shoulder width and shoulder and carriageway cambering, fall below established standards. The knowledge attained from this study will help transportation planners, engineers, and policymakers implement effective measures aimed at reducing crash frequency, thereby enhancing overall transportation safety and efficiency. Keywords: Geometric Characteristics, Highway, Negative Binomial Regression, Traffic Operation, Traffic Safety.
... Additionally, based on the proposed spatial analysis framework, a hybrid assessment model using a combination of multi-criteria decision-making (MCDM) and artificial neural network (ANN) classification (MCDM-ANN) was applied. This model was intended for future work on the classification of traffic accident-prone roads through testing with a set of spatial data (Vitianingsih et al., 2021). Research in Nairobi, Kenya, carried out with multi-criteria decision-making based on three experts, identified the most effective strategy for solving the problem of traffic safety on roads in Kenya (Bouraima et al., 2022). ...
Article
This paper introduces a novel Integrated Interval Rough Pivot Pairwise Relative Criteria Importance Assessment (IRN PIPRECIA) model combined with Interval Rough Combined Compromise Solution (IRN CoCoSo), marking a significant advancement in sustainable traffic flow management for commercial vehicles. This innovative merger is a first in the literature, methodologically enhancing the evaluation of road sections based on critical parameters including passenger car equivalent (PCE) 85%, AADT, road conditions, and accident data. Our model systematically fills the research gap in holistic traffic performance analysis, providing a unique tool for prioritizing road safety and efficiency. The key scientific contribution is the model’s ability to integrate causal and consequential traffic factors into a single framework, offering a novel multi-criteria decision-making (MCDM) approach. Results, validated through various verification models, show the integrated model’s effectiveness in real-world scenarios, confirming its robustness and stability. With strong engineering application potential, our work supports urban planners and traffic managers in making informed, sustainable decisions. The model extends beyond traditional traffic analysis, promising a shift towards more adaptive, data-driven infrastructure management. Future research will aim to refine the criteria basis and explore real-time decision-making through advanced MCDM applications.
... The study provides a scientific basis for the application of artificial intelligence technology to determine the professional adaptive capabilities of construction management staff. If we base the development of an artificial intelligence information system for multi-value classification on the results of youth vocational guidance tests, it will improve the diagnosis of professional selection and will be able to provide recommendations for improving the productivity of professional implementation [2]. The level of trust of young people in artificial intelligence when it comes to questions of career guidance, choosing a professional direction or profession according to their wishes, aptitudes, capabilities, and so on, is then determined. ...
Article
This study is devoted to determining the professional adaptive capabilities of construction management staff using artificial intelligence systems. A fully connected feed-forward neural network (FCF-FNN) architecture is proposed, and empirical modeling is performed to create a data set. The model of the artificial intelligence system allows evaluating the processes in an FCF-FNN during the execution of multi-value classification of professional areas. A method has been developed for the training process of a machine learning model, which reflects the internal connections between the components of an artificial intelligence system that allow it to “learn” from training data. To train the neural network, a data set of 35 input parameters and 29 output parameters was used; the set contains 936 data rows. Neural network training occurred in the proportion of 10% and 90%, respectively. The results of this study can be used to further improve the knowledge and skills necessary for successful professional realization.
... Traffic accidents are generally caused by human factors, but road conditions and the environment also contribute to these accidents. Thus, it is necessary to know which roads are prone to accidents by conducting road investigations [6]. ...
Article
This study was conducted to investigate accident-prone roads in Palu City. The method used is direct observation of roads where traffic accidents frequently occur, based on traffic accident data obtained from the Municipal Police Offices (Polres). The results show that traffic accidents frequently occur on 19 roads. R.E Martadinata in Sub District Tondo (25.07%), Trans Sulawesi in Sub District Taipa (11.21%), and Trans Sulawesi in Sub District Mamboro (7.08%) are the most accident-prone roads in Palu City. These three roads are included in the national road network as the primary arterial roads of Palu City. The final result of this study shows that roads can become accident-prone because of poor pavement conditions, high side friction, and poor environmental conditions and visibility.
... It was the first black-spot classification technology that was fully environment-aware. Vitianingsih et al. [43] presented a framework of spatial analysis using a hybrid estimation model based on a combination of multi-criteria decision making (MCDM) and artificial neural network (ANN) (MCDM-ANN) classification. This model is useful for traffic-accident-prone road classification with a spatial dataset. ...
Article
This study focuses on identifying accident-prone areas and analyzing the factors contributing to the distribution of traffic accidents near highway ramps. A combined method of kernel density estimation, spatial autocorrelation analysis, and multivariate logistic regression analysis helped to identify accident hotspots. Through data collection and analysis, the clustering characteristics of traffic accidents in the diversion and merging areas were identified. Four levels of accident-prone areas were divided according to the accident rates. The factors influencing the spatial distribution of accidents were analyzed. The results showed that traffic accidents in the diversion area were concentrated near the exit, but the accidents in merging areas had a wider range of distribution. The analysis of this phenomenon was conducted using the multinomial logit model results. The important factors of different accident-prone areas were clarified. The temperature, the accident lane, weather conditions, and the time of day had significant impacts on the spatial distribution of traffic accidents. The study’s findings provide an important decision-making basis for highway accident prevention management.
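The kernel density estimation step mentioned in this abstract can be illustrated with a minimal sketch. It assumes a one-dimensional setting in which each accident is located by its distance along a ramp; the function name `kernel_density`, the accident positions, the grid spacing, and the 50 m bandwidth are all hypothetical and are not taken from the study, which also combines KDE with spatial autocorrelation and logistic regression.

```python
import math
from typing import List


def kernel_density(points: List[float], grid: List[float], bandwidth: float) -> List[float]:
    """Gaussian kernel density estimate of `points`, evaluated at each grid location."""
    n = len(points)
    # Normalizing constant so that the estimate integrates to 1 over the real line.
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points)
        for x in grid
    ]


# Hypothetical accident positions in metres along a ramp (illustrative only).
accidents_m = [120.0, 130.0, 135.0, 140.0, 400.0, 410.0, 900.0]
grid = [float(x) for x in range(0, 1001, 50)]

density = kernel_density(accidents_m, grid, bandwidth=50.0)
# The grid location with the highest estimated density is the candidate hotspot.
hotspot = grid[density.index(max(density))]
print(hotspot)
```

With these made-up positions the densest grid location falls next to the cluster of four accidents between 120 m and 140 m. The bandwidth controls how far each accident spreads its influence, so it would need to be chosen for the road geometry at hand.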
Article
A crucial step in the data science pipeline, feature engineering has a big impact on how well predictive models function. This study explores several feature engineering techniques and how they affect the robustness and accuracy of models. In order to extract useful information from unprocessed data and improve the prediction capability of machine learning models, we study a variety of techniques, from straightforward transformations to cutting-edge approaches. The study starts by investigating basic methods including data scaling, one-hot encoding, and handling missing values. Then, we go on to more complex techniques like feature selection, dimensionality reduction, and interaction term creation. We also explore the possibilities for domain-specific feature engineering, which entails designing features specifically for the issue domain and utilising additional data sources to expand the feature space. We run extensive experiments on numerous datasets including different sectors, such as healthcare, finance, and natural language processing, in order to evaluate the efficacy of these methodologies. We evaluate model performance using metrics like recall, accuracy, precision, and F1-score to get a comprehensive picture of how feature engineering affects various predictive tasks. This study also assesses the computational expense related to each feature engineering technique, taking scalability and efficiency in practical applications into account. To assist practitioners in making wise choices during feature engineering, we address the trade-offs between model complexity and performance enhancements. Our results highlight the importance of feature engineering in data science and demonstrate how it may significantly improve prediction models in a variety of fields. This study is a useful tool for data scientists because it emphasises the significance of careful feature engineering as a foundation for creating reliable and accurate prediction models.
Article
One of the major challenges with vehicle crash frequency studies is how to deal with the unobserved heterogeneity in crash data. While statistical models of crash frequency analysis based on single probability distributions are constantly improving, several researchers discovered that multiple distribution models may better describe crash frequency data and capture more unobserved heterogeneity. Based on the hypothesis that total crash counts occurring at an intersection are affected by unique sets of factors, this research proposes a two-step approach to studying the contributing factors to crashes at intersections in the Mississippi coastal area. In this study, the data of single crash accidents are first clustered into subgroups using a hierarchical clustering method, and then a Random Effects Negative Binomial model is applied to each subgroup with crash counts at an intersection as observations. A model with no data clustering is also estimated to serve as the comparison benchmark. The analysis results show that this two-step approach can reveal more information about crash contributing factors and have improved the predictive power and goodness of fit.
Article
This work proposes to use deep neural networks (DNN) model for prediction of road accidents. DNN consists of two or more hidden layers with large number of nodes. Accident data of non-urban sections of eight highways were collected from official records, and dataset consists of a total of 2680 accidents. The data of 16 explanatory variables related to road geometry, traffic and road environment were collected from official records as well as through field studies. Out of a total of 222 data points of accident frequency, 148 were used for training and remaining 74 to test the models. To compare the performance of DNN-based modeling approach, gene expression programming (GEP) and random effect negative binomial (RENB) models were used. A correlation coefficient value of 0.945 (root mean square error = 5.908) was achieved by DNN in comparison with 0.914 (RMSE = 7.474) by GEP, and 0.891 (RMSE = 8.862) by RENB with the test dataset, indicating an improved performance by DNN in prediction of road accidents. In comparison with DNN, though lower value of correlation coefficient was achieved by GEP model, it quantified the effects of various variables on accident frequency and provided a ranked list of variables based upon their importance.
Article
Determining and understanding the environmental factors contributing to road traffic accident occurrence is of core importance in road safety research. In this study, a methodology to obtain robust and unbiased results when modeling imbalanced, high-resolution accident data is described. Based on a data set covering the whole highway network of Austria in a fine spatial (250 m) and temporal (1 h) scale, the effects of 48 covariates on accident occurrence are analyzed, with a special emphasis on real-time weather variables obtained through meteorological re-analysis. A balanced bagging approach is employed to cope with the issue of class imbalance. By fitting different tree-based classifiers to a large number of bootstrapped training samples, ensembles of binary classification models are established. The final prediction is achieved through majority vote across each ensemble, resulting in a robust prediction with reduced variance. Findings show the merits of the proposed approach in terms of model quality and robustness of the results, consistently displaying accuracies around 80% while exhibiting sensitivities of approximately 50%. In addition to certain features related to roadway geometrics, surface condition and traffic volume, a number of weather variables are found to be of importance for predicting accident occurrence. The proposed methodological take may not only pave the way for further analyses of high-resolution road safety data including real-time information, but can also be transferred to any other imbalanced classification problem.
Article
The road transportation network plays an important role in managing disaster situations. Although it is vulnerable to disasters, it supports emergency response in disaster management practice. It is therefore necessary to identify the most important roads in the network so that decision-makers can make appropriate decisions about them. This paper introduces an integrated methodology for evaluating transportation network performance (TNP) in disaster situations by developing a multi-criteria spatial decision support system (MC-SDSS). The developed MC-SDSS fully integrates Geospatial Information System (GIS) and Multi-Criteria Decision Making (MCDM) methods. On the one hand, GIS functionality is used to store the data, perform the analyses that produce the required criteria, and display the results. On the other hand, AHP, a well-known MCDM method, is used to capture the priorities and preferences of decision-makers regarding the criteria. Based on the decision-making model (intelligence, design, and choice), four criteria are selected as indicators for evaluating the TNP in disaster situations: capacity, accessibility, vulnerability, and importance. Criteria maps are generated with GIS tools, the experts' preferences about the criteria are acquired through the AHP comparison matrix, and a ranking of the roads is prepared and visualized in the MC-SDSS. Finally, using the One-At-a-Time approach as the sensitivity analysis method, the MC-SDSS assesses how robust the results are to variation or uncertainty in the importance scales of the criteria in the AHP pairwise comparison matrix. The results show that about 9, 33, 20, and 38 percent of the roads in the case study (Mazandaran province, Iran) are of very high, high, moderate, and low strategic importance, respectively. They also show that the capacity/accessibility pairwise comparison is the most sensitive comparison in the AHP comparison matrix.
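As a rough illustration of how criterion weights come out of an AHP pairwise comparison matrix, the sketch below uses the row geometric mean approximation rather than the principal eigenvector, and adds Saaty's consistency ratio. The comparison values are hypothetical: they are built from assumed weights so that the matrix is perfectly consistent, and they are not the judgments collected in the study.

```python
import math

# Saaty's random consistency index for small matrix sizes.
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12}

def ahp_weights(matrix):
    """Approximate the AHP priority vector with normalized row geometric means."""
    n = len(matrix)
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]

def consistency_ratio(matrix, weights):
    """Estimate lambda_max from (A w)_i / w_i, then return CR = CI / RI."""
    n = len(matrix)
    lambda_max = sum(
        sum(a * w for a, w in zip(row, weights)) / weights[i]
        for i, row in enumerate(matrix)
    ) / n
    ci = (lambda_max - n) / (n - 1)
    return ci / RANDOM_INDEX[n]

criteria = ["capacity", "accessibility", "vulnerability", "importance"]

# Hypothetical judgments: entry (i, j) says how strongly criterion i is
# preferred over criterion j. Derived here from assumed weights, so the
# matrix is reciprocal and perfectly consistent.
assumed = [0.4, 0.3, 0.2, 0.1]
matrix = [[wi / wj for wj in assumed] for wi in assumed]

weights = ahp_weights(matrix)
cr = consistency_ratio(matrix, weights)
```

With real expert judgments the matrix is rarely perfectly consistent, and a consistency ratio below about 0.1 is the conventional acceptance threshold. A one-at-a-time sensitivity check would perturb a single matrix entry, recompute the weights, and compare the resulting road ranking.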
Article
With the rapid development of the economy and urbanization, the number of urban motor vehicles keeps increasing. Urban travel has become more convenient, but traffic safety problems are increasingly prominent. Traffic accident data cover not only time and place, but also people, roads, vehicles, and the surrounding environment. A traffic accident black spot is a spatial location where traffic accidents are concentrated. Most traditional black spot identification methods consider only time and space factors and ignore the others. Based on traffic accident data from Suzhou Industrial Park, this paper performs a fusion analysis of the multi-source influencing factors involved in traffic accident black spots. Exploiting the structured association characteristics of urban traffic accident big data, a support vector machine method based on maximizing the classification margin is used to train a model of accident black spots in the study area, which improves the accuracy of black spot identification. At the same time, to handle the rapid growth of multi-source traffic accident data, a black spot identification algorithm based on a deep neural network is proposed. A deep neural network over the category information of the relevant data is established to verify the model's ability to identify accident black spots, yielding a feature-based black spot identification method. Furthermore, a dynamic adaptive machine learning architecture is built.
Article
Accident-prone road section (APRS) treatment is the most effective strategy for accident reduction. Because the use of multicriteria decision making (MCDM) improves the quality of a decision, especially in problems involving multiple criteria, an MCDM approach may be appropriate for APRS determination. In this study, an MCDM-based methodology was developed for the determination of APRSs. The weighted linear combination method was used to combine criteria. Because relying on expert opinion to determine criteria weights in MCDM studies may produce subjective results, all possible criteria weight combinations (with some simplifications) were tried to determine the best weight combination. “Relative risk” was used as the common unit for criteria standardization, and past accident data were used to express the “relative risk”. The methodology was tested on the D850 State Highway, Tokat, Turkey. Five criteria (horizontal alignment, vertical alignment, intersections, significant places, shoulder width) were used in the process. In the end, although 63 out of 94 sections (67.02%) were labeled as “accident prone”, 173 out of 206 accidents (83.98%) took place in these sections. Therefore, the methodology can be used for APRS determination. In addition, the effect of any change in road properties on accident-proneness can be simulated by simply changing the criterion values.
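The weighted linear combination and the exhaustive search over weight combinations described in this abstract can be sketched as follows. This is a minimal illustration only: the grid step, the function names, and the example values are assumptions, and the study's own simplifications and its criterion for picking the best combination are not reproduced here.

```python
from itertools import product

def wlc_score(criterion_values, weights):
    """Weighted linear combination: sum of weight * standardized criterion value."""
    return sum(w * x for w, x in zip(weights, criterion_values))

def weight_grid(n_criteria, steps):
    """Yield every weight vector on a grid of 1/steps whose entries sum to 1."""
    for combo in product(range(steps + 1), repeat=n_criteria):
        if sum(combo) == steps:  # integer check avoids float rounding issues
            yield tuple(c / steps for c in combo)

# Hypothetical relative-risk values for one road section, in the order:
# horizontal alignment, vertical alignment, intersections,
# significant places, shoulder width.
section = [1.0, 2.0, 0.5, 1.5, 1.0]

# Score the section under every candidate weighting on a coarse grid.
candidates = list(weight_grid(n_criteria=5, steps=4))
scores = [wlc_score(section, w) for w in candidates]
```

Changing one entry of `section` and rescoring is the kind of what-if simulation the abstract mentions for changes in road properties.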
Article
Introduction: In this paper, we present machine learning techniques for analyzing pedestrian and bicycle crashes by developing macro-level crash prediction models. Methods: We collected 2010-2012 crash data at the Statewide Traffic Analysis Zone (STAZ) level and developed a rigorous machine learning approach (i.e., decision tree regression (DTR) models) for both pedestrian and bicycle crash counts. To our knowledge, this is the first application of DTR models in the burgeoning macro-level traffic safety literature. Results: The DTR models uncovered the most significant predictor variables for both response variables (pedestrian and bicycle crash counts) in three broad categories: traffic, roadway, and socio-demographic characteristics. Additionally, spatial predictor variables of neighboring STAZs were considered along with those of the targeted STAZ in both DTR models. The DTR model that considers spatial predictor variables (spatial DTR model) was compared with the model that does not (aspatial DTR model), and the comparison showed that the spatial DTR model achieved better prediction accuracy than the aspatial DTR model. Finally, the current research effort contributed to the safety literature by applying ensemble techniques (i.e., bagging, random forest, and gradient boosting) to improve the prediction accuracy of the DTR models (weak learners) for macro-level crash counts. The study revealed that all the ensemble techniques performed slightly better than the DTR model and that gradient boosting outperformed the other ensemble techniques in macro-level crash prediction.
Article
The present work used the analytical hierarchy process supported by a machine learning technique (namely random forest variable importance) in a geospatial environment to identify traffic-incident hot spots, considering seven incident attributes (incident severity level, vehicle type, number of vehicles involved, number of weather condition factors, number of lanes blocked, time period, and incident duration) together with incident frequency. Using a raster representation, each factor was ranked and weighted based on the literature, expert surveys, and the random forest technique to perform hot spot analysis and generate a critical road index map. Statistical analysis of spatial clustering and hot spot spatial densities was carried out based on Moran’s I method of spatial autocorrelation, Getis–Ord Gi* statistics, and point kernel density. Based on weighted attributes of more than 130,000 incidents from Harris County, Texas, for the period between 2004 and 2013, the method successfully identified critical road segments and highlighted factors that contribute to criticality. The results show that high-severity segments were located in the downtown area during the period from 2004 to 2010 and shifted northwest of the downtown area during the period from 2010 to 2013. The random forest variable importance revealed that the number of lanes blocked and the time period were the most critical factors affecting the severity of traffic incidents. Although the method was demonstrated using the incident database from Harris County, it can be generalized to other cities. Traffic agencies can use the suggested approach to gather and visualize reliable information that supports better traffic management schemes.
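For readers unfamiliar with the spatial autocorrelation measure named in this abstract, the global Moran's I statistic can be computed as in the sketch below. It assumes a simple chain of road segments with binary neighbor weights; the incident counts and helper names are hypothetical and unrelated to the Harris County data.

```python
def morans_i(values, weights):
    """Global Moran's I: (n / S0) * sum_ij w_ij d_i d_j / sum_i d_i^2."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # total of all spatial weights
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    denom = sum(d * d for d in dev)
    return (n / s0) * (cross / denom)

def chain_weights(n):
    """Binary weights for n segments in a line: each neighbors the next one."""
    w = [[0] * n for _ in range(n)]
    for i in range(n - 1):
        w[i][i + 1] = w[i + 1][i] = 1
    return w

w = chain_weights(4)
clustered = morans_i([1, 1, 5, 5], w)    # similar counts sit next to each other
alternating = morans_i([1, 5, 1, 5], w)  # neighbors always differ
```

Positive values indicate that similar incident counts cluster in space, and negative values indicate that neighboring segments tend to differ.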
Article
With the increase in smart devices, spatiotemporal data has grown exponentially. Dealing with the challenges caused by this growth requires a scalable and efficient architecture that can store, query, analyze, and visualize spatiotemporal big data. This paper describes a Cloud-terminal integrated GIS platform architecture designed to meet the requirements of processing and analyzing spatiotemporal big data. A Cloud-terminal integrated GIS is developed according to this architecture. Extensive experiments deployed on an internal organization cluster using real-time datasets showed that the SuperMap GIS spatiotemporal big data engine achieved excellent performance.