Journal of Big Data Analytics in Transportation (2019) 1:83–94
https://doi.org/10.1007/s42421-019-00006-8
ORIGINAL PAPER
A Cyberinfrastructure for Big Data Transportation Engineering
Md Johirul Islam1 · Anuj Sharma1 · Hridesh Rajan1
Received: 27 July 2018 / Revised: 4 April 2019 / Accepted: 9 April 2019 / Published online: 9 May 2019
© Springer Nature Singapore Pte Ltd. 2019
Abstract
Big data-driven transportation engineering has the potential to improve utilization of road infrastructure, decrease
traffic fatalities, improve fuel consumption, and decrease construction worker injuries, among others. Despite these
benefits, research on big data-driven transportation engineering is difficult today because of the computational
expertise required to get started. This work proposes BoaT, a transportation-specific programming language, and its big
data infrastructure, which together aim to lower this barrier to entry. Our evaluation, which uses over two dozen
research questions from six categories, shows that research is easier to realize as a BoaT computer program, that such a
program runs an order of magnitude faster, and that it exhibits a 12–14× decrease in storage requirements.
Keywords Big data · Domain-specific language · Cyberinfrastructure
Introduction
The potential and challenges of leveraging big data in transportation
have long been recognized (Adu-Gyamfi et al. 2017;
Barai 2003; Chakraborty et al. 2017; Chen and Zhang 2014;
Fan et al. 2014; Huang et al. 2016; Jagadish et al. 2014;
Kitchin 2014; Laney 2001; Liu et al. 2016; Lv et al. 2015;
Seedah et al. 2015; Wang et al. 2017; Zhang et al. 2011). For
example, researchers have shown that big data-driven transportation
engineering can help reduce congestion and fatalities
and make building transportation applications easier (Barai
2003; Huang et al. 2016; Zhang et al. 2011). The availability
of open transportation data that are accessible, e.g. on the
web under a permissive license, has the potential to further
accelerate the impact of big data-driven transportation
engineering.
Despite this incredible potential, harnessing big data in
transportation for research remains difficult. To utilize big
data, expertise is needed along each of the five steps of a
typical data pipeline, namely data acquisition; information
extraction and cleaning; data integration, aggregation, and
representation; modeling and analysis; and interpretation
(Jagadish et al. 2014). The first three steps are further
complicated by the heterogeneity of data from multiple sources
(Seedah et al. 2015), e.g. speed sensors, weather stations, and
national highway authorities. A scientist must understand the
peculiarities of each data source to develop a data acquisition
mechanism, clean the incoming data, and integrate it across
sources. Modeling and analysis
are complicated by the volume of the data. For example, a
dataset of speed measurements from a commercial provider
for Iowa for a single day can run to multiple GBs, exceeding
the limits of a single machine. Analyses that aim to compute
trends over multiple years require storing, and computing
over, tens of TBs of speed sensor data alone.
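To make the five pipeline steps concrete, the following is a minimal single-machine sketch in Python. It is not BoaT code and not the authors' infrastructure. The feed contents, field names, and function names are all hypothetical, chosen only to show where heterogeneous sources (a speed feed and a weather feed) force separate cleaning and integration work before any analysis can start.

```python
from statistics import mean

# Hypothetical toy records; field names and values are illustrative only.
SPEED_FEED = [
    {"segment": "A", "hour": 8, "mph": "61.0"},
    {"segment": "A", "hour": 8, "mph": "n/a"},  # malformed reading
    {"segment": "A", "hour": 9, "mph": "35.5"},
    {"segment": "B", "hour": 8, "mph": "55.0"},
]
WEATHER_FEED = [
    {"hour": 8, "condition": "clear"},
    {"hour": 9, "condition": "snow"},
]


def acquire():
    """Step 1, data acquisition: here we simply return the in-memory feeds."""
    return SPEED_FEED, WEATHER_FEED


def clean(speed_rows):
    """Step 2, extraction and cleaning: drop rows whose speed is not numeric."""
    cleaned = []
    for row in speed_rows:
        try:
            cleaned.append({**row, "mph": float(row["mph"])})
        except ValueError:
            continue  # skip readings such as "n/a"
    return cleaned


def integrate(speed_rows, weather_rows):
    """Step 3, integration: attach the weather condition to each reading by hour."""
    weather_by_hour = {w["hour"]: w["condition"] for w in weather_rows}
    return [
        {**row, "condition": weather_by_hour.get(row["hour"], "unknown")}
        for row in speed_rows
    ]


def analyze(rows):
    """Step 4, modeling and analysis: mean speed per weather condition."""
    speeds = {}
    for row in rows:
        speeds.setdefault(row["condition"], []).append(row["mph"])
    return {condition: mean(values) for condition, values in speeds.items()}


def interpret(summary):
    """Step 5, interpretation: render the summary as readable lines."""
    return [f"{condition}: {speed:.1f} mph" for condition, speed in sorted(summary.items())]


def run_pipeline():
    speed_rows, weather_rows = acquire()
    return interpret(analyze(integrate(clean(speed_rows), weather_rows)))


if __name__ == "__main__":
    for line in run_pipeline():
        print(line)
```

Even in this toy form, each source needs its own cleaning and join logic, and everything is held in memory on one machine, which is exactly what stops working at the data volumes described above.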
A possible solution is to use big data technologies
such as Hadoop and Apache Spark running on a distributed
cluster. A cluster with an adequate number of nodes can
address the problems of storage and computation time.
These technologies, however, are not easy to use. Getting
started requires technical expertise to set up the infrastructure,
an efficient data schema design, a strategy for acquiring
data from multiple sources, a high level of programming skill,
adequate knowledge of distributed computing models, and
proficiency in writing distributed computer programs, which
differs significantly from writing a sequential computer
program in Matlab, C, or Java. The analysis of big data in transportation
* Md Johirul Islam
mislam@iastate.edu
Anuj Sharma
anujs@iastate.edu
Hridesh Rajan
hridesh@iastate.edu
1 Iowa State University, Ames, IA, USA