A Car-Following-Based Method for Vehicle Trajectory Connection

Xiaowei Shi, Dongfang Zhao, Xiaopeng Li*

Department of Civil and Environmental Engineering, University of South Florida, Tampa, FL, 33620, USA
Abstract

High-accuracy, long-coverage vehicle trajectory data can benefit investigations of various traffic phenomena. However, most existing vehicle trajectory datasets miss parts of the trajectories due to sensing limitations and thus contain substantial broken vehicle trajectories. This may restrict analyses of traffic dynamics and the validation of corresponding findings. To address this issue, this paper proposes a car-following-based (CF-based) vehicle trajectory connection method that connects broken vehicle trajectories. The proposed method can not only fill missing data points caused by detection errors but also connect vehicle trajectory datasets from different sensors. To illustrate the performance of the proposed CF-based method, it was employed to process a series of vehicle trajectory datasets extracted from aerial videos recorded over several successive segments of Interstate-75, United States. Compared with several benchmark trajectory connection methods, the results show that the proposed method has advantages in both trajectory connection accuracy and trajectory consistency. To the best of the authors' knowledge, the dataset processed by the proposed CF-based method, named HIGH-SIM, is the longest vehicle trajectory dataset in the literature. The dataset has been published online for public use.
Keywords: Car-Following Model, Vehicle Kinematics, Vehicle Trajectory Connection, Vehicle Trajectory Dataset
1. Introduction
Vehicle trajectories, the positions of a stream of vehicles over time along a guideway (Daganzo, 1997), can provide informative insights for various traffic-related studies, such as traffic flow theory, traffic simulation modeling, traffic safety measures, and traffic management. Kim and Cao (2010) classified existing vehicle trajectory data collection methods into two categories: vehicle-based methods (Anuar and Cetin, 2017; Coifman et al., 2016; Victor, 2014; Zhao et al., 2017) and video-based methods (Babinec et al., 2014; Kim et al., 2019; Xu et al., 2017; Zhao and Li, 2019). Vehicle-based methods collect vehicle trajectory data with probe vehicles. Probe vehicles equipped with position/distance measurement sensors (e.g., Lidar, Radar, GPS) travel along the test road segment, and thus the trajectories of the probe vehicles, as well as the surrounding vehicles, can be obtained. The drawback of vehicle-based methods is apparent: since only the trajectories of the probe vehicle and its surrounding vehicles are collected, the data penetration rate with respect to the entire traffic is often very low. Video-based methods, on the other hand, extract vehicle trajectories from traffic videos recorded by roadside or aerial cameras over the investigated road segments. By tracking each vehicle's motion in the videos, the trajectories of all vehicles traveling along the road segments can be obtained.

Due to the rapid development of aerial video recording technologies, e.g., unmanned aerial vehicles with high-definition cameras, video-based trajectory data are becoming more appealing, with advantages such as scalability, flexibility, economy, and unbiasedness (Kim and Cao, 2010). Thus, in recent years, the collection of video-based trajectory data has attracted wide attention from researchers in both industry and academia (NGSIM, 2006; Apeltauer et al., 2015; Azevedo et al., 2014; Barmpounakis and Geroliminis, 2020; Chen et al., 2020; Krajewski et al., 2018; Punzo et al., 2011; Shi et al., 2021). Despite these merits of video-based data collection with advanced technologies, we would like to point out two types of fundamental issues in such datasets: detection errors and limited ranges, as specified below.
Detection errors can be further divided into two categories in terms of their origin: source errors and extraction errors. Source errors are caused by partial loss of the video feed. Since the videos are recorded from the air, the target sites may be partially blocked by facilities (e.g., bridges, signals, billboards, buildings) around the road, and thus the vehicle motions in the blocked areas are lost. As illustrated in Figure 1, a bridge across the recorded road segment leads to the loss of the trajectories of all vehicles underneath.
Figure 1. Source errors illustration.
Extraction errors are caused by trajectory extraction methods. To obtain trajectory data from video sources, various trajectory extraction methods have been proposed in the literature to track each vehicle's motion in videos. Reliable vehicle tracking is an extremely challenging problem that falls into the computer vision field and has attracted intensive study in the past few years (Jazayeri et al., 2011; Wang et al., 2008; Zhang et al., 2007). Although plenty of high-performance methods have been proposed to tackle this problem, as far as we know, none of them can guarantee a 100% detection rate. In particular, exogenous factors such as weather, light, wind, and camera angle may degrade the quality of the recorded video as well as the detection rates. Thus, missing detections in the trajectory extraction process seem inevitable. Once a missing detection happens, the original long trajectory is broken into shorter trajectories at the missing points, which degrades the quality of the obtained dataset. As illustrated in Figure 2, due to the camera angles, vehicle sizes in the video gradually shrink as vehicles approach the end of the road segment. This decreases the detection rates, and thus the extracted trajectories are broken into small pieces in the downstream.
Figure 2. Extraction errors illustration.
To overcome the detection errors issue, data post-processing studies on trajectory smoothing have been conducted (Lee and Krumm, 2011; Punzo, 2009; Siddique, 2019; Wu, 2018; Xin, 2008). Interested readers can refer to Lee and Krumm (2011) for a detailed review of vehicle trajectory post-processing. Studies investigating vehicle trajectory connection, however, are scarce in the literature. Kim et al. (2019) connected broken trajectories by extending the trajectory of each vehicle for three seconds at a constant speed. Tong et al. (2017) connected broken trajectories by extending the linear interpolation method considering historical data and contextual arrival information. Zhang and Jin (2019) performed vehicle trajectory data cleaning and connection (or stitching) while extracting the data, but they did not specify the adopted trajectory connection method. One can see that although this topic has started to draw attention in recent years with the emergence of aerial video recording technologies, the existing methods in the literature are still fairly simple and may only yield results of limited quality, since they do not capture driving behavior and physics (i.e., car-following behavior). To the best of the authors' knowledge, Sazara et al. (2017) is the only study that connects broken vehicle trajectories considering car-following characteristics. They collected raw vehicle trajectory data by the vehicle-based method using Lidar sensors. To connect the broken trajectories, they first extended the broken trajectories with Gipps' car-following model, and then a simple reshaping operation was proposed to connect the two pieces of each broken trajectory. Despite the success of this study, we would like to point out a strong underlying assumption: the vehicle IDs of the broken trajectories are given, and thus the pieces can be easily matched for each individual vehicle. This assumption is easily satisfied if the data are collected by the vehicle-based method (e.g., Lidar, Radar). However, how to efficiently match two broken trajectories without explicit vehicle ID tags when the extracted dataset is large (as collected by video-based methods) is an intriguing problem. The problem becomes even harder if the recorded videos include multiple lanes, in which broken vehicle trajectories across different lanes may belong to the same vehicle due to possible lane change maneuvers. Figure 3 illustrates the described lane change maneuvers on a simple two-lane road. Figure 3 (a) and (b) show the trajectories from lane 1 and lane 2, respectively. If a vehicle makes a lane change from lane 1 to lane 2, as highlighted in both Figure 3 (a) and (b), the trajectory of this vehicle in each lane is "incomplete". A wrong connection may happen if a broken trajectory is coincidentally near the "incomplete" trajectory, as illustrated in Figure 3 (b). In addition, without further investigation of feasible ranges for vehicle trajectories in the space-time diagram, the reshaping operation may easily generate trajectories that violate vehicle kinematic constraints.
Figure 3. Lane change scenario illustration: (a) lane 1; (b) lane 2 (labels in the figure mark the detection error and the lane changes).
The second fundamental issue that may exist in the data is limited ranges. Compared to vehicle-based collection methods, video-based collection methods have advantages in data collection scale (e.g., the number of collected trajectories). However, the detection ranges (e.g., the length of vehicle trajectories) are greatly constrained by both the detection accuracy and the altitude of the cameras. That is, cameras at a higher altitude can record traffic video over a greater range, yet the requirement on the detection accuracy of the cameras becomes higher. Table 1 lists the lengths of several video-based trajectory datasets reported in the literature. The longest trajectory dataset is the NGSIM dataset, with a length of 640 meters. However, this is still insufficient to observe a full life cycle of traffic phenomena (e.g., traffic bottleneck development and dissipation). To break the constraints on the detection ranges, a direct solution is to either improve the performance of the cameras (detection accuracy) or fly the cameras at a higher altitude. However, these solutions belong to the optics or mechanical field and are out of the knowledge scope of a transportation engineer. To resolve this issue from the transportation engineering perspective, Raju et al. (2021) proposed the concept of stitching trajectory data from different cameras. However, no method for reliably stitching videos from different cameras has been proposed following this seminal concept.
Table 1. Trajectory length comparisons among video-based trajectory datasets.

Publication                              Location                               Length
NGSIM (2006)                             Highways, United States                640 meters
Krajewski et al. (2018)                  Highways, Germany                      420 meters
Azevedo et al. (2014)                    Motorway, Portugal                     500 meters
Kim et al. (2019)                        Expressways, Korea                     188 meters
Babinec et al. (2014)                    Ring road, Czech Republic              300 meters
Xu et al. (2017)                         Freeway and urban roads, China         160 meters
Barmpounakis and Geroliminis (2020)      Urban roads, City of Athens, Greece    350 meters
Our dataset: HIGH-SIM                    Highways, United States                2,438 meters
Overall, one can see that the existing video-based datasets are constrained by the aforementioned two issues, which may restrict the analyses of traffic dynamics and the validations of corresponding findings. To help circumvent these issues, this paper proposes a car-following-based (CF-based) method for vehicle trajectory connection, in which broken vehicle trajectories are connected based on car-following theory. The proposed method can not only fill missing data points caused by detection errors (solving the detection errors issue) but also connect trajectory data from different sensors (solving the limited ranges issue). To illustrate the performance of the proposed CF-based method, we processed a series of vehicle trajectory datasets extracted from aerial videos recorded at several successive segments of Interstate-75 (28°08'37.2"N 82°22'58.8"W to 28°10'16.2"N 82°23'38.0"W), United States (see Shi et al. (2021) for the detailed trajectory extraction method). The results show that the proposed method outperforms several benchmark methods in both trajectory connection accuracy and trajectory consistency. Moreover, the dataset processed by the proposed CF-based method, named HIGH-SIM, has been published online by both the Federal Highway Administration, U.S. Department of Transportation, and the Connected and Autonomous Transportation Systems Lab, University of South Florida, for public use. To the best of the authors' knowledge, the HIGH-SIM dataset is the longest high-resolution vehicle trajectory dataset capturing all vehicles in the traffic stream among the publicly available ones, and it includes a full life cycle of a bottleneck. This paper focuses on the trajectory connection method we adopted to generate the dataset; interested readers can refer to Shi et al. (2021) for a detailed introduction to the HIGH-SIM dataset.
The remainder of this paper is organized as follows. Section 2 describes the trajectory connection problem investigated in this paper. Section 3 presents the proposed CF-based vehicle trajectory connection method. A series of numerical experiments is conducted in Section 4 to demonstrate the performance of the proposed method. Section 5 concludes the paper and discusses the limitations of the proposed method and possible solutions.
2. Problem Statement
The investigated trajectory connection problem is stated as follows. Let $\mathcal{F}$ denote the set of vehicle trajectories in the investigated dataset, and each trajectory is labeled as $f \in \{1, 2, \dots, F\}$. The trajectory data are captured in a spatial range $[0, L]$ within a continuous time period $[0, T]$. In practice, the data are usually only available at discrete time points, and thus the time period is discretized into time points $\mathcal{T} := \{0, 1, 2, \dots, K\}$ with time interval $\Delta := T/K$. Each trajectory $f$ is defined as the composition of a pair of arrays $(t_f, x_f)$, with time point array $t_f = [t_f^1, t_f^2, \dots, t_f^{N_f}]$ denoting the consecutive time points and location array $x_f = [x_f^1, x_f^2, \dots, x_f^{N_f}]$ denoting the corresponding location coordinates (e.g., mileposts) of the trajectory at these time points, where $N_f$ is the total number of data points in trajectory $f$. In addition, $v_f := [v_f^1, \dots, v_f^{N_f}]$, $a_f := [a_f^1, \dots, a_f^{N_f}]$, and $p_f := [p_f^1, \dots, p_f^{N_f}]$ denote the velocities, accelerations, and preceding trajectory's labels of trajectory $f$ at the corresponding time points, respectively. If there is no trajectory preceding trajectory $f$ at time $t_f^i$, then $p_f^i$ is set to 0. The length of the vehicle associated with trajectory $f$ is denoted by $l_f$.

It is expected that a great number of trajectories in dataset $\mathcal{F}$ only capture a portion of the subject vehicles' motions due to the aforementioned issues. These trajectories, referred to as broken trajectories, can be identified if they satisfy either of the following conditions: (1) $t_f^1 > 0$ and $x_f^1 > 0$; or (2) $t_f^{N_f} < T$ and $x_f^{N_f} < L$. We collect all broken trajectories from $\mathcal{F}$ and denote the broken trajectory dataset as $\mathcal{B}$. To indicate the missing segment of each broken trajectory, according to the conditions, we further classify the broken trajectory dataset into three subsets, $\mathcal{B}^{\mathrm{O}}$, $\mathcal{B}^{\mathrm{E}}$, and $\mathcal{B}^{\mathrm{OE}}$. Broken trajectories that satisfy condition (1) are stored in $\mathcal{B}^{\mathrm{O}}$, which means that these trajectories are broken at the origin side (i.e., a segment before $(t_f^1, x_f^1)$ is missing). Broken trajectories that satisfy condition (2) are stored in $\mathcal{B}^{\mathrm{E}}$, which means that these trajectories are broken at the end side (i.e., a segment after $(t_f^{N_f}, x_f^{N_f})$ is missing). Broken trajectories that satisfy both conditions (1) and (2) are stored in $\mathcal{B}^{\mathrm{OE}}$, which means that these trajectories are broken at both sides. Thus, we have $\mathcal{B}^{\mathrm{O}} \cup \mathcal{B}^{\mathrm{E}} = \mathcal{B}$ and $\mathcal{B}^{\mathrm{OE}} = \mathcal{B}^{\mathrm{O}} \cap \mathcal{B}^{\mathrm{E}}$. For the specific example shown in Figure 4, $\mathcal{F} = \{1, 2, \dots, 8\}$, $\mathcal{B} = \{2, 3, 5, 6, 7\}$, $\mathcal{B}^{\mathrm{O}} = \{3, 6, 7\}$, $\mathcal{B}^{\mathrm{E}} = \{2, 5, 6\}$, and $\mathcal{B}^{\mathrm{OE}} = \{6\}$.
Figure 4. Problem statement.
This paper aims to connect these broken trajectories considering vehicle driving characteristics (i.e., car-following behavior) and thus enhance the quality of the dataset. To this end, this paper proposes a CF-based method for vehicle trajectory connection, in which the broken vehicle trajectories are connected based on car-following theory.
3. Methodology

The proposed CF-based vehicle trajectory connection method includes two steps: (1) car-following model calibration; and (2) vehicle trajectory connection.

3.1 Car-following model calibration

To connect the broken vehicle trajectories considering car-following behavior, we first calibrate a car-following model with the dataset. The car-following model adopted in this paper is the Pitt car-following model (Drew, 1968), shown in Equation (1):
$$s = l_l + 3.04878 + k v_f + b k (v_f - v_l)^2, \quad (1)$$

where $s$ is the spacing headway between the leading vehicle and following vehicle, $l_l$ is the length of the leading vehicle, $k$ is a sensitivity factor, $v_f$ is the speed of the following vehicle, $v_l$ is the speed of the leading vehicle, and $b$ is a calibration constant (if $v_f > v_l$, $b = 0.1$; otherwise, $b = 0$). Note that this paper focuses on connecting the broken trajectories considering car-following behavior; we select the Pitt car-following model without further comparing it to other car-following models. The comparison among different car-following models is out of the scope of this paper, and interested readers can refer to van Hinsbergen et al. (2015) and Rahman (2013) for detailed comparisons.
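As a concrete reading of Equation (1), the desired spacing can be computed as follows; the function name and the metric units (meters, m/s) are illustrative, and the constant 3.04878 m corresponds to the Pitt model's fixed 10-ft buffer.

```python
PITT_BUFFER_M = 3.04878  # the Pitt model's fixed 10-ft buffer, in meters

def pitt_spacing(l_lead: float, v_follow: float, v_lead: float, k: float) -> float:
    """Desired spacing headway of the Pitt car-following model, Eq. (1).
    The quadratic term switches on (b = 0.1) only when the follower is
    faster than the leader; otherwise b = 0."""
    b = 0.1 if v_follow > v_lead else 0.0
    return l_lead + PITT_BUFFER_M + k * v_follow + b * k * (v_follow - v_lead) ** 2
```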
With the Pitt car-following model, a series of car-following trajectory pairs, including the time ($t$), location ($x$), and speed ($v$) of both leading and following vehicles, is extracted from the dataset to calibrate the model. By referring to the preceding trajectory label array $p_f$, the car-following trajectory pairs can be obtained easily: whenever $p_f^i \neq 0$, we extract $(t_f^i, x_f^i, v_f^i)$ of the follower together with the corresponding states of its leader. With the car-following trajectory pairs, we calibrate the Pitt car-following model with a greedy algorithm. The error measurement function in Equation (2) is used to evaluate the fitness of the Pitt car-following model:

$$\varepsilon = \sqrt{\frac{1}{J} \sum_{j=1}^{J} \frac{1}{N_j} \sum_{i=1}^{N_j} \left(\hat{x}_i^j - x_i^j\right)^2}, \quad (2)$$

where $\varepsilon$ is the standard error between the estimated locations ($\hat{x}$) and the real locations ($x$), $J$ is the total number of car-following trajectory pairs, and $N_j$ is the number of data points captured for trajectory pair $j$.
The detailed calibration procedure is shown in Figure 5. The calibration starts by initializing the model parameters $b$ and $k$ with random values in $[0, 0.1]$ and $[0, 2]$, respectively, and inputting the convergence criterion $\bar{\varepsilon}$ and step sizes $\Delta_k, \Delta_b$. In each iteration $i$, the parameters are updated as $k_{i+1} = \mathrm{rand}(0,1) \cdot \Delta_k + k_i$ and $b_{i+1} = \mathrm{rand}(0,1) \cdot \Delta_b + b_i$, where $\mathrm{rand}(0,1)$ is a random number in the range $(0,1)$. In the next step, the fitting error is calculated according to Equation (2). These procedures are repeated until the variation of the estimation error satisfies $|\varepsilon_{i+1} - \varepsilon_i| \leq \bar{\varepsilon}$, at which point the optimal parameters $k^*, b^*$ are obtained.
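The greedy random-step loop of Figure 5 can be sketched generically as below; where the extracted text is ambiguous, the symmetric perturbation and the accept-only-if-improved rule are our assumptions.

```python
import random
from typing import Callable, List, Tuple

def calibrate_greedy(error_fn: Callable[[List[float]], float],
                     x0: List[float], step: List[float],
                     tol: float = 1e-4, max_iter: int = 2000,
                     seed: int = 0) -> Tuple[List[float], float]:
    """Greedy random-walk calibration: perturb the parameters by a random
    fraction of each step size, keep the move only if the fitting error
    improves, and stop once the improvement falls below tol."""
    rng = random.Random(seed)
    x, err = list(x0), error_fn(x0)
    for _ in range(max_iter):
        cand = [xi + rng.uniform(-1.0, 1.0) * si for xi, si in zip(x, step)]
        cand_err = error_fn(cand)
        if cand_err < err:
            improved = err - cand_err
            x, err = cand, cand_err
            if improved < tol:
                break
    return x, err
```

With `error_fn` set to Equation (2) evaluated over the extracted car-following pairs, the returned parameters play the role of the calibrated $k$ and $b$.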
Figure 5. Pitt car-following model calibration process.
3.2 Vehicle trajectory connection

With the calibrated Pitt car-following model, the proposed CF-based vehicle trajectory connection method connects the broken trajectories in two sub-steps. Sub-step 1: extending the broken vehicle trajectories with the calibrated car-following model. Sub-step 2: connecting the broken trajectories considering vehicle kinematics constraints.
Sub-step 1:

For broken trajectories in $\mathcal{B}^{\mathrm{O}}$ (or $\mathcal{B}^{\mathrm{E}}$), the trajectories are extended backward (or forward) $n$ time intervals with the calibrated Pitt car-following model. We name the extended trajectories the transition trajectories. The time length of the transition trajectories, $T_t = n\Delta$, is a given value dependent on the quality of dataset $\mathcal{F}$: a large value is set if the time length of the missing trajectories is long, and vice versa. The effects of different $T_t$ values on the trajectory connection are analyzed in Section 4. Assume that there are two broken trajectories labeled by $f \in \mathcal{B}^{\mathrm{O}}$ and $f' \in \mathcal{B}^{\mathrm{E}}$, and their preceding vehicle trajectories are labeled by $p = p_f^1$ and $p' = p_{f'}^{N_{f'}}$, with $x_p(\cdot)$ and $v_p(\cdot)$ returning the preceding trajectory's location and speed at a given time point. The arrays of the transition trajectory for trajectory $f$ (i.e., $t_f^{\mathrm{b}}, x_f^{\mathrm{b}}, v_f^{\mathrm{b}}, p_f^{\mathrm{b}}$) are generated according to Equations (3)-(8), where superscript $\mathrm{b}$ indicates that the trajectory extends backward:

$$t_{f,i}^{\mathrm{b}} = t_f^1 - (n + 1 - i)\Delta, \quad \forall i \in \{1, 2, \dots, n\}, \quad (3)$$
$$x_{f,i}^{\mathrm{b}} = x_p(t_{f,i}^{\mathrm{b}}) - s_{f,i}^{\mathrm{b}}, \quad \forall i \in \{1, 2, \dots, n\}, \quad (4)$$
$$v_{f,n}^{\mathrm{b}} = v_f^1, \quad (5)$$
$$s_{f,i}^{\mathrm{b}} = l_p + 3.04878 + k v_{f,i}^{\mathrm{b}} + b k \left(v_{f,i}^{\mathrm{b}} - v_p(t_{f,i}^{\mathrm{b}})\right)^2, \quad \forall i \in \{1, 2, \dots, n\}, \quad (6)$$
$$v_{f,i}^{\mathrm{b}} = \left(x_{f,i+1}^{\mathrm{b}} - x_{f,i}^{\mathrm{b}}\right)/\Delta, \quad \forall i \in \{1, 2, \dots, n-1\}, \quad (7)$$
$$p_{f,i}^{\mathrm{b}} = p_f^1, \quad \forall i \in \{1, 2, \dots, n\}. \quad (8)$$
Similarly, the arrays of the transition trajectory for trajectory $f'$ (i.e., $t_{f'}^{\mathrm{e}}, x_{f'}^{\mathrm{e}}, v_{f'}^{\mathrm{e}}, p_{f'}^{\mathrm{e}}$) can be calculated according to Equations (9)-(14), where superscript $\mathrm{e}$ indicates that the trajectory extends forward:

$$t_{f',i}^{\mathrm{e}} = t_{f'}^{N_{f'}} + i\Delta, \quad \forall i \in \{1, 2, \dots, n\}, \quad (9)$$
$$x_{f',i}^{\mathrm{e}} = x_{p'}(t_{f',i}^{\mathrm{e}}) - s_{f',i}^{\mathrm{e}}, \quad \forall i \in \{1, 2, \dots, n\}, \quad (10)$$
$$v_{f',1}^{\mathrm{e}} = v_{f'}^{N_{f'}}, \quad (11)$$
$$s_{f',i}^{\mathrm{e}} = l_{p'} + 3.04878 + k v_{f',i}^{\mathrm{e}} + b k \left(v_{f',i}^{\mathrm{e}} - v_{p'}(t_{f',i}^{\mathrm{e}})\right)^2, \quad \forall i \in \{1, 2, \dots, n\}, \quad (12)$$
$$v_{f',i}^{\mathrm{e}} = \left(x_{f',i}^{\mathrm{e}} - x_{f',i-1}^{\mathrm{e}}\right)/\Delta, \quad \forall i \in \{2, 3, \dots, n\}, \quad (13)$$
$$p_{f',i}^{\mathrm{e}} = p_{f'}^{N_{f'}}, \quad \forall i \in \{1, 2, \dots, n\}. \quad (14)$$
We illustrate the obtained transition trajectories with trajectory 6 described previously, which belongs to $\mathcal{B}^{\mathrm{OE}}$ and thus requires both backward and forward extensions. As shown in Figure 6, by referring to the preceding vehicle trajectory, the transition trajectories (colored green) are generated with the Pitt car-following model. Here we denote the set of all transition trajectories as $\mathcal{G}$.
Figure 6. Transition trajectory illustration.
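The backward extension of Sub-step 1 can be sketched as follows; `lead` is an assumed callable returning the preceding trajectory's location and speed at a time point, and the final speed refinement from consecutive positions mirrors Equation (7).

```python
from typing import Callable, List, Tuple

def extend_backward(t1: float, v1: float,
                    lead: Callable[[float], Tuple[float, float]],
                    l_lead: float, k: float, n: int, dt: float
                    ) -> Tuple[List[float], List[float], List[float]]:
    """Extend a trajectory broken at its origin backward n time steps:
    place the follower one Pitt spacing behind its leader at each earlier
    time point, then recover speeds from consecutive positions."""
    ts = [t1 - (n - i) * dt for i in range(n)]   # t1 - n*dt, ..., t1 - dt
    xs, vs = [0.0] * n, [v1] * n
    for i in range(n - 1, -1, -1):               # walk backward in time
        x_lead, v_lead = lead(ts[i])
        b = 0.1 if vs[i] > v_lead else 0.0
        spacing = l_lead + 3.04878 + k * vs[i] + b * k * (vs[i] - v_lead) ** 2
        xs[i] = x_lead - spacing                  # follower sits behind the leader
    for i in range(n - 1):                        # speeds from positions, Eq. (7) style
        vs[i] = (xs[i + 1] - xs[i]) / dt
    return ts, xs, vs
```

The forward extension for trajectories broken at the end side is symmetric, stepping forward from the last observed point instead.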
Sub-step 2:

In Sub-step 2, we propose the criterion for connecting two broken trajectories through the transition trajectories. In addition, vehicle kinematics constraints are considered while connecting the trajectories.

The criterion for trajectory connection is defined as follows. Assume that there is an arbitrary transition trajectory $g \in \mathcal{G}$ ($g$ can be either a backward or a forward extension). For each broken trajectory $f' \in \mathcal{B}$, the location difference between the transition trajectory and the broken trajectory, denoted as $\varepsilon_{gf'}$, can be calculated by

$$\varepsilon_{gf'} := \frac{1}{n} \sum_{i=1}^{n} \left| x_{g,i} - x_{f'}(t_{g,i}) \right|, \quad (15)$$

where the sum runs over the time points shared by the two trajectories. If $\varepsilon_{gf'} < \hat{\varepsilon}$ and $f' = \arg\min_{f''} \varepsilon_{gf''}$, we consider the two trajectories $f$ and $f'$ to be trajectories of the same vehicle, and thus we can connect them. Here $\hat{\varepsilon}$ is a given error term for evaluating the location difference between the transition trajectory and the broken trajectory. Note that a large $\hat{\varepsilon}$ value may cause a wrong connection, while a small value may reject a correct connection due to the estimation errors. In practical implementations of the proposed algorithm, different $\hat{\varepsilon}$ values shall be given depending on the quality of the raw datasets. We further analyze the performance of different $\hat{\varepsilon}$ values in Section 4.
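The matching rule can be sketched as follows; aligning time points by rounding to the sampling interval and using the mean absolute location difference are the assumed conventions here.

```python
from typing import Dict, List, Optional, Tuple

def match_broken(transition: Tuple[List[float], List[float]],
                 candidates: Dict[int, Tuple[List[float], List[float]]],
                 eps_hat: float, dt: float) -> Tuple[Optional[int], float]:
    """Return the broken-trajectory label whose locations best match the
    transition trajectory, provided the mean location difference stays
    below the threshold eps_hat; otherwise return None."""
    tt, tx = transition                       # (times, locations) of the transition
    best, best_err = None, float("inf")
    for label, (ct, cx) in candidates.items():
        lookup = {round(t / dt): x for t, x in zip(ct, cx)}
        diffs = [abs(x - lookup[round(t / dt)])
                 for t, x in zip(tt, tx) if round(t / dt) in lookup]
        if diffs:
            err = sum(diffs) / len(diffs)     # mean |location difference|
            if err < best_err:
                best, best_err = label, err
    return (best, best_err) if best_err < eps_hat else (None, best_err)
```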
Note that the transition trajectory may not perfectly connect to the origin/end point of the other broken trajectory due to the errors of the trajectory estimation, as illustrated in Figure 7: a gap remains between the transition trajectory and the broken trajectory. To connect trajectories $f$ and $f'$ considering vehicle kinematics constraints, we propose a time-space-cone-based trajectory connection method as follows.
Figure 7. Time-space cone illustration.
Assume that the kinematics constraints for each vehicle are given, including the maximum speed and the maximum and minimum accelerations, denoted as $v_{\max}$, $a_{\max}$, and $a_{\min}$ in this paper. Then for each pair of ready-to-connect trajectories $f, f'$, we can uniquely generate two boundary trajectories starting from the end (or the origin) of the trajectories, named the slowest and fastest trajectories, as illustrated in Figure 7. Each pair of boundary trajectories (a slowest trajectory and a fastest trajectory), which starts at the same point (e.g., either $(t_f^{N_f}, x_f^{N_f})$ or $(t_{f'}^1, x_{f'}^1)$), covers all feasible trajectories passing the point and forms a cone-shaped area in the time-space graph. Starting from the end (or the origin) of the trajectory, the slowest trajectory is generated by operating the vehicle forward (or backward) with the minimum acceleration (i.e., $a_{\min}$ while the speed remains positive, and 0 afterward) until time $T$ (or time 0) in the time-space graph. The physical meaning of the slowest trajectory is that it is a lower bound on all feasible trajectories: all trajectories starting from the end (or the origin) of the trajectory operate faster than the slowest trajectory. Similarly, the fastest trajectory is generated by operating the vehicle forward (or backward) with the maximum acceleration (i.e., $a_{\max}$) until the speed of the vehicle reaches the maximum speed (i.e., $v_{\max}$); the fastest trajectory is likewise generated until it reaches time $T$ (or time 0) in the time-space graph. The physical meaning of the fastest trajectory is that it is an upper bound on all feasible trajectories: all trajectories starting from the end (or the origin) of the trajectory operate slower than the fastest trajectory.
The equations for generating the slowest and fastest trajectories for trajectory $f$ are shown in Equations (16)-(23). To avoid repetition, we only show the equations for trajectory $f$; those of trajectory $f'$ can be obtained easily by considering that the vehicle operates reversely in the time-space graph. For the slowest trajectory (superscript $\mathrm{S}$), for all $i \in \{1, 2, \dots, (T - t_f^{N_f})/\Delta\}$ with $v_{f,0}^{\mathrm{S}} := v_f^{N_f}$:

$$x_{f,i}^{\mathrm{S}} = \begin{cases} x_f^{N_f} + v_f^{N_f}\Delta + 0.5\, a_{f,1}^{\mathrm{S}} \Delta^2, & i = 1, \\ x_{f,i-1}^{\mathrm{S}} + v_{f,i-1}^{\mathrm{S}}\Delta + 0.5\, a_{f,i}^{\mathrm{S}} \Delta^2, & i \geq 2, \end{cases} \quad (16)$$
$$t_{f,i}^{\mathrm{S}} = t_f^{N_f} + i\Delta, \quad (17)$$
$$v_{f,i}^{\mathrm{S}} = \max\left\{0,\; v_{f,i-1}^{\mathrm{S}} + a_{f,i}^{\mathrm{S}}\Delta\right\}, \quad (18)$$
$$a_{f,i}^{\mathrm{S}} = \begin{cases} a_{\min}, & v_{f,i-1}^{\mathrm{S}} > 0, \\ 0, & \text{otherwise}. \end{cases} \quad (19)$$

For the fastest trajectory (superscript $\mathrm{F}$), for all $i \in \{1, 2, \dots, (T - t_f^{N_f})/\Delta\}$ with $v_{f,0}^{\mathrm{F}} := v_f^{N_f}$:

$$x_{f,i}^{\mathrm{F}} = \begin{cases} x_f^{N_f} + v_f^{N_f}\Delta + 0.5\, a_{f,1}^{\mathrm{F}} \Delta^2, & i = 1, \\ x_{f,i-1}^{\mathrm{F}} + v_{f,i-1}^{\mathrm{F}}\Delta + 0.5\, a_{f,i}^{\mathrm{F}} \Delta^2, & i \geq 2, \end{cases} \quad (20)$$
$$t_{f,i}^{\mathrm{F}} = t_f^{N_f} + i\Delta, \quad (21)$$
$$v_{f,i}^{\mathrm{F}} = \min\left\{v_{\max},\; v_{f,i-1}^{\mathrm{F}} + a_{f,i}^{\mathrm{F}}\Delta\right\}, \quad (22)$$
$$a_{f,i}^{\mathrm{F}} = \begin{cases} a_{\max}, & v_{f,i-1}^{\mathrm{F}} < v_{\max}, \\ 0, & \text{otherwise}. \end{cases} \quad (23)$$
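A discrete sketch of the two boundary trajectories, with the speed clamped at 0 and $v_{\max}$ as in the slowest/fastest constructions above; the function name and step handling are illustrative.

```python
from typing import List, Tuple

def cone_boundaries(t0: float, x0: float, v0: float, t_end: float, dt: float,
                    a_min: float, a_max: float, v_max: float
                    ) -> Tuple[List[Tuple[float, float]], List[Tuple[float, float]]]:
    """Generate the slowest (decelerate at a_min until stopped) and fastest
    (accelerate at a_max until v_max) boundary trajectories forward from
    the point (t0, x0) with initial speed v0."""
    slow, fast = [(t0, x0)], [(t0, x0)]
    xs, vs, xf, vf, t = x0, v0, x0, v0, t0
    while t < t_end - 1e-9:
        t += dt
        a = a_min if vs > 0 else 0.0          # slowest branch: brake until stopped
        xs += vs * dt + 0.5 * a * dt * dt
        vs = max(0.0, vs + a * dt)
        slow.append((t, xs))
        a = a_max if vf < v_max else 0.0      # fastest branch: accelerate to v_max
        xf += vf * dt + 0.5 * a * dt * dt
        vf = min(v_max, vf + a * dt)
        fast.append((t, xf))
    return slow, fast
```

The two returned position sequences bound the cone-shaped feasible area in the time-space graph; running the same routine backward from the origin of the downstream trajectory yields the second cone.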
With these boundary trajectories, all feasible trajectories that not only satisfy the vehicle kinematics constraints but also connect the two broken trajectories are restricted to the shaded area illustrated in Figure 7. Considering the transition trajectory obtained in Sub-step 1, which we denote as the old transition trajectory $x^{\mathrm{O}}$, the new transition trajectory that connects the two broken trajectories $f$ and $f'$ is the trajectory in the shaded area that has the minimum location difference from the old transition trajectory. We denote the new transition trajectory as $x^{\mathrm{N}}$, obtained according to Equations (24) and (25); the corresponding $v$ and $a$ arrays of the new transition trajectory can then be derived from the location values:

$$x_{i}^{\mathrm{N}} = \min\left\{x_{f,i}^{\mathrm{F}},\; \max\left\{x_{f,i}^{\mathrm{S}},\; x_{i}^{\mathrm{O}}\right\}\right\}, \quad \forall i \in \left\{1, 2, \dots, (t_{f'}^1 - t_f^{N_f})/\Delta\right\}, \quad (24)$$
$$t_{i}^{\mathrm{N}} = t_f^{N_f} + i\Delta, \quad \forall i \in \left\{1, 2, \dots, (t_{f'}^1 - t_f^{N_f})/\Delta\right\}. \quad (25)$$
Figure 8 illustrates the new transition trajectory. With this, the broken trajectories $f$ and $f'$ are connected: the new trajectory is formed by combining the arrays of three trajectories, namely trajectory $f$, the new transition trajectory, and trajectory $f'$. By repeating these two sub-steps, the issues we revealed previously, i.e., detection errors and limited ranges, can be successfully fixed. Nonetheless, we would like to point out one obvious limitation of the proposed algorithm: the new transition trajectory is obtained without considering acceleration variations, which means that the acceleration of the transition trajectory may change dramatically at the intersection point of two trajectories, e.g., the intersection point of the old transition trajectory and the fastest trajectory, as shown in Figure 8. This limitation can be circumvented by trajectory smoothing techniques (Lee and Krumm, 2011); e.g., the merging operation proposed by Li and Li (2019) can connect two quadratic trajectories with smooth acceleration variations. However, the investigation of this technique is out of the scope of this paper, and interested readers can refer to Li and Li (2019) for more details.
Figure 8. New transition trajectory illustration.
4. Numerical Experiment

4.1 Dataset

In the numerical experiment, we demonstrate the proposed CF-based vehicle trajectory connection method with a set of raw vehicle trajectory datasets extracted from aerial videos (Shi et al., 2021). As shown in Figure 9 (a), the aerial videos were collected by three 8K cameras on a helicopter from 4:15 to 6:15 pm on Tuesday, May 14, 2019, over an 8,000-foot (2,438-meter) segment of Interstate-75 in Florida, United States. The segment carries bi-directional traffic flow, and we only use vehicle trajectories traveling from south to north (bottom to top in Figure 9 (a)) in this paper. The extracted trajectory datasets contain vehicle trajectories of three regular travel lanes. From left to right in Figure 9 (a), the three lanes are named Lane 2, Lane 1, and Lane 0, respectively. A sample of the detected vehicles in the trajectory extraction process is shown in Figure 9 (b), in which the detected vehicles are marked with red boxes. The frequency of the extracted datasets is 30 Hz, and the format of the datasets is consistent with the NGSIM dataset for the convenience of further trajectory analysis and public use.

Figure 9. Study area for collecting the aerial videos (Shi et al., 2021): (a) the recorded segment; (b) a sample of detected vehicles.
4.2 Trajectory Connection Result

Before the trajectory connection, the raw datasets include 283,501 broken vehicle trajectories, caused by the issues we mentioned previously, e.g., missing detections and wrong detections. The speed and acceleration ranges of the trajectories in the raw datasets are [0, 150] ft/s ([0, 45.72] m/s, or [0, 165] km/h) and [-20, 20] ft/s² ([-6.10, 6.10] m/s²).

By using the proposed CF-based trajectory connection method to process the extracted raw datasets, the 283,501 broken vehicle trajectories are eventually connected into 2,184 vehicle trajectories. The processed dataset is named the HIGH-SIM dataset and has been published online for public use. We plot the vehicle trajectories both before and after the connection in Figure 10 to help readers understand the performance of the proposed method. It can be seen in Figure 10 (a)-(c) that the raw vehicle trajectories break at common locations, which correspond to the boundaries between the aerial videos shot by different cameras. We can also observe that the raw vehicle trajectories are broken into many small pieces as they approach the end of the segment. This is because of the camera angle issue we described in Figure 2: as vehicles move away from the camera, the vehicle sizes in the video gradually shrink and thus the detection rates decrease. After processing the raw datasets with the proposed method, however, most of the vehicle trajectories are successfully connected, as shown in Figure 10 (d)-(f). The results show that the HIGH-SIM dataset has more reasonable speed and acceleration distributions than a well-known trajectory dataset, the NGSIM US-101 dataset. We explicitly study the quality of the HIGH-SIM dataset in Shi et al. (2021), and interested readers can refer to it for more details.
Figure 10. Comparison between the raw datasets ((a) Lane 2, (b) Lane 1, (c) Lane 0) and the HIGH-SIM dataset ((d) Lane 2, (e) Lane 1, (f) Lane 0).
4.3 Comparison of Different Methods
We compare the performance of the proposed CF-based vehicle trajectory connection method with several benchmark methods, including Kim et al. (2019)'s method (denoted as the linear-based method), a nonlinear-based method (extending the broken trajectory with a quadratic function), and the Kalman filter. The testing dataset is generated by breaking a set of complete vehicle trajectories from the extracted raw dataset. The raw dataset is used to exclude external factors that could potentially influence the results. For example, if we used the HIGH-SIM dataset, the vehicle trajectory connection rate of the proposed method would certainly be higher than those of the other methods. The connection rate is denoted by r = N_c / N_b, where N_c is the number of connected vehicle trajectories and N_b is the number of broken vehicle trajectories. We compare the connection rates of the different methods by varying the mean of the broken time length, μ = E(T), and the variance of the broken time length, σ² = Var(T), where T is the time length of the trajectory segment that we deleted from each trajectory. The mean broken time length varies within [0.1, 3] seconds, and the variance of the broken time length varies within [0, 64].
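The test-set construction described above can be sketched as follows. The gamma distribution used to draw the broken time length is our assumption for matching a target mean and variance; the text does not state which distribution was actually used, and the function names are illustrative.

```python
import random

FPS = 30  # dataset frequency (Hz)

def break_trajectory(traj, mean_T, var_T, rng):
    """Delete one segment of duration T (s) from a complete trajectory
    and return the two resulting broken pieces.  T is drawn from a gamma
    distribution with the requested mean and variance (an assumption):
    shape = mean^2/var, scale = var/mean."""
    if var_T > 0:
        T = rng.gammavariate(mean_T ** 2 / var_T, var_T / mean_T)
    else:
        T = mean_T
    gap = max(1, min(int(T * FPS), len(traj) - 2))
    start = rng.randrange(1, len(traj) - gap)
    return traj[:start], traj[start + gap:]

def connection_rate(n_connected, n_broken):
    """r = N_c / N_b, the share of broken trajectories reconnected."""
    return n_connected / n_broken

rng = random.Random(42)
traj = list(range(300))  # 10 s of frame indices at 30 Hz
head, tail = break_trajectory(traj, mean_T=1.0, var_T=0.5, rng=rng)
```

Applying a connection method to the pieces and checking against the original complete trajectory then yields N_c and the rate r.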
Figure 11 plots the connection rate for each method as the mean broken time length varies. We can observe that the connection rates of the studied methods fall within [0.9, 1], [0, 0.2], [0.2, 0.5], and [0.5, 0.9] for the proposed method, the linear-based method, the nonlinear-based method, and the Kalman filter, respectively. The proposed method outperforms all benchmark methods in terms of the connection rate. One reason for this superior performance is that the proposed method incorporates a car-following model into the trajectory connection and thus captures the driving characteristics well, which benefits the trajectory connection. It can also be observed that as the mean broken time length increases, the connection rate of each method shows a decreasing trend. This indicates that the longer the time length of the missing trajectory is, the harder the trajectory connection becomes. However, the connection rate of the proposed method decreases more slowly than those of the benchmark methods, which further supports the superior performance of the proposed method.
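To illustrate how a car-following model can supply a transition trajectory across a detection gap, the sketch below uses Newell's simplified model as a stand-in; the actual car-following model adopted by the proposed method is not specified in this section, so the function and parameter values are illustrative assumptions.

```python
def newell_transition(leader_pos, tau_frames, d):
    """Newell's simplified car-following model: the follower repeats the
    leader's trajectory shifted by a time lag (tau_frames samples) and a
    space gap d.  Used here only to illustrate how a car-following model
    can generate a transition trajectory across a detection gap."""
    return [leader_pos[t - tau_frames] - d
            for t in range(tau_frames, len(leader_pos))]

# Leader moving 3 ft per frame; the follower is predicted 50 ft behind
# with a 15-frame (0.5 s at 30 Hz) time lag.
leader = [3.0 * k for k in range(60)]
predicted_follower = newell_transition(leader, tau_frames=15, d=50.0)
```

A purely kinematic extrapolation (linear or quadratic) ignores the leader entirely, which is one way to read the performance gap reported in Figure 11.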
Figure 11. Connection rate comparison with varying mean broken time length.
Further, we vary the variance of the broken time length (i.e., σ²) to study the robustness of the proposed method. The variance of the broken time length is varied from 0 to 64, and the connection rates of the methods are shown in Figure 12. We can observe that as the variance of the broken time length increases, the connection rate of each method also shows a decreasing trend. This is because a higher variance of the broken time length yields a higher probability of long broken trajectories and thus leads to a lower trajectory connection rate. Note that although the connection rate of the proposed method degrades with a high variance of the broken time length, the overall connection rate of the proposed method, which varies within [0.6, 1], is still the highest among the four methods. This result indicates that the proposed method is relatively robust when dealing with datasets of different quality, which strengthens the transferability of the proposed method.
Figure 12. Connection rate comparison with varying the variance of broken time length.
Moreover, we are also interested in the effects of two critical parameters on the connection rate of the proposed method: the time length of the transition trajectory (i.e., τ) and the error threshold for evaluating the location difference between the transition trajectory and the broken trajectory (i.e., ε). Note that the default values of τ and ε are set to 1.67 seconds and 5, respectively. When we vary one parameter, we keep the other parameter at its default value. As shown in Figure 13 (a), as τ increases from 0.2 to 3.2 seconds, the connection rate increases from 0.4 to 0.9. There is a significant increase in the connection rate when τ is around 1.5 s, which indicates that most of the broken time lengths of the trajectories (around 80%) are less than 1.5 s. As can be seen in Figure 13 (b), as ε increases, the connection rate first increases quickly (from 2 to 6). However, when ε is greater than 6, further increases in ε have little impact on the connection rate. This is because, for a correct trajectory connection, the calculated distance error between the two trajectories depends on the accuracy of the adopted car-following model. The maximum distance error for a given car-following model is a bounded value regardless of the dataset. Once ε reaches the maximum distance error for a correct trajectory connection, further increases in ε only raise the probability of a wrong connection and do not affect the connection rate, which reflects the percentage of correct connections.
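Our reading of the error-threshold gating can be sketched as follows; the function names are hypothetical, and the acceptance rule (maximum absolute location difference below the threshold) is an assumption consistent with the description above.

```python
def max_location_error(transition, observed):
    """Largest absolute location difference over the overlapping frames."""
    return max(abs(p - q) for p, q in zip(transition, observed))

def accept_connection(transition, observed, eps):
    """Gate a candidate connection by the error threshold eps: accept it
    only if the transition trajectory generated by the car-following
    model stays within eps of the observed broken trajectory."""
    return max_location_error(transition, observed) <= eps

transition = [0.0, 3.0, 6.1, 9.3]  # model-generated positions (ft)
observed = [0.0, 3.2, 6.0, 9.0]    # re-detected positions (ft)
ok = accept_connection(transition, observed, eps=5.0)  # default eps = 5
```

Under this reading, raising eps past the model's bounded prediction error admits no additional correct connections, matching the plateau in Figure 13 (b).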
Figure 13. Sensitivity analysis for the two critical parameters (τ and ε).
5. Conclusion

This paper proposed a car-following-based (CF-based) vehicle trajectory connection method that can connect broken vehicle trajectories based on car-following theory. The proposed method can not only fill missing data points caused by detection errors but also help connect trajectory data from different sensors. To illustrate the performance of the proposed CF-based method, it was employed to process a series of vehicle trajectory datasets extracted from aerial videos recorded at several successive spaces of Interstate-75, United States. Compared with several benchmark methods, the results showed that the proposed method has advantages in both trajectory connection accuracy and trajectory consistency. To the best of the authors' knowledge, the dataset processed by the proposed method, named the HIGH-SIM dataset, is the longest vehicle trajectory dataset in the literature and captures the full life cycle of a traffic bottleneck. The dataset has been published online for public use.

Future research can be conducted in a few directions. It would be interesting to adopt learning-based methods (e.g., machine learning and deep learning) to extend the broken trajectory for trajectory connection. Moreover, it would be interesting to validate existing macroscopic and microscopic traffic characteristics with the extracted datasets.
Acknowledgment

This research is supported by the US National Science Foundation through Grants Crisp 1 and Crisp 2.
References

Anuar, K., Cetin, M., 2017. Estimating Freeway Traffic Volume Using Shockwaves and Probe Vehicle Trajectory Data. Transp. Res. Procedia 22, 183–192.

Apeltauer, J., Babinec, A., Herman, D., Apeltauer, T., 2015. Automatic vehicle trajectory extraction for traffic analysis from aerial video data. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. - ISPRS Arch. 40, 9–15.

Azevedo, C.L., Cardoso, J.L., Ben-Akiva, M., Costeira, J.P., Marques, M., 2014. Automatic Vehicle Trajectory Extraction by Aerial Remote Sensing. Procedia - Soc. Behav. Sci. 111, 849–858.

Babinec, A., Herman, D., Cecha, S., 2014. Automatic vehicle trajectory extraction.

Barmpounakis, E., Geroliminis, N., 2020. On the new era of urban traffic monitoring with massive drone data: The pNEUMA large-scale field experiment. Transp. Res. Part C Emerg. Technol. 111, 50–71.

Chen, X., Li, Z., Yang, Y., Qi, L., Ke, R., 2020. High-Resolution Vehicle Trajectory Extraction and Denoising From Aerial Videos. IEEE Trans. Intell. Transp. Syst. 1–13.

Coifman, B., Wu, M., Redmill, K., Thornton, D.A., 2016. Collecting ambient vehicle trajectories from an instrumented probe vehicle: High quality data for microscopic traffic flow studies. Transp. Res. Part C Emerg. Technol. 72, 254–271.

Daganzo, C.F., 1997. Fundamentals of Transportation and Traffic Operations.

Drew, D.R., 1968. Traffic Flow Theory and Control. McGraw-Hill, New York.

Jazayeri, A., Cai, H., Zheng, J.Y., Tuceryan, M., 2011. Vehicle detection and tracking in car video based on motion model. IEEE Trans. Intell. Transp. Syst., 583–595.

Kim, E.J., Park, H.C., Ham, S.W., Kho, S.Y., Kim, D.K., Hassan, Y., 2019. Extracting Vehicle Trajectories Using Unmanned Aerial Vehicles in Congested Traffic Conditions. J. Adv. Transp. 2019.

Kim, Z.W., Cao, M., 2010. Evaluation of feature-based vehicle trajectory extraction algorithms. IEEE Conf. Intell. Transp. Syst. Proceedings, ITSC, 99–104.

Krajewski, R., Bock, J., Kloeker, L., Eckstein, L., 2018. The highD Dataset: A Drone Dataset of Naturalistic Vehicle Trajectories on German Highways for Validation of Highly Automated Driving Systems, in: 2018 21st International Conference on Intelligent Transportation Systems (ITSC). IEEE, pp. 2118–2125.

Li, L., Li, X., 2019. Parsimonious trajectory design of connected automated traffic. Transp. Res. Part B Methodol. 119, 1–21.

Punzo, V., 2009. Estimation of vehicle trajectories from observed discrete positions and Next-Generation Simulation Program (NGSIM) data.

Punzo, V., Borzacchiello, M.T., Ciuffo, B., 2011. On the assessment of vehicle trajectory data accuracy and application to the Next Generation SIMulation (NGSIM) program data. Transp. Res. Part C Emerg. Technol. 19, 1243–1262.

Raju, N., Arkatkar, S., Easa, S., Joshi, G., 2021. Developing extended trajectory database for heterogeneous traffic like NGSIM database. Transp. Lett., 1–10.

Sazara, C., Nezafat, R.V., Cetin, M., 2017. Offline reconstruction of missing vehicle trajectory data from 3D LIDAR. IEEE Intell. Veh. Symp. Proc., 792–797.

Tong, C., Chen, H., Xuan, Q., Yang, X., 2017. A framework for bus trajectory extraction and missing data recovery for data sampled from the internet. Sensors (Switzerland) 17.

van Hinsbergen, C.P.I.J., Schakel, W.J., Knoop, V.L., van Lint, J.W.C., Hoogendoorn, S.P., 2015. A general framework for calibrating and comparing car-following models. Transp. A Transp. Sci.

Victor, T., 2014. Analysis of Naturalistic Driving Study Data: Safer Glances, Driver Inattention, and Crash Risk. Transportation Research Board, Washington, D.C.

Wang, G., Xiao, D., Gu, J., 2008. Review on vehicle detection based on video for traffic surveillance, in: 2008 IEEE International Conference on Automation and Logistics. IEEE, pp. 2961–2966.

Xu, Y., Yu, G., Wu, X., Wang, Y., Ma, Y., 2017. An Enhanced Viola-Jones Vehicle Detection Method from Unmanned Aerial Vehicles Imagery. IEEE Trans. Intell. Transp. Syst. 18, 1845–1856.

Zhang, G., Avery, R.P., Wang, Y., 2007. Video-based vehicle detection and classification system for real-time traffic data collection using uncalibrated video cameras. Transp. Res. Rec., 138–147.

Zhang, T., Jin, P.J., 2019. A longitudinal scanline based vehicle trajectory reconstruction method for high-angle traffic video. Transp. Res. Part C Emerg. Technol. 103, 104–128.

Zhao, D., Li, X., 2019. Real-World Trajectory Extraction from Aerial Videos - A Comprehensive and Effective Solution. 2019 IEEE Intell. Transp. Syst. Conf. (ITSC), 2854–2859.

Zhao, H., Wang, C., Lin, Y., Guillemard, F., Geronimi, S., Aioun, F., 2017. On-Road Vehicle Trajectory Collection and Scene-Based Lane Change Analysis: Part I. IEEE Trans. Intell. Transp. Syst. 18, 192–205.

Next Generation Simulation, 2006. Source:

Shi, X., Zhao, D., Yao, H., Li, X., James, R., Hale, D., Ghiasi, A., 2021. An Open Database Generation with Monte Carlo Based Lane Marker Detection and Critical Analysis of Vehicle Trajectory - High-Granularity Highway Simulation (HIGH-SIM). Preprint.

Wu, M., 2018. Collecting Ambient Vehicle Trajectories from an Instrumented Probe Vehicle and Fusing with Loop Detector Actuations. Dissertation. The Ohio State University.

Lee, W.C., Krumm, J., 2011. Trajectory Preprocessing. In: Zheng, Y., Zhou, X. (Eds.), Computing with Spatial Trajectories. Springer, New York, NY.

Rahman, M., 2013. Application of Parameter Estimation and Calibration Method for Car-Following Models. Clemson University.