Automated Ground-Plane Estimation
for Trajectory Rectification
Ian Hales, David Hogg, Kia Ng, and Roger Boyle
University of Leeds, Woodhouse Lane, Leeds, LS2 9JT
Abstract. We present a system to determine ground-plane parameters in densely crowded scenes where use of geometric features such as parallel lines or reliable estimates of agent dimensions is not possible. Using feature points tracked over short intervals, together with some plausible scene assumptions, we can estimate the parameters of the ground-plane to a sufficient degree of accuracy to correct usefully for perspective distortion. This paper describes feasibility studies conducted on controlled, simulated data, to establish how different levels and types of noise affect the accuracy of the estimation, and a verification of the approach on live data, showing that the method can estimate ground-plane parameters, thus allowing improved accuracy of trajectory analysis.

Keywords: ground-plane, trajectory, rectification, crowd-motion.
1 Introduction

In computer vision one often wishes to examine objects in terms of their size, speed or location; accurate measurement of these is hampered by several types of distortion that occur when the camera captures the scene. This is particularly relevant in applications such as behaviour analysis in crowds, where such metrics can be used to detect events or measure crowd density. By estimating the transformation undergone by the coordinate system, we can invert it to counteract the effect of such distortions and obtain a more accurate view of the world. If we can obtain the ground-plane orientation with respect to the camera, we can correct for one of the most prevalent sources of error – perspective distortion.
Two broad approaches are apparent within the literature: “formal methods”, often in terms of the fundamental matrix or image homographies; and “informal methods”, providing a coarse, but still usable, impression of the scene (e.g. a scale ratio from front to back). A common feature is the idea of “vanishing points”, which lie on the horizon line. Having obtained the equation of the horizon, it is trivial to perform affine rectification; metric rectification can then be achieved using knowledge of known lengths or angles, or equality of angles [1]. These vanishing points can be determined using pairs of imaged parallel lines [2], and inclusion of a vanishing point in a third direction allows for metric measurement [3]. The Manhattan Assumption [4] offers a viable framework to obtain additional vanishing points from the background in man-made scenes [5]. Alternatively,
R. Wilson et al. (Eds.): CAIP 2013, Part II, LNCS 8048, pp. 378–385, 2013.
Springer-Verlag Berlin Heidelberg 2013
known angles in the scene or known reference lengths [7] can produce similar results. In pedestrian scenes, measurements (e.g. projected foot-head heights [8,9]) provide a valid reference length. Less formal methods tend to rely on the estimation of a single ground-plane. It is again common to use projected foot-head heights, and accuracy can be improved by tracking individuals, taking relative height measurements for each [11], which minimizes potential variations in reference length.
We base our work within the context of pedestrian crowd analysis. The methods discussed above rely on assumptions unlikely to hold in a densely crowded scene, due to inter-occlusion and the inability to see geometric features in the background. Stauffer et al. [10] mention an alternative approach using the assumption of constant speed of moving objects, built upon by Bose and Grimson [12], who use a blob tracker to generate trajectories. They then obtain the vanishing line by assuming constant speed, before using inter-trajectory speed ratios to obtain metric rectification. However, maintained tracking was necessary to achieve metric rectification, making this approach infeasible in our domain.
We propose to use the local speed of tracked feature points as a calibration
measurement, along with the assumption of constant speed, to reconstruct the
ground-plane parameters. We do not require prolonged tracking of features pro-
vided we observe pedestrian motion throughout the scene. The remainder of this
paper will describe our method, prove its validity on simulated data and assess
its accuracy on real-world benchmark data.
2 Ground-Plane Estimation
Above, we saw that it is possible to reconstruct the 3D scene using measurements of objects of known real-world size at various positions within the image. In this section we show that it is equally possible to use measurements of object speed to reconstruct the plane upon which agents are moving. Throughout this work we assume that all objects move on a single plane and that each observed object moves at constant (or near-constant) speed. We do not, however, require that all objects move at the same speed.
We first track sparse features frame-by-frame for some period using the KLT tracker [13], until we can no longer reliably continue the trajectory. Since we deal with high-density pedestrian crowds, heavy inter-occlusion is likely to result in many short trajectories. However, it takes only a few frames for us to gain valuable information from a trajectory. It is plausible that objects may have many features, or no features at all, assigned to them, but this is not a major concern provided we gather information from various scene positions.
We define a trajectory as a time series of points on a plane, recorded at equally spaced time-steps. We observe these in the image-coordinate system and wish to obtain their respective points within the camera coordinate system through perspective back-projection. We define a time-series (X^τ_1, . . . , X^τ_{Nτ}) as the x-coordinates of trajectory τ at times 1 to Nτ, and its projection into image space as (x^τ_1, . . . , x^τ_{Nτ}). We represent the orientation of the ground-plane in terms of its unit normal vector, n, given by equation (1), where θ represents the angle of elevation (rotation about the x-axis) and ψ the angle of yaw (rotation about the z-axis).

Given a ground-plane n · X = d, for some point X = (X^τ_t, Y^τ_t, Z^τ_t) in the world and some α, equations (2) to (4) show the back-projection of an image point (x^τ_t, y^τ_t) onto the point (X^τ_t, Y^τ_t, Z^τ_t) on the ground plane at time t, where α is the negative reciprocal of the focal length f and d is the shortest distance between the camera and the ground-plane.
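As a concrete sketch of this back-projection (equations (1)-(4) are not reproduced in this extract, so the base normal (0, 0, 1), the rotation order and the pinhole convention below are our own assumptions, consistent with the surrounding text):

```python
import numpy as np

def plane_normal(theta, psi):
    """Unit normal of the ground plane: elevation theta (rotation about
    the x-axis) followed by yaw psi (rotation about the z-axis), applied
    to an assumed base normal (0, 0, 1)."""
    ct, st = np.cos(theta), np.sin(theta)
    cp, sp = np.cos(psi), np.sin(psi)
    Rx = np.array([[1.0, 0.0, 0.0], [0.0, ct, -st], [0.0, st, ct]])
    Rz = np.array([[cp, -sp, 0.0], [sp, cp, 0.0], [0.0, 0.0, 1.0]])
    return Rz @ Rx @ np.array([0.0, 0.0, 1.0])

def back_project(x, y, f, n, d=1.0):
    """Intersect the viewing ray of image point (x, y) with the plane
    n . X = d, for a pinhole camera with focal length f at the origin.
    With d = 1 the reconstruction is up to scale, as in the text."""
    ray = np.array([x, y, f])      # direction of the viewing ray
    lam = d / (n @ ray)            # scale at which the ray meets the plane
    return lam * ray
```

A point recovered this way satisfies n · X = d exactly, which is what makes the segment lengths of section 2.1 sensitive to a wrong choice of (θ, ψ).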
2.1 Speed as a Measuring Stick
An object’s observed speed varies with perspective, as do its other properties, such as height. Given a number of partial trajectories of constant speed, we show that it is possible to obtain reasonable estimates for the ground-plane parameters. Assuming a tracked object’s real-world speed is constant, we can think of the trajectory as a set of piecewise linear segments, each with length given by the 3D Euclidean distance formula. We can use this distance as the “measuring-stick” from which to gain an estimate of the ground-plane.
Firstly we make some simplifying assumptions. The height of the camera with respect to the ground-plane, d, acts primarily as a scaling parameter. As we aim only to reconstruct up to scale, we set this to 1 in (4), simplifying further calculation. We also observe that in the majority of scenes the camera height is substantial compared to the height variations of the tracked feature points; in section 3 we offer results showing that tracked feature height variation does not significantly affect the estimate of ground-plane orientation. Finally, we assume that motion variation due to the articulation of our moving agents is negligible.
Substituting (2)-(4) into the distance formula, we obtain (5). This relates the known 2D projected positions of a feature point at consecutive times t−1 and t, their 3D distance in the camera coordinate system and the plane parameters. Hereafter, we use L^τ to represent the set of distance measurements at all time-intervals for some trajectory τ; each element of this set is given by (5), and the full set for a single trajectory is defined in (6).
We denote the mean and standard deviation of this set by μ(L^τ) and σ(L^τ) for short. We now have a relationship between the known image-coordinates and the unknown camera-coordinates with respect to some constant distance along points in a trajectory. We use this to measure how well a given set of parameters θ, ψ and α fits the observed data. If a feature point is moving at constant speed and we have a good set of parameters, σ(L^τ) should be close to zero. Conversely, we would expect a poor parameterization to give a high spread in L^τ.
Since we do not wish to impose the constraint that all objects must move at constant speed, we normalize σ(L^τ) by μ(L^τ), giving us a speed-invariant measure of correctness. We pose this correctness measure in terms of a minimization over the sum of squared errors for each trajectory, as shown in equation (7).
As we expect tracked feature points to have originated from a set of homogeneous objects (e.g. pedestrians), we can reasonably assume that they should all move with similar (although not identical) speed. As such, we include an additional term to constrain the system, E2, in equation (8). Here we take the standard deviation over the set of all mean speeds, which penalises a high spread of speeds, thereby preventing some of the least plausible configurations.
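The two error terms can be sketched together as follows; since equations (5)-(8) are not reproduced in this extract, the pinhole back-projection convention and the default weighting λ are assumptions:

```python
import numpy as np

def total_error(image_trajs, f, n, d=1.0, lam=1.0):
    """E = E1 + lambda * E2 for a candidate plane normal n and focal
    length f. E1 sums, per trajectory, the squared spread of back-
    projected segment lengths normalized by their mean; E2 is the
    standard deviation of the per-trajectory mean speeds."""
    spreads, means = [], []
    for traj in image_trajs:
        pts2d = np.asarray(traj, float)
        rays = np.hstack([pts2d, np.full((len(pts2d), 1), f)])
        # intersect each viewing ray with the plane n . X = d
        pts3d = (d / (rays @ n))[:, None] * rays
        L = np.linalg.norm(np.diff(pts3d, axis=0), axis=1)
        spreads.append(L.std() / L.mean())   # speed-invariant spread
        means.append(L.mean())
    return sum(s ** 2 for s in spreads) + lam * np.std(means)
```

For a constant-speed trajectory that genuinely lies on the plane, the correct parameters drive the spread term to zero, while a wrong normal leaves a measurable residual.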
2.2 Minimizing the Error
We solve for θ, ψ and α by minimizing the error E = E1 + λE2 over all input trajectories. We experimented with non-linear optimization algorithms, but due to the irregularity of the problem-space away from the vicinity of the true value, gradient descent methods tend to fall into local minima. As such, we fall back on a multi-resolution global search to find the correct region. At the first level we search all combinations of α, θ and ψ with a coarse mesh – increments of 15° over the feasible ranges of θ and ψ (0° to 90° and −45° to 45° respectively) – and for α an exponential search in the range 10^−3 to 10^0 to find its scale.
We then take the point with minimum error from this search and produce a finer grid around it, now searching α linearly and reducing the step size for θ and ψ to 10% of its previous value. We repeat this procedure until either the lowest error point is below a given tolerance (empirically 10^−5 is sufficient) or we reach the maximum level allowed for the search. We have observed 3 levels to be sufficient for an accurate estimation on simulated data with some noise.
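A minimal sketch of this coarse-to-fine search over a black-box error function; the ±5-step refinement window and the linear α window of ±50% are illustrative choices not specified in the text:

```python
import itertools
import numpy as np

def coarse_to_fine(error_fn, levels=3, tol=1e-5):
    """Multi-resolution grid search over (theta, psi, alpha): a coarse
    15-degree mesh over theta in [0, 90] and psi in [-45, 45] degrees,
    with an exponential sweep of alpha over [1e-3, 1e0], then repeated
    refinement around the best point at 10% of the previous step."""
    thetas = np.deg2rad(np.arange(0.0, 90.1, 15.0))
    psis = np.deg2rad(np.arange(-45.0, 45.1, 15.0))
    alphas = 10.0 ** np.arange(-3.0, 0.1, 1.0)   # exponential at level 1
    best, best_err = None, np.inf
    for level in range(levels):
        for th, ps, al in itertools.product(thetas, psis, alphas):
            e = error_fn(th, ps, al)
            if e < best_err:
                best, best_err = (th, ps, al), e
        if best_err < tol:
            break
        # refine around the current best point
        th0, ps0, al0 = best
        step = np.deg2rad(15.0) * 0.1 ** (level + 1)
        thetas = th0 + step * np.arange(-5, 6)
        psis = ps0 + step * np.arange(-5, 6)
        alphas = al0 * np.linspace(0.5, 1.5, 11)  # alpha now searched linearly
    return best, best_err
```

Because every level evaluates a full grid, the search is robust to the local minima that defeat gradient descent here, at the cost of more error evaluations.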
3 Experiments on Simulated Data
To prove the initial viability of this method, we first describe a number of experiments on simulated data. These allow us to examine how various types of noise and violations of the initial premise affect the accuracy of estimation. Core sources of error are likely to be:
Fig. 1. An example of our simulated trajectories (a) and the results of our noise experiments (b)-(d) – inter-feature speed variation, intra-feature speed variation and intra-feature height variation – in terms of average angular error (dot product between ground-truth and estimated plane-normals)
1. Variation in inter-trajectory speed: agents move at different speeds.
2. Variation in intra-trajectory speed: an agent varies their speed whilst moving.
3. Variation in tracked point height: some trajectories are recorded on feet, some on shoulders, etc.
We generate simulated trajectories on a number of planes with parameterized noise (Fig. 1a), allowing the speed and height of trajectories to vary according to Gaussian distributions. This lets us examine the effect of the above error sources on reconstruction accuracy. We assess accuracy by rectifying the image-plane trajectories with both the ground-truth parameters and our estimates, then comparing the spread of normalized speeds of each: a perfect reconstruction gives zero error, and all measurements are relative to a mean speed of 1.
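Such a simulation can be sketched as follows, under an assumed pinhole model with the plane at unit distance (n · X = 1); `simulate_trajectories`, the step length and the choice of starting points and headings are our own, and n must not be parallel to the x-axis for the in-plane basis construction used here:

```python
import numpy as np

def simulate_trajectories(n_agents, n_steps, n, f, speed_sigma=0.1, rng=None):
    """Generate constant-speed straight-line trajectories on the plane
    n . X = 1 and project them through a pinhole camera with focal
    length f. Each agent's speed is drawn once from N(1, speed_sigma^2),
    matching the inter-trajectory noise experiment."""
    rng = np.random.default_rng(rng)
    # two orthonormal directions spanning the plane (assumes n not // x-axis)
    u = np.cross(n, [1.0, 0.0, 0.0])
    u /= np.linalg.norm(u)
    v = np.cross(n, u)
    trajs = []
    for _ in range(n_agents):
        start = n + rng.uniform(-1, 1) * u + rng.uniform(-1, 1) * v
        a = rng.uniform(0.0, 2.0 * np.pi)
        heading = np.cos(a) * u + np.sin(a) * v    # in-plane direction
        speed = rng.normal(1.0, speed_sigma)
        pts = start + np.outer(np.arange(n_steps) * 0.05 * speed, heading)
        trajs.append(f * pts[:, :2] / pts[:, 2:3])  # pinhole projection
    return trajs
```

With the noise turned off, back-projecting these image trajectories with the true plane recovers exactly constant segment lengths, which is the zero-error baseline used above.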
We perform three sets of experiments on a number of different planes across the feasible range, varying the potential source of error in each in terms of the standard deviation, between 0 and 1, of a Gaussian distribution with mean 1. Examining the above issues in order, we first investigate the effect of the different agents in the scene moving at different speeds. Agents’ initial speeds are chosen randomly from the distribution and remain constant throughout the simulation.
From Fig. 1b we observe that even in extreme cases the average error stays low – below 10% of the mean speed. We expect the effect of intra-trajectory speed variation to be more pronounced, as it is the defining metric used to recover the parameters; here we draw the speed of a feature at each frame from the distribution. Fig. 1c shows that although we experience more noise than with inter-trajectory speed variation, it is not so pronounced as to seriously damage our result. We see that height variation has negligible effect on the accuracy of our solution – data was generated using a feasible camera height of 10 m, at which the difference in point height is relatively small.
Points tracked at different heights with a low-positioned camera will be affected more strongly, but in most real-world scenes the distances between the lowest and highest tracked points are negligible with respect to the camera height.
Fig. 2. Example stills from the PETS2009 (a) and students003 (b) video datasets

Our experiments on simulated data show that over realistic ranges, the three potential sources of error identified above have negligible detrimental effect on the accuracy of estimation. Of particular importance is the intra-trajectory speed variation, which violates our assumptions and yet still does not have an especially pronounced effect on the quality of our estimation.
4 Experiments on Video Data
We gather trajectories using the tracker and split them at sharp spikes in speed, which are easily identifiable violations of the constant-speed assumption. We are typically left with considerably more trajectories than is necessary (or tractable) to process. Indeed, many of these are extremely short – only 2 or 3 frames. As such, we filter out trajectories shorter than 4 frames, as these provide us with no additional information towards our unknowns.
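A sketch of this pre-processing step; the spike test against a multiple of the median frame-to-frame speed is a stand-in, as the extract does not specify the paper's threshold:

```python
import numpy as np

def split_and_filter(traj, spike_factor=3.0, min_len=4):
    """Split an image trajectory wherever the frame-to-frame speed
    jumps above spike_factor times the median speed (an assumed spike
    test), then drop pieces shorter than min_len frames."""
    traj = np.asarray(traj, float)
    speeds = np.linalg.norm(np.diff(traj, axis=0), axis=1)
    med = np.median(speeds) if len(speeds) else 0.0
    pieces, start = [], 0
    for i, s in enumerate(speeds):
        if med > 0 and s > spike_factor * med:
            pieces.append(traj[start:i + 1])   # end piece at the spike
            start = i + 1
    pieces.append(traj[start:])
    return [p for p in pieces if len(p) >= min_len]
```

A tracker identity switch then yields two shorter constant-speed pieces rather than one trajectory with an impossible jump in it.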
When tracking pedestrians, we observe two key differences from the trajectories produced for simulation. Firstly, we commonly track several points per person; secondly, pedestrians tend to behave in a more “human” manner than in our simulated data – travelling in groups. As such, many of our trajectories are extremely similar, and their inclusion adds little information but slows processing down considerably. Therefore we first align trajectories to each other using the Hungarian algorithm [14], then cluster the trajectories based on their distance and shape similarity using Affinity Propagation [15], taking the resulting clusters as input to our ground-plane estimation system.
The majority of our results are given against the PETS2009 dataset, specifically videos taken from View001, shown in Fig. 2a, and View002. Since these come with full intrinsic and extrinsic calibration, we can directly compare rotation angles. We also examine one other video dataset: “students003”, Fig. 2b, from the University of Cyprus. This is provided only with an image to ground-plane homography and so does not allow for direct rotation comparison. We therefore use the homography to determine the 2D feature coordinates on the ground-plane and compare the speed ratios for each trajectory. We compare with the method in [12] (see section 1), although we exchanged the blob tracker for our KLT approach, as the former provided insufficient tracks for reliable estimation when applied to our data.
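The homography-based comparison can be sketched as follows, assuming a 3x3 image-to-ground-plane homography H as supplied with the dataset; `ground_plane_speeds` is our own naming:

```python
import numpy as np

def ground_plane_speeds(img_traj, H):
    """Map an image trajectory to ground-plane coordinates with a 3x3
    homography H and return the frame-to-frame speeds there, for
    comparing speed ratios between rectifications."""
    pts = np.asarray(img_traj, float)
    homo = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    ground = homo[:, :2] / homo[:, 2:3]           # dehomogenize
    return np.linalg.norm(np.diff(ground, axis=0), axis=1)
```

Normalizing each speed series by its mean then gives the scale-free curves compared in Fig. 3.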
Table 1. Results for plane estimation on videos from the PETS2009 dataset
Dataset Subset View Time Index θError (degrees) ψError (degrees)
S0 Regular Flow 001 14-06 +7.2 -0.4
S1 L1 001 13-59 +1.1 +11.7
S1 L2 001 14-06 +7.5 -0.5
S1 L1 002 13-57 +0.1 -9.9
S1 L2 002 14-31 +1.9 -4.4
Fig. 3. Comparison of trajectory speeds rectified using the ground-truth (black, dashed), estimated parameters (red, solid) and Bose (green, dashed). Examples are the longest trajectories from (a) “students003” and (b) PETS2009 Regular Flow. We see that even in trajectories with some tracking error, we obtain a sensible result, generally better than Bose.
Table 1 shows the orientation error for several scenes in terms of the direct difference for the two rotation components, θ and ψ. There is an area of high reflectance in View002 which interrupts tracking of most individuals; despite this, the majority of our estimates are within 10° of the ground-truth values – sufficient for approximate correction of trajectory speeds. Fig. 3 shows some example comparisons of normalized trajectory speeds from the “students003” and PETS Regular Flow datasets, rectified using the provided homography/calibration, our estimation and that of [12]. We see that our matching is generally very close, even on trajectories with some tracking error, whereas the method of Bose and Grimson performs poorly. We put this down to the flexibility of our approach in minimising spread rather than enforcing a strict constant-speed assumption.
5 Conclusions and Further Work
This paper has considered the problem of reconstructing 3D geometry from 2D observations taken from videos of pedestrians captured with a single uncalibrated camera. Our method differs from previous techniques in that it requires no knowledge of scene geometry or of a fixed-size object, needing only the motion of individuals. We have provided evidence on simulations for the validity of our method and the assumptions held within. We have then shown results on the PETS2009 dataset which illustrate the success of the method in a number of cases, and have given a qualitative comparison for another. In continuation of this work, we plan to account for variations in trajectory height, to allow for tracking individuals on different parts of their bodies. We then intend to extend the method into the multi-planar domain, such that multiple planes can be estimated and their boundaries drawn, to more accurately and realistically model real-world scenes.
References

1. Liebowitz, D., Zisserman, A.: Metric rectification for perspective images of planes. In: Proceedings CVPR 1998, pp. 482–488. IEEE Computer Society (1998)
2. Hartley, R., Zisserman, A.: Multiple View Geometry in Computer Vision, 2nd edn.
Cambridge University Press (2004)
3. Criminisi, A., Reid, I.D., Zisserman, A.: Single view metrology. International Jour-
nal of Computer Vision 40(2), 123–148 (2000)
4. Coughlan, J.M., Yuille, A.L.: Manhattan world: Compass direction from a single image by Bayesian inference. In: Proceedings ICCV 1999, pp. 941–947 (1999)
5. Pﬂugfelder, R., Bischof, H.: Online auto-calibration in man-made worlds. In: Pro-
ceedings DICTA 2005, pp. 519–526 (2005)
6. Zhang, Z., Li, M., Huang, K., Tan, T.: Robust automated ground plane rectiﬁcation
based on moving vehicles for traﬃc scene surveillance. In: 2008 15th IEEE ICIP,
pp. 1364–1367. IEEE (2008)
7. Guo, F., Chellappa, R.: Video mensuration using a stationary camera. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3953, pp. 164–176. Springer, Heidelberg (2006)
8. Lv, F., Zhao, T., Nevatia, R.: Self-calibration of a camera from video of a walking human. In: Proceedings Pattern Recognition, vol. 1, pp. 562–567. IEEE Computer Society, Los Alamitos (2002)
9. Micusik, B., Pajdla, T.: Simultaneous surveillance camera calibration and foot-head homology estimation from human detections. In: CVPR 2010, pp. 1562–1569 (2010)
10. Stauﬀer, C., Tieu, K., Lee, L.: Robust Automated Planar Normalization of Track-
ing Data. In: Proceedings IEEE Workshop on VS PETS (2003)
11. Krahnstoever, N., Mendonça, P.R.S.: Autocalibration from tracks of walking people. In: British Machine Vision Conference, pp. 107–116 (2006)
12. Bose, B., Grimson, E.: Ground plane rectiﬁcation by tracking moving objects. In:
Proceedings IEEE Workshop on VS PETS (2003)
13. Shi, J., Tomasi, C.: Good features to track. In: Proceedings CVPR 1994, pp. 593–600 (1994)
14. Kuhn, H.W.: The Hungarian method for the assignment problem. Naval Research Logistics Quarterly 2(1-2), 83–97 (1955)
15. Frey, B.J., Dueck, D.: Clustering by passing messages between data points. Science 315(5814), 972–976 (2007)