Scalable Algorithms for Large High-Resolution Terrain Data∗
Pankaj K. Agarwal
In this paper we demonstrate that the technology required to per-
models has matured enough to be ready for use by practitioners.
We also demonstrate the impact that high-resolution data has on
common problems. To our knowledge, some of the computations
we present have never before been carried out by standard desktop
computers on data sets of comparable size.
Categories and Subject Descriptors: D.2 [Software]: Software
Engineering; F.2.2 [Nonnumerical Algorithms and Problems]: Ge-
ometrical problems and computations; H.2.8 [Database Manage-
ment]: Database Applications—Data Mining, Image Databases,
Spatial Databases and GIS
General Terms: Performance, Algorithms
Keywords: LIDAR, Massive data, GIS
The revolution in sensing and mapping technologies is providing
an unprecedented opportunity to characterize and understand the
earth’s surface, its dynamics, and its properties. For instance,
second-generation airborne LIDAR technology can map the earth’s
surface at a 15-20 cm horizontal resolution, and future generations
of LIDAR scanners are expected to generate high-resolution
maps of other planets. Capitalizing on these opportunities and
transforming these massive amounts of topographic data into use-
ful information for vastly different types of users requires solving
several challenging algorithmic problems. GIS, geometric comput-
ing and other disciplines have made great strides, during the last
few years, in providing theoretical insights, algorithmic tools, and
software for meeting many of the challenges that arise when these
large data sets have to be processed.
∗Work in this paper was supported by ARO grant W911NF-04-1-0278.
Pankaj K. Agarwal and Thomas Mølhave are also supported by NSF un-
der grants CCR-00-86013, CCR-02-04118, and DEB-04-25465, and by a
grant from the U.S.–Israel Binational Science Foundation. Lars Arge and
Morten Revsbæk are also supported by a NABIIT grant from the Danish
Strategic Research Council and by MADALGO - Center for Massive Data
Algorithmics - a Center of the Danish National Research Foundation.
COM.Geo 2010 June 21-23, 2010, Washington, DC, USA.
Copyright 2010 ACM 978-1-4503-0031-5 ...$10.00.
Figure 1. The coastline of Denmark, the main study area for the
experiments in this paper.
A major challenge is the sheer size of the gathered topographic
data, which is exposing serious scalability problems in existing
GIS systems. The main reason for these problems is that algorithms
in current systems often assume that the data fit in the main
memory of the computer, which is not the case for the large data
sets provided by modern LIDAR scanners. A main memory access
is around 10^6 times faster than a disk access, and programs that do
not use the disk efficiently are essentially slowed down by that factor.
Thus it is key to use algorithms that try to minimize disk accesses
when handling massive data sets. Developing such Input/Output-efficient
(or just I/O-efficient) algorithms has been a flourishing
research area for many years.
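
As a rough illustration of why the access pattern matters, the following minimal sketch counts block transfers in a simplified model where data moves between disk and memory in fixed-size blocks. The function names, the grid size, the block size, and the one-block cache are assumptions chosen for illustration; they are not taken from the systems discussed in this paper.

```python
from collections import OrderedDict


def count_block_reads(access_order, block_size, cache_blocks=1):
    """Count block transfers for a sequence of cell indices.

    Assumes a simplified model: cells live on disk in blocks of
    `block_size` consecutive indices, and memory holds at most
    `cache_blocks` blocks, evicted in least-recently-used order.
    """
    cache = OrderedDict()
    reads = 0
    for index in access_order:
        block = index // block_size
        if block in cache:
            cache.move_to_end(block)       # block already in memory
        else:
            reads += 1                     # block must be fetched from disk
            cache[block] = True
            if len(cache) > cache_blocks:
                cache.popitem(last=False)  # evict least recently used block
    return reads


def row_major_order(rows, cols):
    """Visit a row-major grid row by row (matches the storage order)."""
    return (r * cols + c for r in range(rows) for c in range(cols))


def column_major_order(rows, cols):
    """Visit the same grid column by column (strided access)."""
    return (r * cols + c for c in range(cols) for r in range(rows))


# Hypothetical sizes, chosen only to keep the example small.
rows, cols, block_size = 64, 64, 16
sequential_reads = count_block_reads(row_major_order(rows, cols), block_size)
strided_reads = count_block_reads(column_major_order(rows, cols), block_size)
print(sequential_reads, strided_reads)
```

In this toy setting the row-by-row scan fetches each block once, while the column-by-column scan fetches a block for every single cell. The work done per cell is identical; only the number of block transfers differs.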
There are a number of ways in which traditional GIS applications attempt
to deal with massive terrain elevation data sets. The simplest
common technique is to thin the point cloud by more or less arbitrarily
discarding a significant fraction of the points. A similar form
of thinning is often applied to gridded terrain models, where neighboring
grid cells are averaged (or discarded) to produce a grid with
a much larger cell size and correspondingly lower spatial fidelity.
These methods, although obviously very effective at reducing the
size of the point cloud or grid, are not ideal, since important topographic
features are lost in the process. This can significantly
reduce the accuracy and effectiveness of many topographic
analysis techniques. Finding a way to drastically reduce the data
set without creating problems for non-trivial computations
would in many cases likely not be much easier than solving
the original problem directly.
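
The loss of features under thinning can be seen in a small sketch. The example below assumes a hypothetical 4x4 elevation grid containing a one-cell-wide ridge (for instance a dike) and coarsens it by a factor of two, once by averaging and once by discarding cells. The helper names and the data are illustrative only.

```python
def thin_by_averaging(grid, factor):
    """Coarsen a grid by averaging each factor-by-factor block of cells."""
    rows, cols = len(grid), len(grid[0])
    coarse = []
    for r0 in range(0, rows, factor):
        coarse_row = []
        for c0 in range(0, cols, factor):
            block = [grid[r][c]
                     for r in range(r0, min(r0 + factor, rows))
                     for c in range(c0, min(c0 + factor, cols))]
            coarse_row.append(sum(block) / len(block))
        coarse.append(coarse_row)
    return coarse


def thin_by_discarding(grid, factor):
    """Coarsen a grid by keeping only every factor-th row and column."""
    return [row[::factor] for row in grid[::factor]]


# Hypothetical terrain: flat ground at height 0 with a one-cell-wide
# ridge of height 8 running along column 1.
terrain = [[0.0, 8.0, 0.0, 0.0] for _ in range(4)]

averaged = thin_by_averaging(terrain, 2)
discarded = thin_by_discarding(terrain, 2)
print(averaged)   # [[4.0, 0.0], [4.0, 0.0]]
print(discarded)  # [[0.0, 0.0], [0.0, 0.0]]
```

Averaging halves the height of the ridge, and discarding removes it entirely. An analysis that depends on the ridge, such as one asking whether water can cross it, could therefore give a different answer on the thinned grid.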
Another practical and popular method is to bin the data into tiles
of some fixed small size, transforming a very big data set into a list of
Table 1. Running times for the data sets. The number of cells is given both as the total number of cells in the grid and as the number of cells that
contain real data (not NODATA). The SRTM and ASTER data sets are provided as grids, so we did not have to construct the model.
[Table body not recoverable; surviving running-time entries: 3 days 16 hours, 1 day 17 hours, 2 days 3 hours, 3 hours 22 min, 1 day 4 hours, 2 hours 7 min, 1 day 8 hours, 2 hours 41 min.]