Research Article
The point-radius method for georeferencing locality descriptions and
calculating associated uncertainty
Museum of Vertebrate Zoology, 3101 Valley Life Sciences Building,
University of California, Berkeley, CA 94720, USA;
Department of Environmental Sciences, Policy & Management, 151 Hilgard
Hall #3110, University of California, Berkeley, CA 94720, USA
Museum of Vertebrate Zoology, 3101 Valley Life Sciences Building,
University of California, Berkeley, CA 94720, USA
Natural history museums store millions of specimens of geological, biological,
and cultural entities. Data related to these objects are in increasing demand for
investigations of biodiversity and its relationship to the environment and
anthropogenic disturbance. A major barrier to the use of these data in GIS is
that collecting localities have typically been recorded as textual descriptions,
without geographic coordinates. We describe a method for georeferencing
locality descriptions that accounts for the idiosyncrasies, sources of uncertainty,
and practical maintenance requirements encountered when working with natural
history collections. Each locality is described as a circle, with a point to mark
the position most closely described by the locality description, and a radius to
describe the maximum distance from that point within which the locality is
expected to occur. The calculation of the radius takes into account aspects of the
precision and specificity of the locality description, as well as the map scale,
datum, precision and accuracy of the sources used to determine coordinates.
This method minimizes the subjectivity involved in the georeferencing process.
The resulting georeferences are consistent, reproducible, and allow for the use of
uncertainty in analyses that use these data.
1. Introduction
Natural history collections contain more than 2500 million specimens of
geological, biological, and cultural entities (Duckworth et al. 1993). These resources
constitute a foundation for numerous scientific disciplines, such as anthropology,
biogeography, biosystematics, conservation biology, ecology, and paleontology.
The data associated with natural history specimens vary widely in nature and
content between disciplines as well as between institutions, including everything
International Journal of Geographical Information Science
ISSN 1365-8816 print/ISSN 1362-3087 online © 2004 Taylor & Francis Ltd
DOI: 10.1080/13658810412331280211
VOL. 18, NO. 8, DECEMBER 2004, 745–767
from hand-written notes taken in the field at the time of collection (field notes) to
databases and published articles in professional journals. Underlying this variation,
however, is a core set of concepts common to all natural history collections, one of
the most important of which is the ‘collecting event’, a description of the time and
place (locality) where a specimen was collected. The collecting event is an essential
association between the specimen and its natural context and is required for
quantitative analyses of specimen data together with other spatial data using
geographical information systems (GIS).
Despite increasing interest in natural history collection data, there remain
considerable obstacles to their use in GIS. The most prevalent of these obstacles is
that locality descriptions are often not georeferenced. Traditionally, localities have
been recorded as textual descriptions, often based on names and situations that can
change over time. This tradition is slowly changing to document localities
with supplementary geographic coordinates, the value of which is now widely
recognized (Krishtalka and Humphrey 2000, GBIF 2002) and the collection of
which has been greatly facilitated by the availability of the Global Positioning
System (GPS). Nevertheless, researchers interested in spatial analysis using museum
specimen data face a daunting legacy of data without coordinates. For example, at
the beginning of the ‘Mammal Networked Information System’ Project (MaNIS
2001), which consists of a distributed database network for mammal collections, 17
North American mammal collections pooled their specimen locality data for a
collaborative georeferencing effort. 87.8% of the 296 737 distinct collecting localities
from these collections had no coordinates. As of March 2003, 61.2% of the
3 260 453 specimens accessible through Lifemapper (KU-BRC 2002) did not have
georeferenced localities. These statistics are typical of natural history collections
data that are in digital media today, and indicate the magnitude of the
georeferencing challenge.
In the relatively few cases in which localities have been assigned coordinates,
there is seldom any documentation of the method used to determine those
coordinates. For example, of the localities for which coordinates had already been
determined at the outset of the MaNIS project, 78.4% of 36 197 records had no
associated metadata regarding the areas encompassed by the localities, nor did they
include information about the methods and assumptions used in assigning the
coordinates and uncertainties associated with them. Thus, even where present,
georeferenced localities may be of limited utility since we have no knowledge of
how they were generated.
To the best of our knowledge, there are currently no published, comprehensive
guidelines for georeferencing descriptive locality data. In the absence of such
guidelines, it has been common practice to assign a single point to a locality,
without estimates of how well that point represents the actual locality. Some
authors call for the capture of categorical measures of uncertainty (McLaren et al.
1996, Knyazhnitskiy et al. 2000), but do not investigate the nature of uncertainties,
their magnitudes, or how different sources of uncertainty combine. Given the
nature of locality descriptions and the variation in quality of coordinate sources
(maps and gazetteers, for example), uncertainty must be estimated under rigorous
guidelines. Whereas the coordinates of some localities can be determined with great
precision, others can only be roughly approximated. If these differences are
not taken into account, uncertainties cannot be incorporated into analyses and
746 J. Wieczorek et al.
it becomes impossible to determine whether a given record is appropriate for a
particular application. Spatial analysis without consideration of data uncertainty
may be of limited utility (Fisher 1999).
Numerous studies have investigated the positional accuracy of spatial data
(Goodchild and Hunter 1997; Leung and Yan 1998; Veregin 2000; Van Niel
and McVicar 2002), which is defined as the difference between test data and
corresponding “true” data of demonstrably higher accuracy (Goodchild and
Hunter 1997; FGDC 1998) and which is expressed as a standard error for a set of
points in a GIS layer (Stanislawski et al. 1996; Bonner et al. 2003).
The approaches used in these studies cannot be directly applied to estimating
uncertainty in georeferenced localities. In contrast to many spatial data sets, which
consist of unambiguously identifiable objects that can be directly and repeatedly
measured, it is difficult to provide true data against which to test for many of the
types of potential errors (“uncertainties”) that plague descriptive localities.
Here we present a simple, practical method for computing and recording
coordinates for a locality. We identify the potential sources of uncertainty, present
methods for determining their magnitudes, and provide a procedure for combining
uncertainties into a single estimate of “maximum” uncertainty associated with the
We propose that the method presented here provides a framework for
producing consistent and accurate interpretations of the locality descriptions and
represents a substantial improvement over current practices. Efficiency, accuracy,
and repeatability are our primary goals.
2. Georeferencing methods
2.1. Point method
There are various methods by which locality descriptions can be georeferenced.
The most commonly used is the ‘Point’ method, by which a single coordinate pair is
assigned to each location. This method ignores the fact that a locality record always
describes an area rather than a dimensionless point and that collecting may have
occurred anywhere within the area denoted. The specificity (that is, how well the
description constrains the interpretation of the area) with which a locality is
recorded directly influences the range of research questions to which the data can be
applied. For example, recording only the state from which a specimen was collected
will not be of much utility in the compilation of a species list for a National Park in
that state. By providing only a point for a georeferenced record, the distinction is
lost between locality descriptions that are specific and those that are not.
2.2. Shape method
The shape method is a conceptually simple method that delineates a locality
using one or more polygons, buffered points, and buffered polylines. A
combination of these shapes can represent a town, park, river, junction, or any
other feature or combination of features found on a map. While simple to describe,
the task of generating these shapes can be difficult. Creating shapes is impractical
without the aid of digital maps, GIS software, and expertise, all of which can be
relatively expensive. Also, storing a shape in a database is considerably more
complicated than storing a single pair of coordinates. Particular challenges to
making this method practical for georeferencing natural history collections data
Point-radius method for georeferencing 747
include assembling freely accessible digital cartographic resources and developing
tools for automation of the georeferencing process. Nevertheless, of all of the
approaches discussed here, this method has the potential to generate the most
complete digital spatial descriptions of localities.
2.3. Bounding box method
A common way to describe a geographic feature is to use a bounding box, a set
of two pairs of coordinates that together form a rectangle (in the appropriate
projection) that encompasses the locality being described. Geographic features in
the Alexandria Digital Library Gazetteer Server (ADL 2001) are sometimes
described using bounding boxes. The bounding box method is a limited shape
method by which only points or projected rectangles can be described. This method
offers some advantages over the shape method. For example, bounding boxes are
much easier to produce and store than arbitrary shapes, particularly in the absence
of digital cartographic tools. In addition, database queries can be performed on
bounding boxes without the need for a spatial database engine. However,
describing a locality with a bounding box tends to be less specific than describing it
with a more complicated shape.
2.4. Point-radius method
The point-radius method describes a locality as a coordinate pair and a distance
from that point (that is, a circle), the combination of which encompasses the full
locality description and its associated uncertainties. The key advantage of this
method is that the uncertainties can be readily combined into one attribute, whereas
the bounding box method requires contributions to uncertainty to be represented
independently in each of the two dimensions. This simple difference can have a
profound effect on the economy of georeferencing. Recognizing the practical
advantages for natural history collections, for which the economy of producing and
maintaining data are critical concerns, the guidelines for georeferencing descriptive
localities presented here will be described in terms of the point-radius method.
Nevertheless, the discussions of the sources of uncertainty are relevant to the
‘Shape’ and ‘Bounding box’ methods as well.
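The data model implied by the point-radius method can be sketched in a few lines. This is a minimal illustration, not part of the published method: the class name, field names, the spherical approximation of the Earth (mean radius 6 371 km), and the haversine distance are all assumptions made for the example.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000.0  # assumption: mean radius of a spherical Earth


def great_circle_m(lat1, lon1, lat2, lon2):
    """Haversine distance in metres between two points in decimal degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    h = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(min(1.0, h)))


@dataclass(frozen=True)
class PointRadius:
    """Hypothetical record type: one coordinate pair plus one uncertainty radius."""
    latitude: float          # decimal degrees
    longitude: float         # decimal degrees
    radius_m: float          # maximum uncertainty, in metres
    datum: str = "unknown"   # geodetic datum of the coordinate source

    def may_contain(self, lat, lon):
        """True if the given point lies within the uncertainty circle."""
        distance = great_circle_m(self.latitude, self.longitude, lat, lon)
        return distance <= self.radius_m
```

Because all contributions to uncertainty collapse into the single `radius_m` attribute, a record of this kind fits in an ordinary database row, which is the economy the section describes.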
3. Applying the point-radius method
3.1. Step one: classify the locality description
Locality descriptions among natural history collection data encompass a
wide range of content in a baffling array of formats. From the perspective of
georeferencing, however, there are effectively only nine different categories of
descriptions (table 1). The locality type will determine the process of calculating
coordinates and uncertainties.
A locality description can contain multiple clauses and can match more than
one of the categories given in table 1. If any one of the parts falls into one of the first
three categories, the locality should not be georeferenced. Instead, an annotation
should be made to the locality record giving the reason why it is not being
georeferenced. In this way, anyone who encounters the locality in the future will
benefit from previous effort to diagnose problems with georeferencing the locality.
Table 1. Types of locality descriptions commonly found in natural history collections.
Type | Description | Examples
1) dubious | The locality explicitly states that the information contained therein is in question. | ‘Isla Boca Brava?’, ‘presumably central
2) cannot be located | Either the locality data are missing, or they contain other than locality information, or the locality cannot be distinguished from among multiple possible candidates, or the locality cannot be found with available references. | ‘locality not recorded’, ‘Bob Jones’, ‘lab born’, ‘summit’, ‘San Jose, Mexico’
3) demonstrably inaccurate | The locality contains irreconcilable inconsistencies. | ‘Sonoma County side of the Gualala River, Mendocino County’
4) coordinates | The locality consists of a point represented with coordinate | ‘42.4532 84.8429’, ‘UTM 553160 4077280’
5) named place | The locality consists of a reference to a geographic feature (e.g., town, cave, spring, island, reef, etc.) having a spatial | ‘Alice Springs’, ‘junction of Dwight Avenue and Derby Street’
6) offset | The locality consists of an offset (usually a distance) from a named place. | ‘5 km outside Calgary’
7) offset along a path | The locality describes a route from a named place. | ‘1 km S of Missoula via Route 93’, ‘600 m up the W Fork of Willow Creek’
8) offsets in orthogonal directions | The locality consists of a linear distance in each of two orthogonal directions from a named place. | ‘6 km N and 4 km W of Welna’
9) offset at a heading | The locality contains a distance in a given direction. | ‘50 km NE Mombasa’
If the locality description does not fall into any of the first three categories in
table 1, the most specific part of the locality description should be used for
georeferencing. For example, a locality written as ‘bridge over the St. Croix River,
4 km N of Somerset’ should be georeferenced based on the bridge rather than on
Somerset as the named place with an offset at a heading. The locality should be
annotated to reflect that the bridge was the locality that was georeferenced. If the
more specific part of the locality cannot be unambiguously identified, then the less
specific part of the locality should be georeferenced and annotated accordingly.
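The decision rule of step one can be expressed compactly. The following is a sketch under stated assumptions: the enumeration and function names are hypothetical, and assigning a category to each clause of a description remains a human judgement that the code does not attempt.

```python
from enum import IntEnum


class LocalityType(IntEnum):
    """The nine categories of table 1, numbered as in the table."""
    DUBIOUS = 1
    CANNOT_BE_LOCATED = 2
    DEMONSTRABLY_INACCURATE = 3
    COORDINATES = 4
    NAMED_PLACE = 5
    OFFSET = 6
    OFFSET_ALONG_PATH = 7
    OFFSETS_ORTHOGONAL = 8
    OFFSET_AT_HEADING = 9


def should_georeference(clause_types):
    """Return False if any clause falls into one of the first three categories.

    An empty list is treated as missing locality data (assumption).
    """
    clause_types = list(clause_types)
    if not clause_types:
        return False
    return all(t > LocalityType.DEMONSTRABLY_INACCURATE for t in clause_types)
```

For example, a description classified as a named place with an offset at a heading passes the check, whereas the same description with one dubious clause does not and should instead receive an annotation explaining why.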
3.2. Step two: determine coordinates
The first key to consistent georeferencing using the point-radius method is to
have well-defined rules for determining the coordinates of the point. Coordinates
may be retrieved from gazetteers, geographic name databases, maps, or even from
other locality descriptions that have coordinates (for example, from localities
recorded in the field using a GPS receiver). The source and precision of the
coordinates should be recorded so that the validity of the georeferenced locality can
be checked at any time. The original coordinate system (for example, decimal
degrees, degrees minutes seconds, UTM) and geodetic datum (for example,
WGS84, NAD27) used in the coordinate source should also be recorded. This
information helps to determine sources and degree of uncertainty, especially with
respect to the original coordinate precision (section 3.3.3). We recognize that
specific projects may require particular coordinate systems, but we find geographic
coordinates in decimal degrees to be the most convenient system for georeferencing.
Since this format relies on just two attributes, one for latitude and the other for
longitude, it provides a succinct coordinate description with global applicability
that is readily transformed to other coordinate systems as well as from one datum
to another. By keeping the number of recorded attributes to a minimum, the
chances for transcription errors are minimized.
When transforming coordinates from one system or datum to another, it is
important to preserve as much precision as possible. Coordinate precision is not
a measure of accuracy – it does not imply specific knowledge of the locality
represented by the coordinates; that role is assumed by the uncertainty
measurements, as described in section 3.3. Every coordinate transformation has
the potential to introduce error. The greater the precision with which the
coordinates are captured, the less the error that is propagated when further
coordinate transformations are made.
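As an illustration of preserving precision through a change of coordinate system, the sketch below converts degrees, minutes, and seconds to decimal degrees without rounding. The function name and argument layout are assumptions made for the example.

```python
def dms_to_decimal_degrees(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to signed decimal degrees.

    hemisphere is one of 'N', 'S', 'E', 'W'; south and west are negative.
    The result is deliberately not rounded, so no precision is discarded.
    """
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError("hemisphere must be one of N, S, E, W")
    value = degrees + minutes / 60.0 + seconds / 3600.0
    return -value if hemisphere in ("S", "W") else value


latitude = dms_to_decimal_degrees(37, 52, 19, "N")
```

Here `latitude` is approximately 37.871944 degrees. Rounding it to two decimal places would shift the point by about 0.002 degrees, on the order of 200 m in latitude, an error that would then be carried into any later transformation.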
3.2.1. Identify named places and determine their extents
The first step in determining the coordinates for a locality description is to
identify the most specific named place within the description. Gazetteers and
geographic name databases provide coordinates for named places (commonly
referred to as ‘features’). However, we use the term ‘named place’ to refer not only
to traditional features, but also to places that may not have proper names, such as
road junctions, stream confluences, and cells in grid systems (for example,
Every named place occupies a finite space, or ‘extent’. In some sources, places
may be given in the form of bounding box coordinates for larger features (ADL
2001), but in general only a coordinate pair, not an extent, is given. Some
coordinate sources are accompanied by rules governing the placement of the
coordinates within a named place. For example, the US Geographic Names
Information Service (USGS 1981) places the coordinates of towns at the main post
office unless the town is a county seat, in which case the coordinates refer to the
county courthouse. Similarly, the same source places the coordinates of a river at its
mouth. In the absence of one of these specific points of reference, the geographic
centre of the named place is usually recorded. Because of these inconsistencies in
assigning coordinates for named places, including inconsistencies within a single
data source, the extent of the named place becomes an important consideration in
determining uncertainty.
The geographic centre (that is, the midpoint of the extremes of latitude and
longitude) of the named place is recommended as the location of the coordinates
because it describes a point where the uncertainty due to the extent of the named
place is minimized. If the locality describes an irregular shape (for example, a
winding road or river) and the geographic centre of that shape does not lie within
the locality, then the point nearest the geographic centre that lies within the shape is
the preferred reference for the named place and represents the point from which the
extent of and offsets from that named place should be calculated.
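The geographic centre as defined here is simple to compute once a named place has been sampled as a set of points. The sketch below is illustrative only: it assumes the feature is given as (latitude, longitude) pairs in decimal degrees, that it does not cross the 180° meridian, and that the nearest sampled point is an acceptable stand-in for the nearest point within the shape. Deciding whether the centre lies inside the feature is left to the georeferencer.

```python
from math import cos, radians


def geographic_centre(points):
    """Midpoint of the extremes of latitude and longitude of a feature."""
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return ((min(lats) + max(lats)) / 2.0, (min(lons) + max(lons)) / 2.0)


def nearest_point_on_feature(centre, points):
    """Sampled point of the feature closest to the geographic centre.

    Longitude differences are scaled by cos(latitude) so that degrees of
    longitude and latitude are roughly comparable (planar approximation).
    """
    c_lat, c_lon = centre
    shrink = cos(radians(c_lat))

    def squared_offset(point):
        lat, lon = point
        return (lat - c_lat) ** 2 + ((lon - c_lon) * shrink) ** 2

    return min(points, key=squared_offset)


# Hypothetical winding river sampled at three points.
river = [(10.0, 20.0), (12.0, 26.0), (11.0, 22.0)]
centre = geographic_centre(river)
anchor = nearest_point_on_feature(centre, river)
```

In this made-up example the centre falls off the river, so `anchor` is the point from which extents and offsets would be measured.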
3.2.2. Determine offsets
Offsets consist of combinations of distances and directions from a named place.
Some locality descriptions explicitly state the path to follow when measuring the
offset (for example, ‘by road’, ‘by river’, ‘by air’, ‘up the valley’). In such cases the
georeferencer should follow the path designated in the description using a map with
the largest available scale to find the coordinates of the offset from the named
place. The smaller the scale of the map used, the more the measured distance on the
map is likely to overshoot the intended target.
It is sometimes possible to infer the offset path from additional supporting
evidence in the locality description. For example, in the locality ‘58 km NW of
Haines Junction, Kluane Lake’, the mention of Kluane Lake supports a measurement
by road, since the final coordinates by that path are nearer to the lake than those
reached by going 58 km NW in a straight line. Altitudes given with the locality
description may also support one offset path over another. By convention, localities
containing two offsets in orthogonal directions (for example, ‘10 km S and 5 km W of
Bikini Atoll’) are always linear.
Sometimes the environmental constraints of the collected specimen can imply
the method of measurement of the offset. For example, ‘30 km W of Boonville,
California’ if taken as a linear measurement, would lie in the Pacific Ocean. If this
locality is supposed to refer to the collection site of a terrestrial mammal, it is likely
that the collector followed the road heading west out of Boonville, winding toward
the coast, in which case the animal was collected on land.
If neither of the above methods distinguishes the offset method, it may be
necessary to refer to more detailed supplementary sources, such as field notes or
itineraries, to determine this information. Supplementary sources do not always
exist or they may not contain additional information, making it difficult to
distinguish between offsets meant to be along a path and those meant to be along a
straight line. A particularly conservative approach is to not georeference localities
that fall into this category and instead record a comment explaining the reasoning.
However, value can still be derived by georeferencing localities that suffer from
this ambiguity. One solution for dealing with these localities is to determine the
coordinates based on one or the other of the offset paths. Another solution is to use
the midpoint between all possible paths. There may be discipline-specific reasons to
choose one solution over another, but the georeferencer should always document
the choice and accommodate the ambiguity in the uncertainty calculations.
3.3. Step three: Calculate uncertainties
The second key to consistent georeferencing using the point-radius method
(after determining the coordinates of the point) is to have well-defined rules
for determining the radius of the circle that encompasses the locality and all of
its associated uncertainties. Whenever subjectivity is involved, it is preferable to
overestimate uncertainty. We have identified the following six sources of
uncertainty inherent in descriptive localities or the resources used to georeference them:
1) extent of the locality
2) unknown datum
3) imprecision in distance measurements
4) imprecision in direction measurements
5) imprecision in coordinate measurements
6) map scale
3.3.1. Uncertainty due to the extent of the locality
The extents of named places mentioned in locality descriptions are an important
source of uncertainty. Not only are the rules for assigning coordinates to named
places largely undocumented in most coordinate data sources, but also the points of
reference may change over time – post offices and courthouses are relocated, towns
change in size, and so on. Moreover, there is no guarantee that the collector paid
attention to any particular convention when reporting a locality as an offset from a
named place. For example, ‘4 km E of Bariloche’ may have been measured from the
post office at the civic plaza, or from the bus station on the eastern edge of town, or
from anywhere else in Bariloche. In most cases we no longer have a way of knowing
the actual location used to anchor the offset.
The maximum uncertainty due to the extent of the named place (figure 1) is the
maximum distance between any two points within the named place (the ‘span’). If
we have coordinates for a named place from a gazetteer, for example, without
knowing where in the named place those coordinates lie, then the span is the
uncertainty due to the extent of the named place. If we have a map of the named
place, then a more refined uncertainty estimate can be made by measuring the
distance from the point marked by the coordinates to the point in the named place
furthest from those coordinates. The magnitude of the uncertainty value is
minimized if the coordinates mark the geographic centre of the named place and is
generally about half the span of the locality.
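Both estimates of extent uncertainty can be sketched as follows. This is an illustration under stated assumptions, not a prescribed implementation: the named place is represented by sampled boundary points in decimal degrees, distances are great-circle distances on a sphere of mean radius 6 371 km, and the function names are hypothetical.

```python
from itertools import combinations
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000.0  # assumption: mean radius of a spherical Earth


def great_circle_m(p, q):
    """Haversine distance in metres between two (lat, lon) points."""
    (lat1, lon1), (lat2, lon2) = p, q
    phi1, phi2 = radians(lat1), radians(lat2)
    h = (sin((phi2 - phi1) / 2) ** 2
         + cos(phi1) * cos(phi2) * sin(radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * asin(sqrt(min(1.0, h)))


def span_m(outline):
    """Maximum distance between any two sampled points of the named place."""
    return max(great_circle_m(p, q) for p, q in combinations(outline, 2))


def extent_uncertainty_m(outline, reference=None):
    """Uncertainty due to extent.

    Without a known reference point the full span is used; with one, the
    distance from it to the furthest sampled point of the named place.
    """
    if reference is None:
        return span_m(outline)
    return max(great_circle_m(reference, p) for p in outline)


# Hypothetical town outline, roughly 1.1 km on a side near the equator.
town = [(0.0, 0.0), (0.0, 0.01), (0.01, 0.01), (0.01, 0.0)]
from_gazetteer = extent_uncertainty_m(town)
from_centre = extent_uncertainty_m(town, reference=(0.005, 0.005))
```

With the coordinates at the geographic centre, `from_centre` comes out at about half of `from_gazetteer`, in line with the statement above.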
Many localities are based on named places that have changed in size over time;
current maps might not reflect the extents of those places at the time specimens
were collected there. If possible, extents should be determined using maps dating
from the same period as the specimen collecting events. In most cases, the current
extent of a named place will be greater than its historical extent and the uncertainty
will be somewhat overestimated if current maps are used. It is recommended to
record the named place, its extent, and the source of these data while georeferencing
so that users of the data can verify this important component of the uncertainty.
3.3.2. Uncertainty due to an unknown datum
A geodetic datum is a mathematical description of the size and shape of the
earth and of the origin and orientation of coordinate systems. Seldom in natural
history collections have geographic coordinates been recorded together with
geodetic datum information. Even now, with GPS coordinates being recorded as
definitive locations, the geodetic datum is typically ignored. A missing datum
reference introduces a complicated ambiguity, which varies geographically (Welch
and Homsey 1997).
Many currently available maps of North America are based on the North
American Datum of 1927 (NAD27), but the North American Datum of 1983
(NAD83) is used increasingly often in newer maps. NAD83 is
essentially the same as the World Geodetic System of 1984 datum (WGS84), a
standard reference datum for the Global Positioning System (Defense Mapping
Agency 1991). We calculated the magnitude of uncertainty for North America
(Canada, USA, and Mexico) based on the differences between NAD27 and
NAD83/WGS84 (figure 2) using transformation functions in ArcGIS (ESRI,
Redlands, CA, USA). The uncertainty from not knowing which of these datums
was used to determine the coordinates varies in the contiguous USA from 0–104 m.
In the extreme western Aleutian Islands of Alaska, the discrepancy can be as much
as 237 m, while in Hawaii the differences are consistently ca. 500 m. On the global
scale, we calculated a maximum uncertainty of 3552 m due to an unknown datum.
This value was obtained by comparing pairwise distances between all combinations
of datums listed in the WGS84 definition (NIMA 2000) at one degree intervals in
both latitude and longitude. Given the potential magnitude of this uncertainty,
every effort should be made to use coordinate sources that provide datum
information and to record the datum of those sources as a routine part of data
3.3.3. Imprecision as a source of uncertainty
Precision is a measure of the specificity with which a measurement is recorded.
Precision can be difficult to gauge from a locality description; it is seldom, if ever,
explicitly recorded. Further, a database record may not reflect, or may reflect
incorrectly, the precision inherent in the original measurements, especially if the
locality description in the database has undergone standardization, reformatting, or
secondary interpretation of the original description. There are distinct implications
that arise from the level of precision in distance measurements, directions
(headings), and coordinates. These are addressed in the subsections below.
Figure 1. The maximum (AB) and minimum (BC) uncertainties due to the extent of a
named place (shaded area).
Uncertainty associated with distance precision.
Distance may be recorded in
a locality description with or without significant digits, and those digits may or may
not be warranted. Distances are commonly recorded with few or no significant
digits, or even with fractions. Locality descriptions may also have undergone
reformatting to remove fractions or significant digits. For example, suppose a
specimen label was written in the field as locality ‘¾ km W of Inverness’, which was
entered into a database as ‘0.75 km W of Inverness’. In the original, it is clear that
the collector was confident of recording the distance with one quarter km precision.
Without consulting the specimen tag it may be difficult to determine how much
distance precision is warranted. If the original tag is not consulted, then a
conservative way to ensure that distance precision is not inflated is to treat distance
measurements as integers with fractional remainders (thus 10.25 becomes 10 ¼),
accounting for the possible (and not uncommon) transformation of a fraction
in the original data to a real number in the database record. The uncertainty for
these distances should be calculated based on the fractional part of the distance,
using 1 divided by the denominator of the fraction.
Figure 2. Uncertainty from not knowing whether coordinates were taken from a source
using NAD27 or NAD83 – the geodetic datums most commonly used on maps in
Canada, the USA, and Mexico.
Examples: ‘9 km N of Bakersfield’ (fraction is 1/1, uncertainty should be 1 km)
‘9.5 km N of Bakersfield’ (fraction is ½, uncertainty should be 0.5 km)
‘9.75 km N of Bakersfield’ (fraction is ¾, uncertainty should be 0.25 km)
‘9.6 km N of Bakersfield’ (fraction is 6/10, uncertainty should be 0.1 km)
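The distance precision rules lend themselves to a short sketch. The code below is an illustration rather than a definitive implementation: the function name, the choice to pass the distance as recorded text, and the restriction of recognised field fractions to halves and quarters are assumptions made for the example. Integer distances ending in zeros are handled by the powers-of-ten rule stated in the paragraph that follows.

```python
from decimal import Decimal
from fractions import Fraction

# Assumption: only halves and quarters are treated as field-note fractions;
# any other decimal part is taken at its recorded decimal place.
FIELD_FRACTION_DENOMINATORS = (2, 4)


def distance_uncertainty(distance_text):
    """Uncertainty implied by the precision of a recorded distance.

    The result is in the same units as the distance itself.
    """
    value = Decimal(distance_text)
    if value <= 0:
        raise ValueError("distance must be positive")
    remainder = value % 1
    if remainder != 0:
        denominator = Fraction(remainder).denominator
        if denominator in FIELD_FRACTION_DENOMINATORS:
            return Decimal(1) / Decimal(denominator)
        # e.g. 9.6 is read as 6/10: one unit of the last recorded place
        return Decimal(1).scaleb(value.as_tuple().exponent)
    digits = str(int(value))
    trailing_zeros = len(digits) - len(digits.rstrip("0"))
    if trailing_zeros == 0:
        return Decimal(1)  # plain integer: fraction is 1/1
    # integer multiple of a power of ten: 0.5 times ten to that power
    return Decimal(5) * Decimal(10) ** (trailing_zeros - 1)
```

Working from the recorded text rather than a floating-point number keeps trailing zeros and decimal places visible, which is exactly the information the rules depend on.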
For measurements that appear as integer multiples of powers of 10 (for
example, 10, 20, 300, 4000), use 0.5 times ten to that power for the uncertainty.
Examples: ‘140 km N of Bakersfield’ (uncertainty should be 5 km)
‘100 km N of Bakersfield’ (uncertainty should be 50 km)
‘2000 m N of Bakersfield’ (uncertainty should be 500 m)
Uncertainty associated with directional precision.
Direction is almost always
expressed in locality descriptions using cardinal or inter-cardinal directions rather
than degree headings. This practice can introduce uncertainty due to directional
imprecision. The problem arises from the fact that we don’t know, out of context,
what the recorder meant by ‘north’ except that it is distinct from the other cardinal
directions. Hence, ‘north’ is not ‘east’ or ‘west’, but it could be any direction
between northeast and northwest. The directional uncertainty in these cases is 45
degrees in either direction from the given heading.
Example: ‘10 mi N of Bakersfield’
If a related set of locality descriptions (for example, those by a collector on a
given expedition) contain any directions more specific than the cardinal directions
(for example, ‘NE’), then the person recording the data was demonstrably sensitive
to inter-cardinal directions. Thus, ‘NE’ could mean any direction between ENE and
NNE. The directional uncertainty in these cases is 22.5 degrees in either direction
from the given heading.
Example: ‘10 mi NE of Bakersfield’
A locality description that contains further refined directions is correspondingly
more precise. Thus, in the following example the directional uncertainty is 11.25
degrees in either direction from the given heading.
Example: ‘10 mi ENE of Bakersfield’
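The three levels of directional precision described above can be sketched as a small lookup. The function name and the use of the length of the compass abbreviation to pick the level are assumptions made for illustration. As the text notes, the appropriate level for a bare cardinal direction also depends on what other directions the same collector used, which this sketch does not model.

```python
# The 16 compass points, clockwise from north, spaced 22.5 degrees apart.
COMPASS_POINTS = [
    "N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
    "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW",
]

# Half-width of the directional uncertainty, keyed by abbreviation length:
# cardinal (N), inter-cardinal (NE), and further refined (ENE).
HALF_WIDTH_DEGREES = {1: 45.0, 2: 22.5, 3: 11.25}


def heading_and_uncertainty(direction: str):
    """Return (heading, uncertainty), both in degrees, for a compass point."""
    point = direction.strip().upper()
    heading = COMPASS_POINTS.index(point) * 22.5  # raises ValueError if unknown
    return heading, HALF_WIDTH_DEGREES[len(point)]


for d in ["N", "NE", "ENE"]:
    print(d, heading_and_uncertainty(d))
```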
If the locality description contains two orthogonal directions, convention holds
that the measurements are linear in exactly those directions. In this case there is no
directional imprecision.
Example: ‘10 mi N and 5 mi E of Bakersfield’
Uncertainty associated with coordinate precision. Recording coordinates
with insufficient precision can result in unnecessary uncertainties. Therefore, as
many digits of precision as are reported by the source should be retained when
recording geographic coordinates. The magnitude of the uncertainty due to
coordinate imprecision is a function not only of the precision with which the data
are recorded, but also a function of the datum and the coordinates themselves.
Uncertainty due to the imprecision with which the original coordinates were
recorded can be estimated as follows:
$$\text{uncertainty} = \sqrt{\text{lat\_error}^2 + \text{long\_error}^2} \qquad (1)$$
$$\text{lat\_error} = \pi R \,(\text{coordinate precision}) / 180.0$$
$$\text{long\_error} = \pi X \,(\text{coordinate precision}) / 180.0$$
where R is the radius of curvature of the meridian at the given latitude, X is the
distance from the point to the polar axis, orthogonal to the polar axis, and
coordinate precision is the precision with which the coordinates were recorded, as a
fraction of one degree. R is given by Equation 2.
$$R = \frac{a\,(1 - e^2)}{\left(1 - e^2 \sin^2(\text{latitude})\right)^{3/2}} \qquad (2)$$
where a is the semi-major axis of the reference ellipsoid (the radius at the equator)
and e is the first eccentricity of the reference ellipsoid, defined by Equation 3.
$$e^2 = 2f - f^2 \qquad (3)$$
where f is the flattening of the reference ellipsoid. X is also a function of geodetic
latitude and is given by Equation 4.
$$X = N \cos(\text{latitude}) \qquad (4)$$
where N is the radius of curvature in the prime vertical at the given latitude and is
defined by Equation 5.
$$N = \frac{a}{\sqrt{1 - e^2 \sin^2(\text{latitude})}} \qquad (5)$$
Example: Latitude = 10.27; Longitude = −123.6; Datum = WGS84
In this example the coordinate precision is 0.01 degrees. Thus, lat_error =
1.1061 km, long_error = 1.0955 km, and the uncertainty resulting from the combination
of the two is 1.5568 km. These calculations use a semi-major axis (a) of
6378137.0 m and a flattening (f) of 1/298.25722356 based on the WGS84 datum.
Examples of error contributions for different levels of precision in the original
coordinates (using the WGS84 reference ellipsoid) are given in table 2. Calculations
are based on the same degree of imprecision in both coordinates and are given for
several different latitudes.
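The coordinate-precision calculation can be checked with a short Python sketch. It uses the standard ellipsoid formulas for the meridional radius of curvature, the first eccentricity, and the prime-vertical radius of curvature, which is how Equations 2, 3 and 5 are read here. The function and constant names are assumptions, and the flattening is written with the full WGS84 value rather than the truncated one quoted in the text. For latitude 10.27 and a precision of 0.01 degrees it should reproduce, to rounding, the 1.1061 km, 1.0955 km and 1.5568 km of the example.

```python
import math

WGS84_A = 6378137.0          # semi-major axis in metres
WGS84_F = 1 / 298.257223563  # flattening (the text quotes fewer digits)


def coordinate_precision_uncertainty(latitude_deg, precision_deg,
                                     a=WGS84_A, f=WGS84_F):
    """Return (lat_error, long_error, combined) in metres.

    `precision_deg` is the precision of the recorded coordinates expressed
    as a fraction of one degree (for example 0.01).
    """
    e2 = 2 * f - f * f                       # first eccentricity squared
    phi = math.radians(latitude_deg)
    w = 1 - e2 * math.sin(phi) ** 2
    R = a * (1 - e2) / w ** 1.5              # meridional radius of curvature
    N = a / math.sqrt(w)                     # prime-vertical radius of curvature
    X = N * math.cos(phi)                    # distance to the polar axis
    lat_error = math.pi * R * precision_deg / 180.0
    long_error = math.pi * X * precision_deg / 180.0
    return lat_error, long_error, math.hypot(lat_error, long_error)


lat_err, long_err, total = coordinate_precision_uncertainty(10.27, 0.01)
print(round(lat_err / 1000, 4), round(long_err / 1000, 4), round(total / 1000, 4))
```

Rounding the combined value up to the next integer for each latitude and precision gives figures of the kind listed in table 2.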
3.3.4. Uncertainty due to map scale
Maps have an inherent level of accuracy. Unfortunately, the accuracy of many
maps, particularly old ones, is undocumented. Accuracy standards generally explain
the physical error tolerance on a printed map, so that the net uncertainty is
dependent on the map scale. Following are the map accuracy standards published
by the US Geological Survey: ‘For maps on publication scales larger than 1:20,000,
not more than 10 percent of the points tested shall be in error by more than 1/30 inch,
measured on the publication scale; for maps on publication scales of 1:20,000 or
smaller, 1/50 inch’ (USGS 1999).
It is important to note that a digital map is never more accurate than the
original from which it was derived, nor is it more accurate when you zoom in on it.
The accuracy is strictly a function of the scale and digitizing errors of the original
map. A value of 1 mm of error can be used on maps for which the standards are not
published. This corresponds to about three times the detectable graphical error and
should serve well as an uncertainty estimate for most maps. By this rule, the
uncertainty for a map of scale 1:500 000, for example, is 500 m.
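The two map-scale rules above are simple unit conversions, sketched here in Python. The function names are assumptions. The first applies the quoted USGS tolerances (1/30 inch for scales larger than 1:20,000, otherwise 1/50 inch); the second applies the 1 mm rule suggested for maps without published standards.

```python
METRES_PER_INCH = 0.0254


def usgs_map_uncertainty_m(scale_denominator: float) -> float:
    """Ground uncertainty in metres under the USGS accuracy standard.

    A scale "larger than 1:20,000" has a denominator smaller than 20000.
    """
    tolerance_inches = 1 / 30 if scale_denominator < 20000 else 1 / 50
    return tolerance_inches * METRES_PER_INCH * scale_denominator


def default_map_uncertainty_m(scale_denominator: float, error_mm: float = 1.0) -> float:
    """Ground uncertainty in metres for a fixed error on the printed map."""
    return error_mm / 1000.0 * scale_denominator


print(round(usgs_map_uncertainty_m(100_000), 1))     # a 1:100 000 USGS map
print(round(default_map_uncertainty_m(500_000), 1))  # a 1:500 000 map, 1 mm rule
```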
3.4. Step four: calculate combined uncertainties
The uncertainties associated with a given locality description depend on the
coordinate source, of which we identify four categories: GPS, locality record,
gazetteer, and map. Table 3 shows the potential sources of uncertainty that may be
relevant for each of the four categories. We describe how to calculate the various
combinations of uncertainties in the subsections below.
3.4.1. Calculating uncertainties having no directional imprecision
Distance uncertainties in any given direction are linear and additive. Following
is an example of a simple locality description and an explanation of the manner in
which multiple sources of uncertainty interact.
Example: ‘6 km E (via Highway 58) of Bakersfield’
The potential sources of uncertainty for this example are 1) the extent of
Bakersfield, 2) an unknown datum, 3) distance imprecision, and 4) map scale.
Suppose the centre of Bakersfield is 3 km from the eastern city limit and the
distance is being measured on a USGS map at 1:100,000 scale with the NAD27
datum. The uncertainty due to the extent of Bakersfield is 3 km, there is no
uncertainty due to an unknown datum, the distance imprecision is 1 km, and the
uncertainty due to map scale is 51 m (167 ft). The overall uncertainty for this
locality is the sum of these, or 4.051 km.
Table 2. Uncertainty in meters as a function of latitude. Estimates of uncertainty are based
on coordinate precision measured in degrees using the WGS84 reference ellipsoid and
are rounded up to the next greater integer value.
Precision (degrees)   0 degrees   30 degrees   60 degrees   85 degrees
1.0                   156904      146962       124605       112109
0.1                   15691       14697        12461        11211
0.01                  1570        1470         1247         1122
0.001                 157         147          125          113
0.0001                16          15           13           12
0.00001               2           2            2            2
Table 3. Potential sources of uncertainty inherent in georeferencing descriptive localities
using four common sources of coordinates.
Source of uncertainty
map X X X X X X
gazetteer X X X X X
If there are two orthogonal offsets from a named place in the locality
description, uncertainties apply to each of the directions and the combination of
them is non-linear.
Example: ‘6 km E and 8 km N of Bakersfield’
For the example above, ignore, for the moment, all sources of uncertainty
except those arising from distance imprecision. Under this simplification, a proper
description of the uncertainty is a bounding box centred on the point 6 km E and
8 km N of Bakersfield. Each side of the box is 2 km in length (1 km uncertainty in
each cardinal direction from the centre). In order to characterize the net uncertainty
with a single distance measurement, we need to calculate the radius of the circle
that circumscribes the above-mentioned bounding box. The radius could either be
measured on a map or calculated using a right triangle, the hypotenuse of which is
the line between the centre of the bounding box and a corner. Given the rule that
the distance precision is the same in both cardinal directions, the triangle will
always be a right isosceles triangle and the hypotenuse will always be $\sqrt{2}$ times the
distance precision. So, for the above example the uncertainty associated with the
distance precision alone is 1.414 km (figure 3).
Thus far we have accounted only for distance precision in this example. To
incorporate the uncertainty due to extent, determine the distance from the
geographic centre of the named place to the furthest point within the named place
in either of the two cardinal directions mentioned in the locality description. Add
this distance to the uncertainty due to the distance precision and multiply the sum
by $\sqrt{2}$. Suppose the furthest extent of the city limits of Bakersfield either east or
north from the geographic centre is 3 km. There is a total of 4 km of uncertainty in
each of the two directions and the radius of the circumscribing circle is 4 km times
$\sqrt{2}$, or 5.657 km (figure 4).
Suppose the coordinates for Bakersfield (35°22′24″N, 119°01′04″W) are taken
from the GNIS database (USGS 1981), in which the datum is either NAD27 or
NAD83, and the coordinates are given with precision to the nearest second. At this
location the uncertainty due to an unknown datum is 79 m. The datum uncertainty
contributes in each of the orthogonal directions. Thus, the summed uncertainty in
each direction is 4.079 km and the net uncertainty is this number times $\sqrt{2}$, or
5.769 km.
The coordinates in the GNIS database are given to the nearest second. The
uncertainty due to coordinate precision alone is about 39 m at the latitude of
Bakersfield based on Equation 1. This number already accounts for the
contributions in both cardinal directions, so it must not be multiplied by $\sqrt{2}$.
Instead, simply add the coordinate precision uncertainty to the calculated sum of
uncertainties from the other sources. For the example above, the net uncertainty is
5.769 + 0.039 = 5.808 km.
If the coordinates for Bakersfield had been taken from a USGS map with a
scale of 1:100 000, the datum would be on the map, so there would be no
contribution to the error from an unknown datum (assuming the georeferencer
records the datum with the coordinates). However, the uncertainty due to the map
scale would have to be considered. For a USGS map at 1:100 000 scale, the
uncertainty is 167 ft, or 51 m (based on the USGS map accuracy standards). In the
above example, the uncertainty in each direction is 4.051 km. When multiplied by
$\sqrt{2}$, their combination is 5.729 km. Add the uncertainty due to coordinate
imprecision to this value to get the net uncertainty. Suppose the minutes are
marked on the margin of the map and we interpolated to get coordinates to the
nearest tenth of a minute. The coordinate precision is 0.1 minutes and the
uncertainty is 0.239 km from this source, therefore the maximum error distance is
5.729 + 0.239 = 5.968 km.
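The arithmetic for two orthogonal offsets can be sketched in a few lines of Python. The function name and argument layout are assumptions. The per-direction uncertainties are summed and multiplied by the square root of 2, and the coordinate-precision term, which already covers both directions, is added afterwards.

```python
import math


def orthogonal_offset_uncertainty(per_direction_km, coordinate_precision_km=0.0):
    """Radius (km) of the circle circumscribing the uncertainty box.

    `per_direction_km` lists the uncertainties that apply separately in each
    of the two orthogonal directions (distance precision, extent, datum,
    map scale). `coordinate_precision_km` is added without the sqrt(2) factor.
    """
    return math.sqrt(2) * sum(per_direction_km) + coordinate_precision_km


# Gazetteer case: distance precision, extent, unknown datum, then precision.
print(round(orthogonal_offset_uncertainty([1.0]), 3))
print(round(orthogonal_offset_uncertainty([1.0, 3.0]), 3))
print(round(orthogonal_offset_uncertainty([1.0, 3.0, 0.079]), 3))
print(round(orthogonal_offset_uncertainty([1.0, 3.0, 0.079], 0.039), 3))

# Map case: map-scale uncertainty replaces the unknown-datum term.
print(round(orthogonal_offset_uncertainty([1.0, 3.0, 0.051], 0.239), 3))
```

With the figures used in the text these calls give the intermediate and final values of the worked example (1.414, 5.657, 5.769 and 5.808 km for the gazetteer case, 5.968 km for the map case).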
3.4.2. Calculating combined distance and direction uncertainties
The distance uncertainties in a given direction are linear and additive, but their
sum contributes non-linearly to the uncertainty arising from directional imprecision.
An additional technique is required to account for the correlation between
these two types of imprecision.
Figure 3. Uncertainty due to distance imprecision for two orthogonal offsets from the
centre of a named place.
Example: ‘9 km NE of Bakersfield’
Without considering distance precision, the directional uncertainty (figure 5) is
encompassed by an arc centred (at the coordinates $x, y$) 9 km ($d$) from the centre of
Bakersfield at a heading of 45 degrees ($\theta$), extending 22.5 degrees in either direction
from that point. At this scale the distance ($e$) from the centre of the arc to the
furthest extent of the arc (at $x', y'$) at a heading of 22.5 degrees ($\theta'$) from the centre
of Bakersfield is given by Equation 6.
$$e = \sqrt{(x' - x)^2 + (y' - y)^2} \qquad (6)$$
where $x = d\cos(\theta)$, $y = d\sin(\theta)$, $x' = d\cos(\theta')$, and $y' = d\sin(\theta')$. For the example
above, the uncertainty ($e$) due to the direction imprecision is 3.512 km.
Figure 4. Uncertainty due to the combination of distance imprecision and the extent of a
named place.
Figure 5. Uncertainty (e) due to direction imprecision for a direction specified as northeast
(NE). The actual direction could be anywhere between ENE and NNE; e represents
the maximum distance by which the actual locality could vary from reported locality.
Now consider the distance uncertainties in this example. Suppose the
contributions to distance uncertainty are 3 km (extent of Bakersfield), 1 km
(distance precision for ‘9 km’), 0.079 km (unknown datum), and 0.040 km (gazetteer
data are recorded to the nearest second) for a sum of 4.119 km. The shape of the
region describing the combination of distance and direction uncertainties will be a
band twice this width (2 × 4.119 = 8.238 km) centred (at the coordinates $x, y$) on an
arc offset from the origin by 9 km, spanning 22.5 degrees on either side of the NE
heading (figure 6). Uncertainty is still calculated with Equation 6, but now
$x' = (d + d')\cos(\theta')$ and $y' = (d + d')\sin(\theta')$, where $d'$ is the sum of the distance
uncertainties.
The geometry can be generalized and simplified by rotating the image in figure 6
so that the point ($x', y'$) is on the x axis (figure 7). After rotation, Equation 6 still
holds, but now $x = d\cos(\alpha)$, $y = d\sin(\alpha)$, $x' = d + d'$, and $y' = 0$, where $d'$ is still the
sum of the distance uncertainties and $\alpha$ is an angle equal to the magnitude of the
direction uncertainty. For the example above, the distance uncertainty is 4.119 km
and the direction uncertainty is 22.5 degrees. Given these values, the maximum
error distance is 5.918 km.
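The rotated geometry can be sketched in Python as the distance between the reported point on the arc and the far corner of the uncertainty band. The function and parameter names are assumptions. The optional `coordinate_precision` argument, added after the geometric step by analogy with section 3.4.1, is also an assumption and not something this section states.

```python
import math


def offset_heading_uncertainty(offset, distance_uncertainty,
                               direction_uncertainty_deg,
                               coordinate_precision=0.0):
    """Maximum error distance for an offset at a heading (rotated frame).

    offset                    -- d, the stated distance from the named place
    distance_uncertainty      -- d', the summed distance uncertainties
    direction_uncertainty_deg -- alpha, in degrees
    coordinate_precision      -- optional term added after the geometry
    All distances share one unit, which is also the unit of the result.
    """
    alpha = math.radians(direction_uncertainty_deg)
    x, y = offset * math.cos(alpha), offset * math.sin(alpha)
    x_far, y_far = offset + distance_uncertainty, 0.0
    return math.hypot(x_far - x, y_far - y) + coordinate_precision


# Direction imprecision alone for '9 km NE': about 3.512 km, as in the text.
print(round(offset_heading_uncertainty(9.0, 0.0, 22.5), 3))
# All 4.119 km of distance uncertainty placed in d'.
print(round(offset_heading_uncertainty(9.0, 4.119, 22.5), 3))
# Coordinate precision (0.040 km) held out of d' and added afterwards.
print(round(offset_heading_uncertainty(9.0, 4.079, 22.5, 0.040), 3))
```

With d' = 4.119 km the sketch gives about 5.91 km. Holding the 0.040 km coordinate-precision term out of d' and adding it after the geometric step gives about 5.92 km, which is nearer the 5.918 km quoted above.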
Figure 6. Uncertainty (e) due to the combination of distance imprecision ($d'$) and direction
imprecision ($\theta$) for a locality specifying an offset ($d$) northeast (NE) of the centre of
a named place. The actual locality could be anywhere between ENE and NNE and
up to a distance $d'$ either side of the offset $d$.
3.5. Step five: calculating overall error
Thapa and Bossler (1992) distinguish between primary and secondary data
collection. Primary data are taken directly from the field (ground surveying,
remotely sensed imagery, GPS readings). Secondary data are derived from existing
documents (maps, charts, graphs, gazetteers). Errors in secondary data consist not
only of those introduced in primary data collection (such as human and
instrumental errors), but also of those introduced from secondary data collection
(such as errors due to map inaccuracy). The post facto process of georeferencing
specimen locality descriptions relies heavily on secondary data. Thapa and Bossler
(1992) conclude that it is difficult, if not impossible, to calculate the total error
introduced by secondary data collection, because the functional relationships
among the various sources of error are unknown. They assume a linear relationship
between the total error and individual errors ($e_i$; typically Root Mean Square
[RMS] is used), and apply the law of error propagation (Equation 7).
$$\text{Total error} = \sqrt{e_1^2 + e_2^2 + \cdots + e_n^2} \qquad (7)$$
where $e_n$ is the standard error for source of error n.
There are a number of ambiguities that arise in locality descriptions to which
root mean square errors and the law of error propagation cannot be readily
applied. For example, how does one find the RMS error in the interpretation of
‘west’? In addition, many of our individual error components, such as the error
from having an unknown datum, do not have a Normal distribution. For these
reasons, we have calculated maximum potential errors. The error propagation law
does not apply to this type of error. Instead, we calculate total error as the sum of
individual error components (Equation 8), and not as the square root of the sum of
the squared errors (Equation 7; which would always lead to a lower estimate than
Equation 8).
$$\text{Maximum uncertainty} = \sum u_i + \sum u_d \qquad (8)$$
where $u$ is the maximum uncertainty for independent ($i$) or dependent ($d$) sources of
uncertainty.
Figure 7. Uncertainty diagram rotated to simplify the equation for the net uncertainty (e):
the combination of distance and direction uncertainties.
Like Thapa and Bossler (1992), we assume a linear relationship between total
error and individual errors for which there is no known functional relationship
(all ‘independent’ uncertainties $u_i$). Uncertainties that do have known relationships
(all ‘dependent’ uncertainties $u_d$; for example, uncertainty due to distance and
directional imprecision) are combined first on the basis of their relationships and
are then combined linearly to achieve the overall maximum uncertainty.
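The difference between the two ways of combining errors can be illustrated with a short Python sketch. The function names are assumptions, and the component values are the distance uncertainties from the ‘9 km NE of Bakersfield’ example, reused purely for illustration.

```python
import math


def root_sum_square(errors):
    """Law of error propagation (Equation 7): square root of summed squares."""
    return math.sqrt(sum(e * e for e in errors))


def maximum_uncertainty(independent, dependent=()):
    """Equation 8: plain sum of maximum uncertainties.

    `dependent` holds values that were already combined according to their
    known relationship (for example distance with direction) before summing.
    """
    return sum(independent) + sum(dependent)


components_km = [3.0, 1.0, 0.079, 0.040]  # extent, distance, datum, precision
print(round(maximum_uncertainty(components_km), 3))
print(round(root_sum_square(components_km), 3))
```

For non-negative components the plain sum is never smaller than the root sum of squares, which is why Equation 8 gives the more conservative bound.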
3.6. Step six: document the georeferencing process
When georeferencing a locality description, it is important to document the
process by which the data were determined and record this information with each
locality record so that anyone who encounters the data will benefit from the effort
expended in providing a high-quality georeference. We recommend that the list of
attributes recorded for each georeferenced locality include decimal latitude, decimal
longitude, horizontal datum, net uncertainty (distance and units), original
coordinate system, name of the person, organization, or software version that
georeferenced the locality, georeferencing date, references used, reason if not
georeferenced, named place, extent of the named place, determination method (for
example, the point-radius method), verification status, and the assumptions made.
With completely documented georeferenced localities, researchers who use the data
can quickly verify that the georeferencing was done correctly.
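One way to hold the recommended attributes together is a simple record type, sketched below in Python. All field names are hypothetical and are not taken from any published schema. The example values reuse the Bakersfield figures from section 3.4.1 for a locality described only as ‘Bakersfield’, and the georeferencer name and date are placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GeoreferenceRecord:
    """Attributes recommended for each georeferenced locality (names hypothetical)."""
    decimal_latitude: Optional[float]
    decimal_longitude: Optional[float]
    horizontal_datum: str
    uncertainty_distance: Optional[float]
    uncertainty_units: str
    original_coordinate_system: str
    georeferenced_by: str            # person, organization, or software version
    georeference_date: str
    named_place: str
    named_place_extent: Optional[float]
    determination_method: str = "point-radius"
    verification_status: str = "unverified"
    references: List[str] = field(default_factory=list)
    assumptions: List[str] = field(default_factory=list)
    reason_not_georeferenced: Optional[str] = None


record = GeoreferenceRecord(
    decimal_latitude=35 + 22 / 60 + 24 / 3600,        # 35°22′24″N
    decimal_longitude=-(119 + 1 / 60 + 4 / 3600),     # 119°01′04″W
    horizontal_datum="unknown (NAD27 or NAD83)",
    uncertainty_distance=3.0 + 0.079 + 0.039,         # extent + datum + precision
    uncertainty_units="km",
    original_coordinate_system="degrees minutes seconds",
    georeferenced_by="placeholder name",
    georeference_date="placeholder date",
    named_place="Bakersfield",
    named_place_extent=3.0,
    references=["GNIS (USGS 1981)"],
    assumptions=["coordinates refer to the geographic centre of the city"],
)
print(record)
```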
4. Discussion
The point-radius method described here was developed to meet the georeferen-
cing challenges of the MaNIS project, in which more than 40 individuals have used
these methods in a collaborative georeferencing effort covering locality descriptions
from all over the world. Localities were grouped by geographic region for the
MaNIS project, with each participating institution georeferencing all of the
localities within a given region for all participating institutions. A Java applet to
calculate coordinates and uncertainties (figure 8) for the point-radius method was
created by the first author and is freely available for use in Internet web browsers
(Wieczorek 2001). Uncertainty calculations using this tool are simple, fast, and
yield consistent results. Georeferencing rates for geographic regions varied,
depending heavily on the resources that were available to the georeferencers.
Where digital maps were available for a geographic region, the mean (±1 SD)
georeferencing rate was 16.6 (±8.3) localities per hour (n = 14 data sets from 14
institutions). The mean georeferencing rate for regions where printed maps were
used instead of digital media was 9.6 (±6.8) localities per hour (n = 39 data sets
from four institutions). These rates include the determinations of both coordinates
and uncertainties, with full documentation as recommended in section 3.6.
The georeferencing rates reported for MaNIS include only those data sets that
were georeferenced manually, without the benefit of automated techniques.
Preliminary tests suggest that the efficiency of georeferencing can be increased
through automation, but that the resulting georeferences need to go through an
extra verification step to ensure that the interpretation of the descriptive locality
was made correctly. Even without automation, systematic error checking is
necessary to find inaccurate locality descriptions or incorrectly georeferenced
localities. Some errors can be exposed by analyses that include complementary data
sets. One test for georeferenced localities is to determine if the coordinates for the
locality lie within the correct administrative boundaries, such as a country or lower
level geographic unit (Hijmans et al. 1999). A more interesting test can be made by
combining locality data for a given species with environmental data for those
localities to reveal ecological ‘outliers’ that may have resulted from inaccuracies in
the locality description or from the misidentification of the specimen. Another
example is to plot the collecting events of an expedition in temporal order; localities
that lie outside of the normal patterns in the expedition may be in error. These
examples illustrate that GIS can be used post-hoc to improve the quality of the
original data as well as to validate georeferences.
We have identified individual sources of error associated with the coordinates of
a point that represents a collection locality, and we have provided methods for
quantifying these individual error components in terms of maximum potential
error. We suggest summing the individual maximum error components because
commonly used alternative approaches, such as the law of error propagation, do
not readily apply.
Figure 8. Screen shot of the Georeferencing Calculator after coordinates and uncertainty
for a locality comprised of an offset at a heading have been calculated.
Without baseline test data, it is also difficult to produce error descriptions using
alternative, fuzzy models (Altman 1994; Cross and Firat 2000), because this
approach also relies on functions to describe error distributions. However, the law
of error propagation, using standard errors, as well as fuzzy methods would be
useful for determining error contributions for different coordinate sources (maps,
gazetteer, and GIS layers, for example) where test data are available. These
methods could even prove viable under limited circumstances for the much more
difficult case of georeferencing locality descriptions. Appropriate error functions
would have to be built from sets of locality descriptions for which the true localities
were known. These functions might then be applied to localities of similar syntax.
Nevertheless, the one goal of this study is to provide an effective means to filter
individual records based on the upper bound of the combination of all uncertainties
inherent in the assignment of coordinates to a place with a spatial extent. By careful
specification of the assumptions and of the techniques for combining uncertainties,
we present a simple, practical method for computing and recording geographic
coordinates and assigning this ‘maximum’ uncertainty to each individual locality.
The methods in this study provide an effective means to filter individual records
based on the upper bound of the combination of all uncertainties inherent in the
assignment of coordinates to a place with a spatial extent. In addition, more
elaborate methods could be developed to use the uncertainty associated with a
georeference in analysis, using fuzzy logic or other approaches (Burrough and
McDonnel, 1998). Every georeference is a hypothesis. Before georeferenced data are
used in analyses, every effort should be made to ensure that the locality description
accurately describes the place where the specimen was collected. This is particularly
true of localities reported with coordinates; even though the coordinates may
accurately refer to a specific location such as beginning of a trap line, the specimens
may have been collected over a considerably greater area. Collectors should also be
aware of this problem and annotate their localities to avoid underestimations of the
extent of the locality.
5. Summary
The point-radius method provides a practical solution for georeferencing
descriptive localities that can be widely implemented, especially in communities
where sophisticated GIS expertise is lacking. By accounting for the size of the
locality, the point-radius method provides a more accurate description of a locality
than is possible with the point method. By providing a single measure of the
combination of uncertainties inherent in the locality description, the applicability of
a locality for a given analysis can be more readily discerned than with the bounding
box method. By capturing the spatial attributes of the locality in a simple,
consistent set of parameters, the point-radius method offers a solution that is
practical for natural history collections without the need for spatial databases that
would be necessary to store georeferences created using the shape method.
Checking for and correcting errors can be time consuming. With a well-defined
georeferencing method, appropriate tools, and proper documentation of the
resulting data, the number of errors will be minimized and the results of effort
expended to georeference the locality will be available in perpetuity.
Acknowledgements
The authors would like to thank Stan Blum, Elizabeth Proctor, and George
Chaplin for their early inspiration to develop rigorous georeferencing methods.
Larry Speers encouraged us to develop and document methods that are practical
for natural history collections. Gary Shugart and Reed Beaman have provided
critical analysis of the point-radius method and have investigated means to
automate the process. Eileen Lacey provided useful discussion and criticism. Craig
Wieczorek provided programming assistance. We extend special thanks to Barbara
Stein and the numerous participants in the MaNIS Project, without whose practical
feedback and encouragement these methods would not likely have been elaborated.
Funding leading to this publication was generously provided by the National
Science Foundation (DBI #0108161) and the UC Berkeley Museum of Vertebrate
Zoology.
References
ADL (Alexandria Digital Library), 2001, Alexandria Digital Library Gazetteer Server (http://
ALTMAN, D., 1994, Fuzzy set theoretic approaches for handling imprecision in spatial
analysis. International Journal of Geographical Information Systems, 8, 271–289.
2003, Positional accuracy of geocoded addresses in epidemiologic research.
Epidemiology, 14, 408–412.
BURROUGH, P. A., and MCDONNEL, R. A., 1998, Fuzzy sets and fuzzy geographical objects.
In Principles of Geographical Information Systems (Oxford, U.K.: Oxford University
Press), pp. 265–291.
CROSS, V., and FIRAT, A., 2000, Fuzzy objects for geographical information systems. Fuzzy
Sets and Systems, 113, 19–36.
DRUMMOND, J., 1990, A framework for handling error in Geographic Data manipulation.
In Fundamentals of Geographic Information Systems: A Compendium, ASPRS,
pp. 109–118.
DEFENSE MAPPING AGENCY, 1991, Department of Defense World Geodetic System 1984, Its
Definition and Relationships with Local Geodetic Systems (2nd edition), DMA
Technical Report 8350.2, Defense Mapping Agency, Fairfax, Virginia.
DUCKWORTH, W. D., GENOWAYS, H. H., and ROSE, C. L., 1993, Preserving natural science
collections: chronicle of our environmental heritage. National Institute for the
Conservation of Cultural Property, Washington, D.C.
FGDC (Federal Geographic Data Committee), 1998, Geospatial positioning accuracy
standards. Part 3. National standard for spatial data accuracy. Federal Geographic
Data Committee, FGDC-STD-007.3-1998, Virginia, USA.
FISHER, P. F., 1999, Models of Uncertainty in Spatial Data. In Geographical Information
Systems, edited by P. A. Longley, M. F. Goodchild, D. J. Maguire and D. W. Rhind
(New York: John Wiley & Sons), pp. 191–205.
GBIF (Global Biodiversity Information Facility), 2002, Draft Report of the Meeting of the
Digitization of Natural History Collections Scientific and Technical Advisory Group of
the Global Biodiversity Information Facility. (Copenhagen: GBIF).
GOODCHILD, M. F., and HUNTER, G. J., 1997, A simple positional accuracy measure for
linear features. International Journal of Geographical Information Science, 11,
HIJMANS, R. J., SCHREUDER, M., DELACRUZ, J., and GUARINO, L., 1999, Using GIS to
check co-ordinates of germplasm accessions. Genetic Resources and Crop Evolution,
46, 291–296.
KNYAZHNITSKIY, O. V., MONK, R. R., PARKER, N. C., and BAKER, R. J., 2000, Assignment
of global information system coordinates to classical museum localities for relational
database analyses. Occasional Papers, Museum of Texas Tech University, 199, 1–15.
KRISHTALKA, L., and HUMPHREY, P. S., 2000, Can natural history museums capture the
future? BioScience, 50, 611–617.
KU-BRC (University of Kansas Biodiversity Research Centre), 2002, Lifemapper (http://
LEUNG, Y., and YAN, J. P., 1998, A locational error model for spatial features. International
Journal of Geographical Information Science, 12, 607–620.
MaNIS (Mammal Networked Information System), 2001, (
S. L., and WOODWARD, S. M., 1999, Documentation standards for automatic data
processing in mammalogy, Version 2. Committee on Information Retrieval, American
Society of Mammalogists.
NIMA (United States National Imagery and Mapping Agency), 2000, Department of
Defense World Geodetic System 1984. Its Definition and Relationships with Local
Geodetic Systems. TR8350.2, Third Edition, (Bethesda, Maryland: NIMA).
STANISLAWSKI, L. V., DEWITT, B. A., and SHRESTHA, R. L., 1996, Estimating positional
accuracy of data layers within a GIS through error propagation. Photogrammetric
Engineering and Remote Sensing, 62, 429–433.
THAPA, K., and BOSSLER, J., 1992, Accuracy of Spatial Data Used in Geographic
Information-Systems. Photogrammetric Engineering and Remote Sensing, 58, 835–841.
USGS (United States Geological Survey), 1981, Geographic Names Information System.
USGS (United States Geological Survey), 1999, National Mapping Program Technical
Instructions. Part 2. Specifications. Standards for Digital Line Graphs. (Reston,
Virginia: USGS).
VAN NIEL, T. G., and MCVICAR, T. R., 2002, Experimental evaluation of positional accuracy
estimates from a linear network using point- and line-based testing methods.
International Journal of Geographical Information Science, 16, 455–473.
VEREGIN, H., 2000, Quantifying positional error induced by line simplification. International
Journal of Geographical Information Science, 14, 113–130.
WELCH, R., and HOMSEY, A., 1997, Datum shifts for UTM coordinates. Photogrammetric
Engineering and Remote Sensing, 63, 371–375.
WIECZOREK, J. R., 2001, Georeferencing Calculator (
Point-radius method for georeferencing 767
... On the other hand, exact georeferencing can be done using different methodologies (Conolly and Lake 2006;Wieczorek et al. 2004). The most common approach between the 15th and 20th centuries was based on the use of maps or charts that are two-dimensional representations of territory. ...
... According to Wieczorek et al. (2004), there are four types of georeferencing: the point method, the polygon method, the bounding box method, and the point-radius method. Each has its disadvantages, which in general are summarised in the capacity of each one of them to best determine the location of an observation within a locality. ...
... Owing to the scale used in the original project (1:250000), we decided to use the first methodology, which involves assigning a pair of coordinates to each location. According to Wieczorek et al. (2004), one of the biggest disadvantages of this method is that, generally, a qualitative description describes an area and not a point on the ground. So if a point is provided for a georeferenced record, the distinction between specific and non-specific localities is lost. ...
BaDACor is a database that contains a comprehensive inventory of archaeological sites located in the province of Córdoba, Argentina. The creation of this database was the result of a top-down approach, which involved the collaboration of decision-makers and professionals from the academic and state-governmental sectors. Furthermore, the database has also been utilised in a bottom-up approach, whereby interest groups and citizens concerned with heritage preservation have made use of it. This has been particularly important in light of the construction of Highway 38, which has resulted in damage to natural habitats and the destruction of territories of communities with traditional ways of life. Additionally, the construction of the highway has also endangered the integrity of ancestral territories loaded with symbolism for aboriginal communities. BaDACor has been employed in legal claims in cases of conflict with the state, and has proved to be an invaluable tool for heritage management. This is especially significant for local communities and indigenous groups who have historically had their heritage desecrated, destroyed, and hidden. The availability of BaDACor on different platforms has facilitated better access to information while also ensuring the preservation of digital data. The use of digital media has been reinforced through talks, conferences, and meetings with stakeholders to ensure that the voices of affected communities are heard in decision-making processes.
... Old datasets, such as historical observations archived in museums, atlases and natural history collections that were retrospectively georeferenced, are usually thought to be more prone to relatively higher positional error than new ones (Graham et al. 2004;Wieczorek et al. 2004;Newbold 2010;Bloom et al. 2018, Marcer et al. 2022). However, positional error affects any dataset, including those georeferenced using modern technologies such as the global navigation satellite systems (GNSS). ...
Species distribution models (SDMs) have proven valuable in filling gaps in our knowledge of species occurrences. However, despite their broad applicability, SDMs exhibit critical shortcomings due to limitations in species occurrence data. These limitations include, in particular, issues related to sample size, positional error, and sampling bias. In addition, it is widely recognized that the quality of SDMs as well as the approaches used to mitigate the impact of the aforementioned data limitations are dependent on species ecology. While numerous studies have experimentally evaluated the effects of these data limitations on SDM performance, a synthesis of their results is lacking. However, without a comprehensive understanding of their individual and combined effects, our ability to predict the influence of these issues on the quality of modelled species-environment associations remains largely uncertain, limiting the value of model outputs. In this paper, we review studies that have evaluated the effects of sample size, positional error, sampling bias, and species ecology on SDMs outputs. We integrate their findings into a step-by-step guide for critical assessment of spatial data intended for use in SDMs.
... Leaflet package (Graul, 2016) and the R software environment V. 3.6 (R Core Team, 2021) were used to validate each record to ensure that each point was accurately situated on the polygon of the stated position. The circle of maximum radius determined by the extension of the polygon of the stated position was used to calculate the coordinate uncertainty of records that had not been reported (Wieczorek et al., 2004). ...
... material 1). Presence occurrences without geographic coordinates were georeferenced using the radius-point method (Wieczorek et al. 2004; Escobar et al. 2016) in the GOOGLE EARTH (v., Google Inc.) software (Suppl. ...
The biodiversity of molluscs is highly threatened in marine, terrestrial and freshwater ecosystems worldwide. This research aimed at studying the distribution and conservation status of eight poorly-known micro-snails of the genera Stephacharopa and Stephadiscus in Chile. We performed a comprehensive review of literature and databases to determine the occurrences of the species, which were mapped on vector layers containing protected areas and human development infrastructure to find potential threats. Conservation status assessment was performed following the criteria and tools implemented by the International Union for the Conservation of Nature (IUCN) Red List and NatureServe. We also built species distribution models, based on maximum entropy, to identify areas that should be prioritised for conservation. Two species meet the criteria for IUCN listing as Critically Endangered (CR), four as Endangered (EN), one as Vulnerable (VU) and one as Least Concern (LC). This classification largely coincides with the equivalent categories obtained under the NatureServe standard, in which two species were ranked as Critically Imperiled (N1), five as Imperiled (N2) and one as Vulnerable (N3). We found that Stephacharopa paposensis is the most at-risk species, with only one occurrence not included in a protected area, followed by Stephadiscus stuardoi, with two occurrences, one of them within a protected area. Stephadiscus lyratus was the species with the greatest geographic range, accounting for 17 occurrences, seven matching a protected area. We found wider potential ranges in modelled species that may be useful for prioritising conservation measures. Considering distributional data, protected areas and more than 20 plausible threats identified, we propose potential in situ and ex situ conservation actions to protect these neglected micro-snails.
... The main changes are associated with the increased availability of portable devices equipped with a digital camera and GPS receiver (Dickinson et al., 2010, 2012; Suprayitno et al., 2017). In particular, the spread of digital photography has catalysed the collection of documentation for field observations, while GPS technology has brought greater immediacy and precision to "georeferencing", that is, the process of determining coordinates in a geographic reference system, usually geographic coordinates (latitude and longitude) or kilometric coordinates (UTM) (Wieczorek et al., 2004). Previously, georeferencing had to rely on reading coordinates from a map or, more often, was approximated by reference to cartographic grids, sometimes after the fact on the basis of locality indications (toponyms). ...
Summary. Digital databases of distributional records of the tetrapod species in the Veneto region (North-East Italy). We surveyed the main existing digital databases containing distributional records of tetrapod species in the Veneto region (North-East Italy). We analyzed several properties for each database, including taxonomic, temporal and geographic scope, record abundance, availability for consultation and usability. We also assessed the databases on the basis of their informative value and their accessibility. The survey, updated to 2018, found 64 databases containing records of multiple species from areas of at least 1 km². Most of the databases contain records of birds. Among the databases with the highest informative value are: the Italian national databases of SHI (amphibians and reptiles), CISO (birds) and ATIt (mammals) on the platform; the database of AsFaVe for the New Atlas of the Mammals of the Veneto region; the iNaturalist databases "Italian Herps" and "Birds of the World"; the bird database of the MITO2000 monitoring scheme; and the combined database of Venezia Birdwatching and Verona Birdwatching. Many databases have been produced by regional or national associations.
... This information is often dispersed among different sources or missing altogether. Most missing data were coordinates; therefore, if the locality description was detailed enough, we approximated them (Supporting Information, Fig. S1) using the point method (Wieczorek et al. 2004). ...
Macroevolutionary analyses can identify patterns associated with the origin and diversification of species. Here, we gathered currently available genetic and morphological information to explore the diversification dynamics in a highly diverse family of squamate lizards, Gymnophthalmidae. We downloaded the available GenBank data for four genetic markers (12S, 16S, ND4, and c-mos) and generated a dated phylogenetic hypothesis to use as an operational framework. Using our time-calibrated tree, we conducted a Bayesian analysis of macroevolutionary mixtures (BAMM) to explore macroevolutionary patterns for both speciation and phenotypic evolution. For the latter, we included two morphological traits: body size and leg development. We recovered six major clades commonly referred to as subfamilies, whose common ancestor was recovered between 78.15 and 81.68 Myr. Additionally, we found that the major accumulation of extant lineages occurred during the Miocene. Overall, all the evolutionary rates tended to be low, with particular clades exhibiting higher rates and different but not congruent points of acceleration along the tree. These findings indicate that speciation and phenotypic evolution in this lizard group are heterogenic and decoupled, but several pulses of diversification have occurred. Geographically, we found older lineages to be concentrated in the Amazon and the Guiana Shield, whereas higher speciation rates are found in the tropical Andes and its adjacent lowlands.
... Only one tenth of the literature occurrences contained the geographic coordinates of locations provided by the authors. The remaining occurrences were manually georeferenced using the point-radius method (Wieczorek et al. 2010). ...
Data availability for certain groups of organisms (ecosystem engineers, invasive or protected species, etc.) is important for monitoring and making predictions in changing environments. One of the most promising directions for research on the impact of changes is species distribution modelling. Such technologies are highly dependent on occurrence data of high quality (Van Eupen et al. 2021). Earthworms (order Crassiclitellata) are a key group of organisms (Lavelle 2014), but their distribution around the globe is underrepresented in digital resources. Dozens of earthworm species, both widespread and endemic, inhabit the territory of Northern Eurasia (Perel 1979), but very little data on them is available through global biodiversity repositories (Cameron 2018). There are two main obstacles to data mobilisation. Firstly, studies of the diversity of earthworms in Northern Eurasia have a long history (since the end of the nineteenth century) and were conducted by several generations of Soviet and Russian researchers. Most of the collected data have been published in "grey literature", now stored only in a few libraries. Until recently, most of these remained largely undigitised, and some are probably irretrievably lost. The second problem is the difference in the taxonomic checklists used by Soviet and European researchers. Not all species and synonyms are included in the GBIF (Global Biodiversity Information Facility) Backbone Taxonomy. As a result, existing earthworm species distribution models (Phillips 2019) potentially miss a significant amount of data and may underestimate biodiversity and predict distributions inaccurately. To fill this gap, we collected occurrence data from the Russian language literature (published by Soviet and Russian researchers) and digitised species checklists, keeping the original scientific names. 
To find relevant literature, we conducted a keyword search for "earthworms" and "Lumbricidae" through the Russian national scientific online library eLibrary and screened reference lists from the monographs of leading Soviet and Russian soil zoologist Tamara Perel (Vsevolodova-Perel 1997, Perel 1979). As a result, about 1,000 references were collected, of which 330 papers had titles indicating the potential to contain data on earthworm occurrences. Among these, 219 were found as PDF files or printed papers. For dataset compilation, 159 papers were used; the others had no exact location data or duplicated data contained in other papers. Most of the sources were peer-reviewed articles (Table 1). A reference list is available through Zenodo (Ivanova et al. 2023). The earliest publication we could find dates back to 1899, by Wilhelm Michaelsen. The most recent publication is 2023. About a third of the sources were written by systematists Iosif Malevich and Tamara Perel. Occurrence data were extracted and structured according to the Darwin Core standard (Wieczorek et al. 2012). During the data digitisation process, we tried to include as much primary information as possible. Only one tenth of the literature occurrences contained the geographic coordinates of locations provided by the authors. The remaining occurrences were manually georeferenced using the point-radius method (Wieczorek et al. 2010). The resulting occurrence dataset Earthworm occurrences from Russian-language literature (Shashkov et al. 2023) was published through the Global Biodiversity Information Facility portal. It contains 5304 occurrences of 117 species from 27 countries (Fig. 1). To improve the GBIF Backbone Taxonomy, we digitised two catalogues of earthworm species published for the USSR (Perel 1979) and Russian Federation (Vsevolodova-Perel 1997) by Tamara Perel. 
Based on these monographs, three checklist datasets were published through GBIF (Shashkov 2023b, 124 records; Shashkov 2023c, 87 records; Shashkov 2023a, 95 records). Now we work towards including these names in the GBIF Backbone so that all species names can be matched and recorded exactly as mentioned in papers published by Soviet and Russian researchers.
... 7.3.6). Uncertainty of the data (in metres) was indicated according to the point-radius method (Wieczorek et al. 2004). Discussion: The specimen has a dense pubescence and numerous long, erect hairs on the pronotum (clearly visible in lateral view) and on the scutellum and anterior portion of the hemelytra. ...
Linking historical and contemporary geographic information in biodiversity data is a useful approach to approximate species population. However, one of the prominent factors that causes ambiguity in geographic information, and hinders the linking process, is the way sovereignty information is used. While historical biodiversity records often use sovereignties as proxies for geographic information about a species, contemporary records do not. This study proposes a conceptual model that incorporates sovereignty information in biodiversity data to foster the linkage between historical and contemporary geographical information. The model comprises two phases: the first phase relates tangible data sources and core components needed to construct historical sovereignty taxonomies; and the second phase is a process model to align historical sovereignty taxonomies with contemporary taxonomies in four phases. The output of the model presents all possible sovereignties that a geographic entity belongs to based on the degree of congruence between the historical and contemporary taxonomies. The contributions of this work are threefold: (1) making all possible ambiguities in historical geographic information explicit in biodiversity data; (2) bringing attention to the modeling choices that domain experts have to make when deciding which sovereignty a place name belongs to; and (3) extending and improving current geo‐referencing practices.
This report recommends action in the areas outlined below. Strategies for implementation of each recommendation are presented in chapter three, "Meeting the Challenge: Recommendations and Strategies." The areas are: stewardship of collections; public awareness; staffing, education, and training; technology transfer; conservation research; and guidelines and standards of practice.
The USGS 1:24,000-scale topographic maps and associated digital map products of the United States are cast on the North American Datum of 1927 (NAD 27). These map products are a national asset used for a variety of mapping, GIS database construction, and land survey tasks. However, NAD 27 has been replaced by the North American Datum of 1983 (NAD 83). While shifts to translate the latitude/longitude (lat/long) graticule coordinates to NAD 83 are documented, no information is readily available on the shifts in metres needed to convert NAD 27 UTM Northing and Easting grid coordinates to NAD 83 values. These shifts may be determined with computer software such as the U.S. Army Topographic Engineering Center (TEC) CORPSCON package or the commercially available Blue Marble Geographics Geographic Calculator program, and, when plotted at 2° intervals (lat/long) for the contiguous 48 states, show a remarkable consistency within the 6-degree-wide UTM zones, changing gradually from south to north. The shifts depicted in the graphical plots provide the map user with the values needed to quickly convert NAD 27 UTM grid coordinates to NAD 83 values. Because rectangular grid coordinates are preferred for a majority of tasks, it is recommended that the national mapping agencies determine the shift values to convert the NAD 27 UTM coordinates of individual map sheet corners to the datum of choice and make them available through publications and the World Wide Web. When map products are revised, notes defining shifts in rectangular grid coordinates should be included on the map collars or appended to the digital files.
The positional accuracy of a GIS layer can be separated into absolute and relative components. Accepted standards for estimating horizontal accuracy in cartographic data quantify absolute positional accuracy only. However, relative accuracy values that describe variability in spatial relationships of coordinate information, such as variance of area, azimuth, and distance computations, can be valuable to research and decision making. This paper presents a technique for quantifying absolute and relative positional accuracy estimated through error propagation from a covariance matrix for affine transformation parameters. This technique was developed and tested with a spatial data set manually digitized from a simulated 1:24,000-scale map whose errors were restricted to those of the electrostatic plotter. A sequence of transformation tests was performed, using from 4 to 40 control points per test. Estimates for combined error associated with electrostatic plotting and manual point-mode digitizing were inversely related to the number of control points up to about 20. Semi-major axes for point certainty regions at a 39.4 percent confidence level ranged from 1.86 to 5.45 meters (0.0775 to 0.227 mm at map scale).
In this paper, a comprehensive outline of the different types of errors encountered in the process of data collection is presented. An overview of the different errors encountered in the "primary" and "secondary" methods of data collection is given. In addition, a brief summary of different standards and specifications used in the primary methods of data collection is provided. Finally, a comparison between the primary and the secondary methods of data collection is made. -from Authors
This paper presents methodologies for modelling imprecision in the definition, analysis and synthesis of two-dimensional features. The imprecision may arise through incomplete information, the presence of varying concentrations of attributes, or the use of qualitative descriptions of spatial features or their relationships. The work is intended to have applications in geographical information systems (GIS), but is equally applicable to other types of spatial information systems or spatial database applications. Fuzzy sets are used as a representational and reasoning device. The paper contains definitions of an imprecisely defined spatial feature or fuzzy region; definitions of distance and directional metrics between two such regions; a methodology for analysis of the spatial relationship between two regions; and a methodology for synthesis of new regions that are subject to the presence of imprecise spatial constraints.
A locational error model for spatial features in vector-based geographical information systems (GIS) is proposed in this paper. Using error in points as the fundamental building block, a stochastic model is constructed to analyse point, line, and polygon errors within a unified framework, a departure from current practices which treat errors in point and line separately. The proposed model gives, as a special case, the epsilon band model a true probabilistic meaning. Moreover, the model can also be employed to derive accuracy standards and cartographic estimates in GIS.