Learning to resolve geographical and temporal references in text.
DOI: 10.1145/2093973.2094020 Conference: 19th ACM SIGSPATIAL International Symposium on Advances in Geographic Information Systems, ACM-GIS 2011, November 1-4, 2011, Chicago, IL, USA, Proceedings
Geo-temporal information is pervasive in textual documents, since most of them contain references to particular locations, calendar dates, clock times or durations. An important text analytics problem is therefore resolving the place names and temporal expressions referenced in texts, i.e. linking the character strings in documents that correspond to locations or temporal instances to the specific geospatial coordinates or time intervals they refer to. However, geo-temporal reference resolution poses several non-trivial problems for text mining, due to the inherent ambiguity and contextual assumptions of natural language discourse.
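To make the resolution task concrete, here is a minimal Python sketch. It is not the method of the paper above: it assumes a hypothetical toy gazetteer (`GAZETTEER`, with illustrative data), resolves ambiguous place names with a simple population-prior heuristic plus an optional country-context filter, and normalizes one date-range pattern ("Month D-D, YYYY") to a pair of dates. All names (`resolve_place`, `resolve_date_range`) are illustrative assumptions.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Place:
    name: str
    country: str
    lat: float
    lon: float
    population: int


# Hypothetical toy gazetteer (illustrative data only); real systems use large
# resources with many ambiguous entries per name.
GAZETTEER = {
    "paris": [
        Place("Paris", "FR", 48.8566, 2.3522, 2_100_000),
        Place("Paris", "US", 33.6609, -95.5555, 25_000),
    ],
    "chicago": [Place("Chicago", "US", 41.8781, -87.6298, 2_700_000)],
}


def resolve_place(mention: str, context_country: Optional[str] = None) -> Optional[Place]:
    """Map a place-name string to one gazetteer entry, or None if unknown."""
    candidates: List[Place] = GAZETTEER.get(mention.lower(), [])
    if not candidates:
        return None
    # Contextual evidence (here: a country inferred elsewhere) narrows the set.
    if context_country is not None:
        in_context = [p for p in candidates if p.country == context_country]
        if in_context:
            candidates = in_context
    # Fallback heuristic: prefer the most populous candidate.
    return max(candidates, key=lambda p: p.population)


MONTHS = {
    m: i
    for i, m in enumerate(
        ["january", "february", "march", "april", "may", "june", "july",
         "august", "september", "october", "november", "december"],
        start=1,
    )
}
DATE_RANGE = re.compile(r"([A-Za-z]+) (\d{1,2})-(\d{1,2}), (\d{4})")


def resolve_date_range(expr: str) -> Optional[Tuple[date, date]]:
    """Normalize e.g. 'November 1-4, 2011' to a (start, end) date interval."""
    m = DATE_RANGE.fullmatch(expr.strip())
    if m is None:
        return None
    month = MONTHS.get(m.group(1).lower())
    if month is None:
        return None
    year = int(m.group(4))
    return date(year, month, int(m.group(2))), date(year, month, int(m.group(3)))
```

For example, `resolve_place("Paris")` picks the French entry by population, while `resolve_place("Paris", context_country="US")` lets context override the prior; this illustrates why context matters for disambiguation.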
- "There is a fair body of research on georeferencing both outside and inside the domain of natural history. Within the natural language processing community georeferencing is treated as a follow-up task to named entity recognition (Leidner and Lieberman 2011; Loureiro et al. 2011), or possibly as complementary to it (Godoy et al. 2011). There are also several open source tools available such as OpenSextant (http://opensextant.github.io) "
ABSTRACT: For biodiversity research, the field of study concerned with the richness of species on our planet, it is of the utmost importance that the location where an animal specimen was found is known with high precision. Because specimens have often been collected over the course of many years, their accompanying geographical data is often ambiguous or very imprecise. In this article, we detail an approach that uses reasoning and external sources to improve the geographical information of animal finds. Our main contribution is to show that adding external domain knowledge improves the ability to georeference locations over traditional methods that focus solely on analyzing geographical information. Additionally, our system is able to output the confidence it has in its decisions through a confidence measure based on the difficulty of the instance and the steps undertaken to disambiguate it. Our results show that adding domain knowledge to the georeferencing process increases accuracy within 5 km (@5km) from 38.9% to 61.7%, and within 25 km (@25km) from 47.0% to 74.5%. Furthermore, we reduce the mean distance by more than half, from 251.1 km to 114.5 km, and decrease the share of records for which no reference can be found from 26.2% to 7.4%.
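The evaluation figures in this abstract (accuracy within a distance threshold, mean error distance, and the share of unresolved records) can be computed along the following lines. This is a minimal sketch under stated assumptions, not the article's evaluation code: it assumes accuracy@k is the fraction of all records whose predicted coordinates lie within k km (great-circle distance via the haversine formula, mean Earth radius 6371 km) of the reference coordinates, that unresolved records count as misses, and that the mean distance is taken over resolved records only. The article's exact definitions may differ, and `haversine_km` and `evaluate` are hypothetical names.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in decimal degrees
EARTH_RADIUS_KM = 6371.0     # mean Earth radius (assumption: spherical Earth)


def haversine_km(a: Coord, b: Coord) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp to guard against floating-point values slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def evaluate(
    predicted: Sequence[Optional[Coord]],
    gold: Sequence[Coord],
    thresholds: Sequence[float] = (5.0, 25.0),
) -> Dict[str, float]:
    """Accuracy@k over all records, mean distance over resolved records,
    and the fraction of records with no prediction (None)."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("predicted and gold must be non-empty and equally long")
    n = len(gold)
    distances = [haversine_km(p, g) for p, g in zip(predicted, gold) if p is not None]
    result = {f"acc@{t:g}km": sum(d <= t for d in distances) / n for t in thresholds}
    result["mean_km"] = sum(distances) / len(distances) if distances else float("nan")
    result["unresolved"] = (n - len(distances)) / n
    return result
```

Counting unresolved records as misses keeps the accuracy figures comparable across systems that abstain at different rates; reporting the unresolved share separately, as the abstract does, makes that trade-off visible.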