ChatGPT as a mapping assistant: A novel method to enrich maps with generative AI and content derived from street-level photographs

Levente Juhász 1 [0000-0003-3393-4021], Peter Mooney 2 [0000-0002-2389-3783], Hartwig H. Hochmair 3 [0000-0002-7064-8238], and Boyuan Guan 1

1 GIS Center, Florida International University, Miami, FL 33199, USA
{ljuhasz,bguan}@fiu.edu
2 Department of Computer Science, Maynooth University, Co. Kildare, Ireland
peter.mooney@mu.ie
3 Geomatics Sciences, University of Florida, Ft. Lauderdale, FL 33144, USA
hhhochmair@ufl.edu
Abstract. This paper explores the concept of leveraging generative AI as a mapping assistant for enhancing the efficiency of collaborative mapping. We present the results of an experiment that combines multiple sources of volunteered geographic information (VGI) and large language models (LLMs). Three analysts described the content of crowdsourced Mapillary street-level photographs taken along roads in a small test area in Miami, Florida. GPT-3.5-turbo was instructed to suggest the most appropriate tagging for each road in OpenStreetMap (OSM). The study also explores the utilization of BLIP-2, a state-of-the-art multimodal pre-training method, as an artificial analyst of street-level photographs in addition to human analysts. Results demonstrate two ways to effectively increase the accuracy of mapping suggestions without modifying the underlying AI models: by (1) providing a more detailed description of source photographs, and (2) combining prompt engineering with additional context (e.g. location and objects detected along a road). The first approach increases the accuracy of the suggestion by up to 29%, and the second one by up to 20%.
Keywords: ChatGPT · OpenStreetMap · Mapillary · LLM · volunteered geographic information · mapping
DOI: 10.25436/E2ZW27
1 Introduction
Generative artificial intelligence (AI) is a type of AI that can produce various types of content, including text, images, audio, code, and simulations. It has gained enormous attention since the public release of ChatGPT in late 2022. ChatGPT is an example of a Large Language Model (LLM), which is a form of generative AI that produces human-like language. Since the launch of ChatGPT, researchers, including the geographic information science (GIScience) community, have been trying to understand the potential role of AI for research, teaching, and applications. ChatGPT can be used extensively for Natural Language Processing (NLP) tasks such as text generation, language translation, writing software code, and generating answers to a plethora of questions, engendering both positive and adverse impacts [2]. The emergence of generative AI has introduced transformative opportunities for spatial data science. In this paper, we explore the potential of generative AI to assist human cartographers and GIS professionals in increasing the quality of maps, using OSM as a test case (Figure 1).
Fig. 1. OpenStreetMap roads and Mapillary images in the study area near Downtown
Miami
GeoAI has been part of the GIScience discourse in recent years. For example, Janowicz et al. [7] elaborated on whether it was possible to develop an artificial GIS analyst that passes a domain specific Turing test. Although these questions are still largely unanswered, utilizing LLMs and foundational models in geospatial contexts contributes to this direction. Despite the challenges due to the different nature of LLM training methodologies and human learning of spatial concepts [15], these tools and methods are being explored, for example to generate maps [18]. Our study fits into this direction by using an LLM (ChatGPT) and a multimodal pre-training method (BLIP-2) to connect visual and language information in the context of mapping. We explore the larger question of whether generative AI is a useful tool in the context of creating and enriching map databases and more specifically investigate the following research questions:

1. Is generative AI capable of turning natural language text descriptions into the correct attribute tagging of road features in digital maps?
2. For this problem, can the accuracy of suggestions be improved through prompt engineering [17]?
3. To what extent can the work of human analysts be substituted with generative AI approaches within these types of mapping processes?
Furthermore, our approach focuses on the fusion of freely available volunteered geographic information (VGI) [5] data sources (OSM, Mapillary) and off-the-shelf AI tools to present a potentially low-cost and uniformly available solution. OSM is a collaborative project that aims to create a freely accessible worldwide map database (https://openstreetmap.org), while Mapillary crowdsources street-level photographs (https://mapillary.com) that power mapping and other applications, such as object detection, semantic segmentation and other computer vision algorithms that extract semantic information from imagery [3]. While the use of VGI has not yet been explored in the context of generative AI, previous studies demonstrated the practicability of combining multiple sources of VGI to improve the mapping process [13]. More specifically, Mapillary street-level images are routinely used to enhance OSM [8,10].
2 Study setup
2.1 Data sources and preparation
In OSM, geographic features are annotated with key-value pairs to assign the correct feature category to them, a process called tagging [14]. For example, roads are assigned a "highway"=<value> tag where <value> indicates a specific road category, such as "residential" for a residential street.

OSM data is not homogeneous, and individual users may perceive roads differently and therefore assign different "highway" values to the same type of road. A list of "highway" tag values was established to better describe the meaning of each road category in OSM. Furthermore, the difference between some road categories, for example primary and secondary, is more administrative in nature than a matter of visual appearance. For example, a 2-lane road in a rural area could be considered primary, whereas a more heavily trafficked road in an urban environment might be categorized as secondary. To consider semantic road categories rather than individual "highway" values as one of the evaluation methods, "highway" tag values representing similar roads in our dataset were grouped into four categories (Table 1).
Table 1. Grouping distinct "highway" tag values into semantically similar categories.

Category name                    OSM "highway"                            # of roads
Major, access controlled road    motorway | trunk                         0
Main road                        primary | secondary | tertiary           81
Regular road                     residential | unclassified | service     4
Not for motorized traffic        pedestrian | footway | cycleway          9

Figure 2 shows the methodology to obtain OSM roads of interest with the corresponding Mapillary street-level images. First, all OSM roads with a "highway"=* tag were extracted within the study area. Then, short sections (<50 m), inaccessible roads, sidewalks along roadways, and roads without street-level photo coverage were excluded. Retained OSM roads were matched with corresponding Mapillary photographs, so that each road segment would have at least one representative Mapillary image. Lastly, a list of objects detected in the corresponding image was also extracted from the Mapillary API. These inputs were further used as described in Section 2.3.
Fig. 2. Workflow for preparing input from Mapillary
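To make the preparation step concrete, the sketch below applies the grouping of Table 1 and the length filter from the workflow in Figure 2 to a pre-downloaded list of roads. This is a minimal illustration, not the authors' implementation: the record structure and the attribute names ("highway", "length_m") are assumptions, and the remaining exclusion rules (inaccessible roads, sidewalks, missing photo coverage) require additional data not shown here.

```python
# Minimal sketch (not the original implementation) of the Table 1 grouping
# and part of the Figure 2 filtering. Attribute names are illustrative.

SEMANTIC_CATEGORY = {
    "motorway": "Major, access controlled road", "trunk": "Major, access controlled road",
    "primary": "Main road", "secondary": "Main road", "tertiary": "Main road",
    "residential": "Regular road", "unclassified": "Regular road", "service": "Regular road",
    "pedestrian": "Not for motorized traffic", "footway": "Not for motorized traffic",
    "cycleway": "Not for motorized traffic",
}

def semantic_category(highway_value):
    """Map a raw OSM "highway" value to one of the four study categories
    (None for values outside Table 1)."""
    return SEMANTIC_CATEGORY.get(highway_value)

def keep_road(road):
    """Length rule from Figure 2: drop road sections shorter than 50 m."""
    return road["length_m"] >= 50

# Hypothetical road records as they might come out of an OSM extract
roads = [
    {"osm_id": 1, "highway": "residential", "length_m": 120.0},
    {"osm_id": 2, "highway": "footway", "length_m": 35.0},  # too short, excluded
]
retained = [r for r in roads if keep_road(r)]
print([(r["osm_id"], semantic_category(r["highway"])) for r in retained])
# [(1, 'Regular road')]
```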
2.2 Resources
AI tools and models. We utilize GPT-3.5-turbo, an advanced language model developed by OpenAI. It is an upgraded version of GPT-3, designed to offer improved performance and capabilities, and retains the large-scale architecture of its predecessor, enabling it to generate coherent and contextually relevant text [16]. GPT-3.5-turbo serves as a powerful tool for natural language processing, content generation, and other language-related applications. In our study it is used to suggest OSM tagging based on pre-constructed prompts built from the content of street-level images. The model was accessed through the OpenAI API.
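The call to GPT-3.5-turbo itself requires only a few lines. The sketch below uses the ChatCompletion interface of the openai Python package that was current at the time of the study (newer releases expose the same request through a client object); the request parameters shown are assumptions rather than the authors' reported settings.

```python
import openai  # openai<1.0 style interface

openai.api_key = "sk-..."  # placeholder API key

def suggest_tags(prompt):
    """Send one scenario prompt (Section 2.3) to GPT-3.5-turbo and return
    the raw response text, which is expected to be a JSON document with
    the suggested OSM tags."""
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return response["choices"][0]["message"]["content"]
```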
BLIP-2 [11] is a state-of-the-art, scalable multimodal pre-training method designed to equip LLMs with the capability to understand images while keeping their parameters entirely frozen. This approach builds on frozen pre-trained unimodal models and a proposed Querying Transformer (Q-Former), sequentially pre-trained for vision-language representation learning and vision-to-language generative learning. Despite operating with fewer trainable parameters, BLIP-2 has achieved exceptional performance in a variety of vision-language tasks and has shown potential for zero-shot image-to-text generation [11]. The methodology proposed in BLIP-2 contributes towards the development of an advanced multimodal conversational AI agent. We leverage BLIP-2's capability to generate image captions as well as to perform visual question answering (Q&A). A freely available sample implementation of BLIP-2 was used to conduct this experiment (https://replicate.com/andreasjansson/blip-2).
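For illustration, the sketch below shows how BLIP-2 can be queried for a caption and for one of the visual Q&A questions from Table 2. It uses the Hugging Face transformers implementation of BLIP-2 rather than the hosted Replicate deployment used in the study, so the model variant, prompt format and outputs are assumptions that may differ from the experiment.

```python
# Captioning and visual Q&A with BLIP-2 via the transformers library
# (the study used a hosted Replicate deployment instead; a GPU with float16
# weights is advisable in practice, omitted here for brevity).
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-opt-2.7b")

image = Image.open("mapillary_photo.jpg").convert("RGB")  # a street-level photo

# Image captioning: no text prompt
inputs = processor(images=image, return_tensors="pt")
caption = processor.decode(model.generate(**inputs)[0], skip_special_tokens=True)

# Visual Q&A, e.g. the "surface" question from Table 2
prompt = "Question: What is the material of the surface of the road? Answer:"
inputs = processor(images=image, text=prompt, return_tensors="pt")
answer = processor.decode(model.generate(**inputs)[0], skip_special_tokens=True)
```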
Analysts. Three analysts were tasked with describing the visual content of street-level images (captioning) and answering a few questions regarding the image content (visual Q&A). Two (human) analysts were undergraduate students at Florida International University with previous GIS coursework. BLIP-2 was used to perform the same task as the human analysts, and its responses were recorded as the third (artificial) analyst. Analysts were deliberately not given any guidelines on how to describe images so that their answers would not be biased by prior knowledge about OSM and mapping. Table 2 lists the questions and tasks performed by the analysts.
Table 2. Questions and tasks performed by analysts.

"caption": Describe what you see in the photo in your own words. Example response: "A city road in an urban area along an elevated railway. There is a wide sidewalk on both sides and trees on the left."
"users": Who are the primary users of the road that is located in the middle of the photograph? Cars, pedestrians or bicyclists? Example response: "Cars"
"lanes": How many traffic lanes are there on the road that is in the middle of the photograph? Example response: "3"
"surface": What is the material of the surface of the road that is in the center of the photograph? Example response: "Asphalt"
"oneway": Is the road that is in the center of the photograph one-way? Example response: "No"
"lit": Are there any street lights in the photograph? Example response: "Yes"
The analysts' answers differ in level of detail. For example, BLIP-2's and Analyst #2's captions were significantly shorter on average (9 and 11 words, respectively) than Analyst #1's (37 words). BLIP-2's responses were also found to be more generic (e.g. "a city street with tall buildings in the background") than those of the human analysts. This allows us to explore the effect of providing increasing detail on tag suggestion accuracy.
2.3 Methodology for suggesting OSM tags
Figure 3 shows the methodology for suggesting tags for an OSM road. For each retained road in the area, the corresponding Mapillary images were shown to the analysts described in Section 2.2. Analysts created an image caption and answered simple questions as described in Table 2. These responses, in combination with additional context, were used to build prompts for an LLM to suggest OSM tags. To explore what influences the accuracy of suggested tags, a series of prompts were developed that differ in the level of detail that is presented to the LLM.
All prompts start with the following message that provides context and instructs the model about the expected output format.

Based on the following context that was derived from a street-level photograph showing the street, recommend the most suitable tagging for an OpenStreetMap highway feature. Omit the 'oneway' and 'lit' tags if the answer to the corresponding questions is no or N/A. Format your suggested key-value pairs as a JSON. Your response should only contain this JSON.
The remainder of individual prompts is organized into four scenarios constructed from the responses of analysts and additional context. Example responses from analysts and additional context are highlighted in bold.

The Baseline scenario uses only responses from analysts, and contains the following text in addition to the common message above:

The content of the photograph was described as follows: A city road in an urban area along an elevated railway. There is a wide sidewalk on both sides and trees on the left. The road is mainly used by: cars. The surface of the road is: asphalt.
When asked how many traffic lanes there are on the road, one would answer: 3.
When asked if this street is a one-way road, one would answer: No.
When asked if there are any street lights in the photograph, one would answer: Yes.
The Locational context (LC) enhanced scenario provides ChatGPT with additional locational context that describes where the roads in question are located. In addition to the baseline message, it contains the following:

The photograph was taken near Downtown Miami, Florida.
The Object detection (OD) enhanced scenario uses a list of detected objects in addition to the baseline:

When guessing the correct category, consider that the following list of objects (separated by semicolon) are present in the photograph: Temporary barrier; Traffic light - horizontal; Traffic light - pedestrian; Signage

Finally, Object detection and locational context (OD + LC) are combined into a new scenario that supplies both additional contexts for the language model.
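As shown in the sketch below, the four scenario prompts can be assembled from an analyst's responses (Table 2) and the optional context. The function and variable names and the response data structure are illustrative assumptions; the instruction text and the context sentences mirror the wording quoted above.

```python
INSTRUCTION = (
    "Based on the following context that was derived from a street-level "
    "photograph showing the street, recommend the most suitable tagging for an "
    "OpenStreetMap highway feature. Omit the 'oneway' and 'lit' tags if the "
    "answer to the corresponding questions is no or N/A. Format your suggested "
    "key-value pairs as a JSON. Your response should only contain this JSON."
)

def build_prompt(resp, location=None, objects=None):
    """Assemble one of the four scenario prompts from an analyst's responses.
    Baseline: no extra arguments; LC: location only; OD: objects only;
    OD + LC: both."""
    parts = [
        INSTRUCTION,
        f"The content of the photograph was described as follows: {resp['caption']} "
        f"The road is mainly used by: {resp['users']}. "
        f"The surface of the road is: {resp['surface']}.",
        f"When asked how many traffic lanes there are on the road, one would answer: {resp['lanes']}.",
        f"When asked if this street is a one-way road, one would answer: {resp['oneway']}.",
        f"When asked if there are any street lights in the photograph, one would answer: {resp['lit']}.",
    ]
    if location is not None:  # Locational context (LC)
        parts.append(f"The photograph was taken near {location}.")
    if objects:  # Object detection (OD)
        parts.append(
            "When guessing the correct category, consider that the following "
            "list of objects (separated by semicolon) are present in the "
            "photograph: " + "; ".join(objects)
        )
    return "\n".join(parts)

# Example: OD + LC scenario for the responses shown above
prompt = build_prompt(
    {"caption": "A city road in an urban area along an elevated railway.",
     "users": "cars", "surface": "asphalt", "lanes": 3, "oneway": "No", "lit": "Yes"},
    location="Downtown Miami, Florida",
    objects=["Temporary barrier", "Traffic light - horizontal", "Signage"],
)
```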
Fig. 3. Workflow of using ChatGPT to suggest OSM "highway" tags
The last step in the process is to supply the prompts described above to GPT-3.5-turbo (ChatGPT for simplicity). The model responds with a JSON document containing the suggested OSM tagging for the roadway, e.g. {"highway": "primary", "lanes": 3}, which can be compared to the original OSM tags of the same roadway.

The final dataset contains 94 OSM highway features and their original tags. For the four scenarios and three analysts described above, ChatGPT recommendations based on the corresponding prompts were also recorded for each roadway, resulting in a total of 12 tagging suggestions per road. These suggestions are then compared to the original OSM tags to assess the accuracy of a particular scenario and analyst.
3 Results
3.1 Accuracy of suggesting road categories
Table 3 lists the correctness of ChatGPT-suggested road categories based on two different methods. First, we consider historical "highway" values of an OSM road. A suggestion was considered correct if the current or any previous version of the corresponding OSM "highway" value matched the tag suggested by ChatGPT. This step takes into account differences in how individual mappers may perceive road features (e.g. primary vs. secondary). The second method is based on the semantic road categories listed in Table 1. Considering groups of roads as opposed to individual "highway" values mitigates the fact that OSM tagging often follows administrative roles that are difficult to infer from photographs. Table 3 reports the accuracy of individual analysts across the four scenarios as well as the average correctness for analysts (values at the bottom) and scenarios (values in different rows).
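A hedged sketch of how a single suggestion could be scored under these two methods is shown below; the helper names and the condensed category labels are illustrative, and the category mapping simply repeats Table 1.

```python
# Scoring one ChatGPT suggestion under the two evaluation methods.
CATEGORY_OF = {  # condensed form of Table 1
    "motorway": "major", "trunk": "major",
    "primary": "main", "secondary": "main", "tertiary": "main",
    "residential": "regular", "unclassified": "regular", "service": "regular",
    "pedestrian": "non-motorized", "footway": "non-motorized", "cycleway": "non-motorized",
}

def correct_by_history(suggested, historical_values):
    """Method 1: correct if the suggestion matches the current or any
    previous "highway" value of the road."""
    return suggested in historical_values

def correct_by_semantics(suggested, current):
    """Method 2: correct if the suggestion falls into the same semantic
    category (Table 1) as the road's current "highway" value."""
    return (suggested in CATEGORY_OF and
            CATEGORY_OF[suggested] == CATEGORY_OF.get(current))

# Example: a road currently tagged "secondary" (previously "tertiary"),
# for which ChatGPT suggested "primary".
print(correct_by_history("primary", {"secondary", "tertiary"}))  # False
print(correct_by_semantics("primary", "secondary"))              # True (both main roads)
```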
Table 3. Accuracy score of OSM tags suggested by ChatGPT (LC = Locational context, OD = Object detection, OD + LC = Object detection + Locational context).

Based on historical "highway" values
Scenario            BLIP-2   Analyst #1   Analyst #2   Avg. correct [%]   % change
Baseline            23.4     37.2         31.9         30.8               -
LC                  24.5     46.8         34.0         35.1               +4.3
OD                  27.7     47.9         39.4         38.3               +7.5
OD + LC             30.9     47.9         50.0         42.9               +12.1
Avg. correct [%]    26.6     45.0         38.8

Based on semantic road categories
Scenario            BLIP-2   Analyst #1   Analyst #2   Avg. correct [%]   % change
Baseline            25.5     54.3         41.5         40.4               -
LC                  35.1     64.9         45.7         48.6               +8.2
OD                  29.8     63.8         60.6         51.4               +11.0
OD + LC             43.6     66.0         70.2         59.9               +19.5
Avg. correct [%]    33.5     62.3         54.5
BLIP-2 achieved the lowest accuracy among the three analysts, followed by Analyst #2 and then Analyst #1. This ordering mirrors the level of detail with which the analysts described the photographs, which suggests that, in general, providing more detailed image captions may lead to more accurate tag suggestions by ChatGPT. On average, this method increased the accuracy by up to 28.8% between BLIP-2 (lowest detail) and Analyst #1 (highest detail).
This is further supported by the average accuracy achieved in the different scenarios. The baseline scenario, which used prompts purely based on the visual description of street-level photographs, achieved a suggestion accuracy of 30-40% on average across the three analysts. Providing additional context in the different scenarios increased this accuracy. Additional locational context, i.e. specifying that the roads are located near Downtown Miami (LC scenario), increased suggestion accuracy by 4.3-8.2% on average, depending on the evaluation method. This can potentially be explained by regional differences in OSM tagging practices, which are usually determined by local communities. In this scenario, it is possible that the AI model considered these regional differences when suggesting "highway" tags. Providing a list of objects detected in the source photographs (OD scenario) increased the average suggestion accuracy by 7.5-11.0% compared to the baseline scenario. A potential explanation for this is that objects found on and near roads provide important details that help refine the category of a road. Finally, combining the locational and object detection contexts (OD + LC scenario) with the description of photographs by analysts increases suggestion accuracy by 12.1-19.5% on average. It is important to mention that these improvements are observed across all analysts.
3.2 Additional tag suggestions
In addition to the main "highway" category, information about additional characteristics of roads can also be recorded in OSM. To assess such a scenario, we analyze the "lit" tag, which indicates the presence of lighting on a particular road segment. The "lit" tag is set to "yes" if there are lights installed along the roadway. One question explicitly asked analysts whether street lights are visible in the street-level photographs. In addition, street lights are a potential object category in Mapillary detections. For the following analysis, we consider the Object detection (OD) enhanced scenario. The original dataset contains 24 roadways with "lit"="yes".
Table 4 shows that ChatGPT correctly suggested the presence of the "lit" tag for between 63% (BLIP-2) and 92% (Analyst #2) of the existing cases. ChatGPT suggested the use of the "lit" tag for an additional 44-61 features that are potentially missing from OSM. Among these features, 59 were suggested based on prompts from at least two analysts, and 36 based on prompts from all three analysts.
Table 4. ChatGPT suggestions of the "lit" tag.

                   BLIP-2     Analyst #1   Analyst #2
Correctly tagged   15 (63%)   20 (83%)     22 (92%)
Additional         58         61           44
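The counts reported in Table 4 amount to a simple comparison between the suggested tags and the existing OSM data. The sketch below illustrates one way to compute them; the per-road record structure and the field names (osm_lit, suggested_lit) are assumptions, not the authors' data model.

```python
def lit_agreement(records):
    """Count (a) roads already tagged "lit"="yes" in OSM for which ChatGPT
    also suggested the tag, and (b) roads where ChatGPT suggested "lit"
    although OSM has no such tag (potentially missing data)."""
    correctly_tagged = sum(
        1 for r in records if r["osm_lit"] == "yes" and r["suggested_lit"] == "yes"
    )
    additional = sum(
        1 for r in records if r["osm_lit"] is None and r["suggested_lit"] == "yes"
    )
    return correctly_tagged, additional
```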
3.3 Limitations of the results
One limitation of this study is that we conducted the experiment on a small, geographically limited sample. The implications of this are well studied in the GIScience literature, such as the uneven coverage of street-level photographs [9] and the heterogeneity of OSM tagging [4], which could limit the adaptability of our method to different areas. The other group of limitations is largely related to open problems in computer science, such as the so-called "hallucinations" of generative AI, which result in content that is false or factually incorrect [1], as well as the non-deterministic nature of ChatGPT's answers [19].
4 Summary and discussion
The study employs ChatGPT as a mapping assistant to propose OSM road tags based on textual descriptions of street-level photos. It leverages freely available geospatial data (OSM, Mapillary) and off-the-shelf models (GPT-3.5-turbo, BLIP-2) to enhance collaborative mapping. ChatGPT accurately suggests OSM "highway" values from text in 39-45% of cases, rising to 55-62% for semantic road categories. Substituting human analysts with BLIP-2 yielded less accurate results (27-34%), potentially due to its short captions. The study explores prompt engineering and the addition of context to improve accuracy. Providing the road location raised accuracy by 4-8%, object lists boosted it by 8-11%, and combining both enhanced it by 12-20%. The study also found that increasing the level of detail with which a road scene is described in a prompt increases the accuracy of the suggested "highway" tags. However, it is expected that this method has limits in terms of the accuracy that can theoretically be achieved. Future research will expand the OSM sample size and incorporate refined traffic-related data (e.g. speed limits, turn restrictions).
Although experiments like this are useful initial steps, we urge the GIScience community to go beyond simply applying AI in geographic contexts and to focus on synergistic research that advances both the spatial sciences and AI research (see, e.g. [12,6]). There are multiple potential extensions of this work along this idea that go beyond the case study presented in this paper. For example, the exploration of a multimodal conversational AI agent for spatial data science is a promising research direction. In theory, incorporating a spatial understanding component in a multimodal AI system allows it to comprehend and analyze geospatial data. This could result in a method for the AI to interpret and interact with geospatial data, similar to how BLIP-2 enables language models to understand images. Future research should focus on exploring the potential of this integration and on deepening our understanding of the theoretical and practical aspects of this fusion. This, in turn, will advance the field of research and lay the groundwork for future innovations in comprehensive multimodal AI systems for geospatial science.
Data availability. The dataset supporting the findings presented in this paper is freely available at: https://doi.org/10.17605/OSF.IO/M9RSG.

Acknowledgements. The authors would like to thank Mei Hamaguchi and Flora Beleznay for providing image captions and visual Q&A.
References
1. Bang, Y., Cahyawijaya, S., Lee, N., Dai, W., Su, D., Wilie, B., Lovenia, H., Ji, Z., Yu, T., Chung, W., Do, Q.V., Xu, Y., Fung, P.: A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity (Feb 2023). https://doi.org/10.48550/arXiv.2302.04023
2. Dwivedi, Y.K., Kshetri, N., Hughes, L., Slade, E.L., Jeyaraj, A., Kar, A.K.,
Baabdullah, A.M., Koohang, A., Raghavan, V., Ahuja, M., Albanna, H., Albashrawi, M.A., Al-Busaidi, A.S., Balakrishnan, J., Barlette, Y., Basu, S., Bose, I., Brooks, L., Buhalis, D., Carter, L., Chowdhury, S., Crick, T., Cunningham, S.W., Davies, G.H., Davison, R.M., Dé, R., Dennehy, D., Duan, Y., Dubey, R., Dwivedi, R., Edwards, J.S., Flavián, C., Gauld, R., Grover, V., Hu, M.C., Janssen, M.,
Jones, P., Junglas, I., Khorana, S., Kraus, S., Larsen, K.R., Latreille, P., Laumer,
S., Malik, F.T., Mardani, A., Mariani, M., Mithas, S., Mogaji, E., Nord, J.H.,
O’Connor, S., Okumus, F., Pagani, M., Pandey, N., Papagiannidis, S., Pappas,
I.O., Pathak, N., Pries-Heje, J., Raman, R., Rana, N.P., Rehm, S.V., Ribeiro-
Navarrete, S., Richter, A., Rowe, F., Sarker, S., Stahl, B.C., Tiwari, M.K., van
der Aalst, W., Venkatesh, V., Viglia, G., Wade, M., Walton, P., Wirtz, J., Wright,
R.: “So what if ChatGPT wrote it?” Multidisciplinary perspectives on opportunities, challenges and implications of generative conversational AI for research, practice and policy. International Journal of Information Management 71, 102642 (2023).
https://doi.org/10.1016/j.ijinfomgt.2023.102642
3. Ertler, C., Mislej, J., Ollmann, T., Porzi, L., Neuhold, G., Kuang, Y.: The Mapillary Traffic Sign Dataset for Detection and Classification on a Global Scale. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M. (eds.) Computer Vision – ECCV 2020, pp. 68–84. Lecture Notes in Computer Science, Springer International Publishing, Cham, Switzerland (2020). https://doi.org/10.1007/978-3-030-58592-1_5
4. Girres, J.F., Touya, G.: Quality assessment of the French OpenStreetMap dataset.
Transactions in GIS 14(4), 435–459 (2010). https://doi.org/10.1111/j.1467-
9671.2010.01203.x
5. Goodchild, M.F.: Citizens as sensors: the world of volunteered geography. GeoJournal 69(4), 211–221 (2007). https://doi.org/10.1007/s10708-007-9111-y
6. Janowicz, K.: Philosophical foundations of GeoAI: Exploring sustainability, diversity, and bias in GeoAI and spatial data science (2023). https://doi.org/10.48550/arXiv.2304.06508
7. Janowicz, K., Gao, S., McKenzie, G., Hu, Y., Bhaduri, B.: GeoAI: spatially explicit artificial intelligence techniques for geographic knowledge discovery and beyond. International Journal of Geographical Information Science 34(4), 625–636 (2020). https://doi.org/10.1080/13658816.2019.1684500
8. Juhász, L., Hochmair, H.H.: Cross-linkage between Mapillary Street Level Photos and OSM Edits. In: Sarjakoski, T., Santos, M.Y., Sarjakoski, T. (eds.) Geospatial Data in a Changing World: Selected papers of the 19th AGILE Conference on Geographic Information Science, Lecture Notes in Geoinformation and Cartography, pp. 141–156. Springer, Berlin (2016). https://doi.org/10.1007/978-3-319-33783-8_9
9. Juhász, L., Hochmair, H.H.: User Contribution Patterns and Completeness Evaluation of Mapillary, a Crowdsourced Street Level Photo Service. Transactions in GIS 20(6), 925–947 (2016). https://doi.org/10.1111/tgis.12190
10. Juhász, L., Hochmair, H.H.: How do volunteer mappers use crowdsourced Mapillary street level images to enrich OpenStreetMap? In: Bregt, A., Sarjakoski, T., Lammeren, R.v., Rip, F. (eds.) Societal Geo-Innovation: short papers, posters and poster abstracts of the 20th AGILE Conference on Geographic Information Science. Wageningen, The Netherlands (2017)
11. Li, J., Li, D., Savarese, S., Hoi, S.: BLIP-2: Bootstrapping Language-Image Pre-
training with Frozen Image Encoders and Large Language Models (May 2023).
https://doi.org/10.48550/arXiv.2301.12597
12. Li, J., Li, D., Xiong, C., Hoi, S.: BLIP: Bootstrapping Language-Image Pre-
training for Unified Vision-Language Understanding and Generation (Feb 2022).
https://doi.org/10.48550/arXiv.2201.12086
13. Liu, L., Olteanu-Raimond, A.M., Jolivet, L., Le Bris, A., See, L.: A data fusion-based framework to integrate multi-source VGI in an authoritative land use database. International Journal of Digital Earth 14(4), 480–509 (2021). https://doi.org/10.1080/17538947.2020.1842524
14. Mooney, P., Corcoran, P.: The Annotation Process in OpenStreetMap.
Transactions in GIS 16(4), 561–579 (2012). https://doi.org/10.1111/j.1467-
9671.2012.01306.x
15. Mooney, P., Cui, W., Guan, B., Juhász, L.: Towards Understanding the Spatial Literacy of ChatGPT – Taking a Geographic Information Systems (GIS) Exam (2023). https://doi.org/10.31223/X5P38P, EarthArXiv
16. Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C.L., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., Schulman, J., Hilton, J., Kelton, F., Miller, L., Simens, M., Askell, A., Welinder, P., Christiano, P., Leike, J., Lowe, R.: Training language models to follow instructions with human feedback (Mar 2022). https://doi.org/10.48550/arXiv.2203.02155
17. Reynolds, L., McDonell, K.: Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm. In: Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems, pp. 1–7. CHI EA ’21, Association for Computing Machinery, New York, NY, USA (May 2021). https://doi.org/10.1145/3411763.3451760
18. Tao, R., Xu, J.: Mapping with ChatGPT. ISPRS International Journal of Geo-Information 12(7), 284 (2023). https://doi.org/10.3390/ijgi12070284
19. Wang, S., Scells, H., Koopman, B., Zuccon, G.: Can ChatGPT Write a
Good Boolean Query for Systematic Review Literature Search? (Feb 2023).
https://doi.org/10.48550/arXiv.2302.03495
Traffic signs are essential map features for smart cities and navigation. To develop accurate and robust algorithms for traffic sign detection and classification, a large-scale and diverse benchmark dataset is required. In this paper, we introduce a new traffic sign dataset of 105K street-level images around the world covering 400 manually annotated traffic sign classes in diverse scenes, wide range of geographical locations, and varying weather and lighting conditions. The dataset includes 52K fully annotated images. Additionally, we show how to augment the dataset with 53K semi-supervised, partially annotated images. This is the largest and the most diverse traffic sign dataset consisting of images from all over the world with fine-grained annotations of traffic sign classes. We run extensive experiments to establish strong baselines for both detection and classification tasks. In addition, we verify that the diversity of this dataset enables effective transfer learning for existing large-scale benchmark datasets on traffic sign detection and classification. The dataset is freely available for academic research (www.mapillary.com/dataset/trafficsign) .