Figure 3 - uploaded by Gengchen Mai
An illustration of map projection distortion: (a)-(d) Tissot indicatrices for four projections. Equal-area circles are placed at different locations to show how map distortion affects their shape.
Source publication
We propose a general-purpose spherical location encoder, Sphere2Vec, which, as far as we know, is the first location encoder which aims at preserving spherical distance. • We provide a theoretical proof about the spherical-distance-kept nature of Sphere2Vec. • We provide theoretical proof to show why the previous 2D location encoders and NeRF-style...
Contexts in source publication
Context 1
... Mai et al., 2020b) is between the two. For more examples, please see Figure 13 and 14. ...
Context 2
... are no map projections that can preserve distances in all directions. The so-called equidistant projection can only preserve distance in one direction, e.g., the longitude direction for the equirectangular projection (see Figure 3d), while conformal map projections (see Figure 3a) preserve directions at the cost of a large distance distortion. For a comprehensive overview of map projections and their distortions, see Mulcahy and Clarke (2001). ...
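The distance distortion described in this context can be checked numerically. The following is a minimal sketch, not code from the source publication: it compares the straight-line distance on an equirectangular (plate carrée) map with the great-circle distance on a sphere. The function names and the Earth radius constant are assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius, for illustration only


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def equirectangular_km(lat1, lon1, lat2, lon2):
    """Euclidean distance on the projected map, with x = R * lon and y = R * lat."""
    dx = EARTH_RADIUS_KM * math.radians(lon2 - lon1)
    dy = EARTH_RADIUS_KM * math.radians(lat2 - lat1)
    return math.hypot(dx, dy)


def distortion_ratio(lat1, lon1, lat2, lon2):
    """Map distance divided by spherical distance (1.0 means no distortion)."""
    return equirectangular_km(lat1, lon1, lat2, lon2) / great_circle_km(lat1, lon1, lat2, lon2)


if __name__ == "__main__":
    # Along a meridian (north-south) the two distances agree.
    print(distortion_ratio(0.0, 20.0, 40.0, 20.0))
    # Along the 60 degree parallel (east-west) the map roughly doubles the distance.
    print(distortion_ratio(60.0, 0.0, 60.0, 10.0))
```

Under these assumptions the ratio stays at 1 for north-south displacements and grows with latitude for east-west displacements, which is the one-directional distance preservation the passage attributes to the equirectangular projection.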
Context 4
... have a better understanding of how well different location encoders model the geographic prior distributions of different image labels, we use iNat2018 and fMoW data as examples and plot the predicted spatial distributions of different example species/land use types from different location encoders, and compare them with the training sample locations of the corresponding species or land use types (see Figure 13 and 14). ...
Context 5
... Figure 13, we can see that (Mac Aodha et al., 2019) produces rather over-generalized species distributions due to the fact that it is a single-scale location encoder. + (our model) produces a more compact and fine-grained distribution in each geographic region, especially in the polar region and in data-sparse areas such as Africa and Asia. ...
Context 6
... distributions produced by (Mai et al., 2020b) are between these two. However, has limited spatial distribution modeling ability in the polar area (e.g., Figure 13d and 13s) as well as data-sparse regions. ...
Context 7
... example, in the white-browed wagtail example, produces an over-generalized spatial distribution which covers India, East Saudi Arabia, and the Southwest of China (See Figure 13m). However, according to the training sample locations (Figure 13l), white-browed wagtails only occur in India. is better than but still produces a distribution covering the Southwest of China. ...
Context 8
... example, in the white-browed wagtail example, produces an over-generalized spatial distribution which covers India, East Saudi Arabia, and the Southwest of China (See Figure 13m). However, according to the training sample locations (Figure 13l), white-browed wagtails only occur in India. is better than but still produces a distribution covering the Southwest of China. + produces the best compact distribution estimation. ...
Context 9
... produces the best compact distribution estimation. Similarly, for the red-striped leafwing, the sample locations are clustered in a small region in West Africa while produces an over-generalized distribution (see Figure 13ab). produces a better distribution estimation (see Figure 13ac) but it still has an over-generalization issue. ...
Context 10
... for the red-striped leafwing, the sample locations are clustered in a small region in West Africa while produces an over-generalized distribution (see Figure 13ab). produces a better distribution estimation (see Figure 13ac) but it still has an over-generalization issue. Our + produces the best estimation among these three models: a compact distribution estimation covering the exact West Africa region (see Figure 13ad). ...
Context 11
... a better distribution estimation (see Figure 13ac) but it still has an over-generalization issue. Our + produces the best estimation among these three models: a compact distribution estimation covering the exact West Africa region (see Figure 13ad). ...
Similar publications
Text lines in examination papers often exhibit inconsistent, nonuniform word sizes, character erasures, varying text lengths, and dense long texts. This paper proposes an improved method for ViT to enhance its capability in recognizing text lines in handwritten Chinese examination papers. First, this method employs a segmentation method sui...
Generating learning-friendly representations for points in space is a fundamental and long-standing problem in ML. Recently, multi-scale encoding schemes (such as Space2Vec and NeRF) were proposed to directly encode any point in 2D/3D Euclidean space as a high-dimensional vector, and have been successfully applied to various geospatial prediction an...
Adapting pre-trained models to open classes is a challenging problem in machine learning. Vision-language models fully explore the knowledge of text modality, demonstrating strong zero-shot recognition performance, which is naturally suited for various open-set problems. More recently, some research focuses on fine-tuning such models to downstream...
Citations
... Notable examples include the works of [4] and [5], which utilize the spatial context as geographic priors and loss function components respectively to refine the learning tasks. On the other hand, the works such as [6] and [7], encode the spatial information into semantic embeddings, providing more flexibility of incorporating such information into the deep learning models. Despite their effectiveness, these methods often fall short in addressing the complex interactions between locations and their intrinsic spatial properties. ...
This study introduces a novel approach to terrain feature classification by incorporating spatial point pattern statistics into deep learning models. Inspired by the concept of location encoding, which aims to capture location characteristics to enhance GeoAI decision-making capabilities, we improve the GeoAI model by a knowledge driven approach to integrate both first-order and second-order effects of point patterns. This paper investigates how these spatial contexts impact the accuracy of terrain feature predictions. The results show that incorporating spatial point pattern statistics notably enhances model performance by leveraging different representations of spatial relationships.
... In geospatial science, spatial representation learning (SRL) focuses on deriving neural representations directly from various spatial data types, such as points, polylines, polygons, and graphs [19,21]. Accordingly, SRL models can be categorized based on the type of spatial data they process, including location encoders [7,11,13,14,20], polyline encoders [2,25,26], polygon encoders [17,32,38], and graph encoders [12,34]. Each of these categories enables specialized neural models to learn from different spatial structures. ...
... SRL facilitates end-to-end model training directly on spatial data by automatically extracting structured, learning-friendly representations from various types of spatial data. A crucial component of SRL, location encoders, are designed to transform raw spatial coordinates into representations that can be effectively utilized in a range of downstream tasks, such as fine-grained species recognition [13,14,20], geospatial distribution estimation [9], climate variable interpolation [28], remote sensing image classification [18], geographic question answering [15], etc. ...
... The TorchSpatial package incorporates fifteen widely recognized location encoders. These encoders are categorized into two groups: 1) 2D location encoders designed for a planar space [1,6,13,14,20,23,24] and 2) 3D location encoders which present geographic locations using 3D geographic coordinates [20,28]. ...
... This geo-encoded CNN model was later applied to global-scale vegetation canopy height mapping with satellite imagery [38], where the geolocations served as a prior. Other state-of-the-art geolocation encoders include Space2Vec [39], Sphere2Vec [40], PE-GNN [41], and a more recent algorithm that is based on the spherical harmonic basis functions [42]. Multi-scale sinusoidal functions are favoured in building these encoders (e.g., [39], [42]), thanks to their merits of being bounded in value, infinitely extended in space, and possessing a multiresolution scalability. ...
... Multi-scale sinusoidal functions are favoured in building these encoders (e.g., [39], [42]), thanks to their merits of being bounded in value, infinitely extended in space, and possessing a multiresolution scalability. Geolocation encoding is demonstrated to be effective in many large-scale geospatial problems, such as animal species categorisation [38], [40], water quality prediction [43], event/activity recognition [44], and remote sensing scene classification [31], [40]. ...
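The properties named in this context (bounded values, unbounded spatial extent, multiple resolutions) can be seen in a generic multi-scale sinusoidal encoder. The sketch below is an illustration under stated assumptions, not the implementation of Space2Vec, Sphere2Vec, or any cited encoder: the function name, the default wavelength range, and the geometric spacing of scales are all hypothetical choices. It treats longitude and latitude as planar coordinates, so it does not preserve spherical distance.

```python
import math


def multiscale_encode(lon_deg, lat_deg, num_scales=4, min_scale=1.0, max_scale=360.0):
    """Encode (lon, lat) with sin/cos pairs at geometrically spaced wavelengths.

    All parameters are hypothetical defaults chosen for illustration.
    Returns a flat list of 4 * num_scales features, each in [-1, 1].
    """
    features = []
    for s in range(num_scales):
        # Wavelength (in degrees) grows geometrically from min_scale to max_scale.
        frac = s / max(num_scales - 1, 1)
        wavelength = min_scale * (max_scale / min_scale) ** frac
        for coord in (lon_deg, lat_deg):
            angle = 2.0 * math.pi * coord / wavelength
            features.extend((math.sin(angle), math.cos(angle)))
    return features


if __name__ == "__main__":
    vec = multiscale_encode(-122.4, 37.8)
    print(len(vec), min(vec), max(vec))
```

Short wavelengths separate nearby locations while long wavelengths vary slowly across the globe. In practice such features are usually passed through a trainable network, a step this sketch omits.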
Earth observation data have shown promise in predicting species richness of vascular plants (α-diversity), but extending this approach to large spatial scales is challenging because geographically distant regions may exhibit different compositions of plant species (β-diversity), resulting in a location-dependent relationship between richness and spectral measurements. In order to handle such geolocation dependency, we propose Spatioformer, where a novel geolocation encoder is coupled with the transformer model to encode geolocation context into remote sensing imagery. The Spatioformer model compares favourably to state-of-the-art models in richness predictions on a large-scale ground-truth richness dataset (HAVPlot) that consists of 68,170 in-situ richness samples covering diverse landscapes across Australia. The results demonstrate that geolocational information is advantageous in predicting species richness from satellite observations over large spatial scales. With Spatioformer, plant species richness maps over Australia are compiled from Landsat archive for the years from 2015 to 2023. The richness maps produced in this study reveal the spatiotemporal dynamics of plant species richness in Australia, providing supporting evidence to inform effective planning and policy development for plant diversity conservation. Regions of high richness prediction uncertainties are identified, highlighting the need for future in-situ surveys to be conducted in these areas to enhance the prediction accuracy.
... The EQ and SEQ grids are used in numerical weather prediction [15,37,55,64], astrophysics [5,14,57], and even machine learning [40]. GL grids are most commonly used for problems that employ spherical harmonics since Gaussian quadrature rules can be used to exactly integrate these basis functions [1,26,50]. ...
Spherical and polar geometries arise in many important areas of computational science, including weather and climate forecasting, optics, and astrophysics. In these applications, tensor-product grids are often used to represent unknowns. However, interpolation schemes that exploit the tensor-product structure can introduce artificial boundaries at the poles in spherical coordinates and at the origin in polar coordinates, leading to numerical challenges, especially for high-order methods. In this paper, we present new bivariate trigonometric barycentric interpolation formulas for spheres and bivariate trigonometric/polynomial barycentric formulas for disks, designed to overcome these issues. These formulas are also efficient, as they only rely on a set of (precomputed) weights that depend on the grid structure and not the data itself. The formulas are based on the Double Fourier Sphere (DFS) method, which transforms the sphere into a doubly periodic domain and the disk into a domain without an artificial boundary at the origin. For standard tensor-product grids, the proposed formulas exhibit exponential convergence when approximating smooth functions. We provide numerical results to demonstrate these convergence rates and showcase an application of the spherical barycentric formulas in a semi-Lagrangian advection scheme for solving the tracer transport equation on the sphere.
... Prior work in the literature has focused on developing an LLM-based interface to perform Natural Language Processing (NLP) tasks on geospatial data. Mai et al. [15] showcased the practical applications of large language models (LLMs) in the geospatial domain, which include tasks like recognizing fine-grained addresses, forecasting time-series data related to dementia records, and predicting urban functions. Zhang et al. [28] use GeoGPT, an autonomous AI tool built upon GPT-3.5, designed to autonomously collect, process, and analyze geospatial data using only natural language instructions. ...
Large language models (LLMs) have shown promising results in learning and contextualizing information from different forms of data. Recent advancements in foundational models, particularly those employing self-attention mechanisms, have significantly enhanced our ability to comprehend the semantics of diverse data types. One such area that could highly benefit from multi-modality is in understanding geospatial data, which inherently has multiple modalities. However, current Natural Language Processing (NLP) mechanisms struggle to effectively address geospatial queries. Existing pre-trained LLMs are inadequately equipped to meet the unique demands of geospatial data, lacking the ability to retrieve precise spatio-temporal data in real-time, thus leading to significantly reduced accuracy in answering complex geospatial queries. To address these limitations, we introduce Geode--a pioneering system designed to tackle zero-shot geospatial question-answering tasks with high precision using spatio-temporal data retrieval. Our approach represents a significant improvement in addressing the limitations of current LLM models, demonstrating remarkable improvement in geospatial question-answering abilities compared to existing state-of-the-art pre-trained models.
... Secondly, location encoding of geolocation data [45,18], one of the key components of SRL, has been proved useful for various geospatial tasks such as fine-grained species recognition [42,51], satellite image classification [5,53,19,35], weather forecasting [8,37], and so on. However, no benchmark has been developed to systematically evaluate the location encoders' impact on model performance in tasks with diverse task setups, dataset sizes, and geographic coverage. ...
... Spatial representation learning [52] aims at learning neural spatial representation of spatial data in their native format. According to the targeted spatial data types, SRL can be classified into location encoders [42,46,53,51,35,18,67,13], polyline encoders [4,29,64,86,68,65], polygon encoders [75,33,50,80], polygon decoders [12,1,38,87], etc. By automatically extracting a learning-friendly representation from different types of spatial data, SRL enables end-to-end training on top of spatial data. ...
... By automatically extracting a learning-friendly representation from different types of spatial data, SRL enables end-to-end training on top of spatial data. As one of the key components of SRL, location encoders aim at encoding a location into a learning-friendly representation that can be used in many downstream tasks such as fine-grained species recognition [42,46,53] and distribution modeling [18], population mapping [67], satellite image classification [51,35], geographic question answering [44], etc. In this work, our TorchSpatial focuses on location encoder development and evaluation. ...
Spatial representation learning (SRL) aims at learning general-purpose neural network representations from various types of spatial data (e.g., points, polylines, polygons, networks, images, etc.) in their native formats. Learning good spatial representations is a fundamental problem for various downstream applications such as species distribution modeling, weather forecasting, trajectory generation, geographic question answering, etc. Even though SRL has become the foundation of almost all geospatial artificial intelligence (GeoAI) research, we have not yet seen significant efforts to develop an extensive deep learning framework and benchmark to support SRL model development and evaluation. To fill this gap, we propose TorchSpatial, a learning framework and benchmark for location (point) encoding, which is one of the most fundamental data types of spatial representation learning. TorchSpatial contains three key components: 1) a unified location encoding framework that consolidates 15 commonly recognized location encoders, ensuring scalability and reproducibility of the implementations; 2) the LocBench benchmark tasks encompassing 7 geo-aware image classification and 4 geo-aware image regression datasets; 3) a comprehensive suite of evaluation metrics to quantify geo-aware models' overall performance as well as their geographic bias, with a novel Geo-Bias Score metric. Finally, we provide a detailed analysis and insights into the model performance and geographic bias of different location encoders. We believe TorchSpatial will foster future advancement of spatial representation learning and spatial fairness in GeoAI research. The TorchSpatial model framework, LocBench, and Geo-Bias Score evaluation framework are available at https://github.com/seai-lab/TorchSpatial.
... In other words, grid cells represent spatial relationships independently of the agent's current visual sensory input and head direction (referred to as "allocentric" representation). Due to their close association with spatial representation, the principles of grid cell firing have been widely adopted by artificial intelligence researchers beyond neuroscience, including in robotics navigation system design [8], reinforcement learning agent trajectory planning [9], and positional encoding for geospatial data [10,11]. ...
Understanding spatial location and relationships is a fundamental capability for modern artificial intelligence systems. Insights from human spatial cognition provide valuable guidance in this domain. Recent neuroscientific discoveries have highlighted the role of grid cells as a fundamental neural component for spatial representation, including distance computation, path integration, and scale discernment. In this paper, we introduce a novel positional encoding scheme inspired by Fourier analysis and the latest findings in computational neuroscience regarding grid cells. Assuming that grid cells encode spatial position through a summation of Fourier basis functions, we demonstrate the translational invariance of the grid representation during inner product calculations. Additionally, we derive an optimal grid scale ratio for multi-dimensional Euclidean spaces based on principles of biological efficiency. Utilizing these computational principles, we have developed a Grid-cell inspired Positional Encoding technique, termed GridPE, for encoding locations within high-dimensional spaces. We integrated GridPE into the Pyramid Vision Transformer architecture. Our theoretical analysis shows that GridPE provides a unifying framework for positional encoding in arbitrary high-dimensional spaces. Experimental results demonstrate that GridPE significantly enhances the performance of transformers, underscoring the importance of incorporating neuroscientific insights into the design of artificial intelligence systems.
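The translational invariance mentioned in this abstract follows from a standard identity: for features cos(k·x) and sin(k·x), the inner product of two encodings equals the sum of cos(k·(x − y)) over the wave vectors k, which depends only on the offset x − y. The sketch below checks this numerically. It is not the GridPE implementation; the wave vectors and function names are hypothetical choices made for illustration.

```python
import math

# Hypothetical wave vectors: three unit directions 120 degrees apart.
WAVE_VECTORS = [
    (1.0, 0.0),
    (-0.5, math.sqrt(3.0) / 2.0),
    (-0.5, -math.sqrt(3.0) / 2.0),
]


def fourier_encode(x, y):
    """Encode a 2D position as (cos, sin) of its phase along each wave vector."""
    feats = []
    for kx, ky in WAVE_VECTORS:
        phase = kx * x + ky * y
        feats.extend((math.cos(phase), math.sin(phase)))
    return feats


def dot(a, b):
    """Plain inner product of two equal-length feature lists."""
    return sum(u * v for u, v in zip(a, b))


if __name__ == "__main__":
    p, q, shift = (0.3, 1.2), (2.0, -0.7), (5.0, -3.0)
    before = dot(fourier_encode(*p), fourier_encode(*q))
    after = dot(
        fourier_encode(p[0] + shift[0], p[1] + shift[1]),
        fourier_encode(q[0] + shift[0], q[1] + shift[1]),
    )
    print(before, after)  # the two values agree up to floating-point error
```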
... Fostering cross-disciplinarity: Our community has shown that spatially explicit machine learning models do not only increase the accuracy of (Geo)AI models when applied to geographic data but also inform and improve more general models in various domains [12,14,13,15,5,23,20], e.g., leading to a broad interest in location encoding methods outside of GeoAI. Conversely, researchers from the broader AI community [16] try to utilize notions such as the MAUP to study problematic coverage and representation biases in training data for image-based foundation models. ...
Intuitively, there is a relation between measures of spatial dependence and information theoretical measures of entropy. For instance, we can provide an intuition of why spatial data is special by stating that, on average, spatial data samples contain less than expected information. Similarly, spatial data, e.g., remotely sensed imagery, that is easy to compress is also likely to show significant spatial autocorrelation. Formulating our (highly specific) core concepts of spatial information theory in the widely used language of information theory opens new perspectives on their differences and similarities and also fosters cross-disciplinary collaboration, e.g., with the broader AI/ML communities. Interestingly, however, this intuitive relation is challenging to formalize and generalize, leading prior work to rely mostly on experimental results, e.g., for describing landscape patterns. In this work, we will explore the information theoretical roots of spatial autocorrelation, more specifically Moran's I, through the lens of self-information (also known as surprisal) and provide both formal proofs and experiments.
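For readers unfamiliar with Moran's I, the statistic is commonly defined as I = (N / W) · Σ_ij w_ij (x_i − x̄)(x_j − x̄) / Σ_i (x_i − x̄)², where w_ij are spatial weights and W is their sum. The sketch below computes it for values on a simple 1D chain. It illustrates the textbook definition only, not the cited paper's information-theoretic analysis; the helper names and the chain-adjacency weights are assumptions.

```python
def chain_weights(n):
    """Binary adjacency weights for n cells in a row (hypothetical layout)."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]


def morans_i(values, weights):
    """Global Moran's I for a list of values and a square weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    # Cross-products of deviations between neighbouring cells.
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    # Total squared deviation.
    den = sum(d * d for d in dev)
    return (n / w_sum) * num / den


if __name__ == "__main__":
    w = chain_weights(6)
    print(morans_i([1, 2, 3, 4, 5, 6], w))     # smooth gradient: positive
    print(morans_i([1, -1, 1, -1, 1, -1], w))  # alternating pattern: negative
```

With these weights the smooth gradient gives a value of about 0.6 and the alternating pattern gives about -1, matching the intuition that similar neighbours indicate positive spatial autocorrelation.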
... Representing a location using its neighbor fuzzy locations (Li et al. 2020) or a Markov transition matrix (Wang et al. 2019) can be regarded as a type of fixed-location encoding method. In contrast, methods such as Space2Vec (Mai et al. 2020) and Sphere2Vec (Mai et al. 2023) focus on preserving the distance relationship between locations while making the vectors trainable based on different downstream tasks. In this study, we introduce a location encoding method that suits discrete location representation methods. ...