Question-based analysis of Geographic Information with Semantic Queries (QuAnGIS)
Slides of the keynote talk given by Simon at the 11th International Conference on Geographical Information Science, Poznań, 27-30 September 2021. The talk was recorded and is available on YouTube at: https://www.youtube.com/watch?v=jA3lBeWAWEQ
Spatial network analysis is a collection of methods for measuring accessibility potentials as well as for analyzing flows over transport networks. Though it has long been part of the practice of Geographic Information Systems (GIS), designing network analytical workflows still requires a considerable amount of expertise. In principle, Artificial Intelligence (AI) methods for workflow synthesis could be used to automate this task, which would improve the (re)usability of analytic resources. However, though the underlying graph algorithms are well understood, we still lack a conceptual model that captures the required methodological know-how, because in practice this know-how goes well beyond graph theory. In this article, we suggest interpreting spatial networks in terms of quantified relations between spatial objects, where both the objects themselves and their relations can be quantified in an extensive or an intensive manner. Using this model, it becomes possible to effectively organize data sources and network functions towards common analytical goals for answering questions. We tested our model on 12 analytical tasks and evaluated automatically synthesized workflows with network experts. Results show that standard data models are insufficient for answering questions, and that our model adds information crucial for understanding spatial network functionality.
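The distinction between extensive and intensive quantification can be made concrete with a small sketch. The example below is our own illustration, not taken from the article: when two parallel road segments between the same pair of spatial objects are merged, the extensive measure (flow) sums, while the intensive measure (speed) must instead be combined as a flow-weighted average.

```python
# Illustrative sketch (assumed example, not from the article): merging two
# parallel quantified relations between the same pair of spatial objects.

def merge_relations(relations):
    """Merge parallel quantified relations between one object pair."""
    total_flow = sum(r["flow"] for r in relations)  # extensive: sum
    avg_speed = (sum(r["flow"] * r["speed"] for r in relations)
                 / total_flow)                      # intensive: weighted mean
    return {"flow": total_flow, "speed": avg_speed}

parallel_roads = [
    {"flow": 120, "speed": 50},  # main road: 120 trips/day at 50 km/h
    {"flow": 30,  "speed": 30},  # side road: 30 trips/day at 30 km/h
]
merged = merge_relations(parallel_roads)
# merged["flow"] == 150 (summed), merged["speed"] == 46.0 (weighted mean)
```

Naively averaging the speeds (40 km/h) or summing them (80 km/h) would both be wrong, which is exactly the kind of methodological know-how a conceptual model of quantified relations has to capture.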
Loose programming enables analysts to program with concepts instead of procedural code. Data transformations are left underspecified, leaving out procedural details and exploiting knowledge about the applicability of functions to data types. To synthesize workflows of high quality for a geo-analytical task, the semantic type system needs to reflect knowledge of Geographic Information Systems (GIS) on a level that is deep enough to capture geo-analytical concepts and intentions, yet shallow enough to generalize over GIS implementations. Recently, core concepts of spatial information and related geo-analytical concepts were proposed as a way to add the required abstraction level to current geodata models. The core concept data types (CCD) ontology is a semantic type system that can be used to constrain GIS functions for workflow synthesis. However, to date, it is unknown what gain in precision and workflow quality can be expected. In this article, we synthesize workflows by annotating GIS tools with these types, specifying a range of common analytical tasks taken from an urban livability scenario. We measure the quality of automatically synthesized workflows against a benchmark generated from common data types. Results show that CCD concepts significantly improve the precision of workflow synthesis.
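The underlying synthesis idea can be sketched in a few lines: tools are annotated with input and output concept types, and workflows are found by chaining tools whose types fit. The tool names and concept types below are illustrative assumptions, not the actual CCD ontology or the synthesis engine used in the article.

```python
# Hypothetical sketch of type-constrained workflow synthesis: enumerate
# linear chains of tools that transform a source type into a goal type.
# Tool and type names are invented for illustration.

def synthesize(source_type, goal_type, tools, max_len=3):
    """Yield linear tool chains turning source_type into goal_type."""
    if source_type == goal_type:
        yield []
    if max_len == 0:
        return
    for name, (in_type, out_type) in tools.items():
        if in_type == source_type:
            for rest in synthesize(out_type, goal_type, tools, max_len - 1):
                yield [name] + rest

TOOLS = {
    "Interpolate": ("PointMeasures", "FieldRaster"),     # e.g. Kriging
    "ZonalStats":  ("FieldRaster", "RegionStatistics"),
    "Rasterize":   ("ObjectVector", "FieldRaster"),
}
plans = list(synthesize("PointMeasures", "RegionStatistics", TOOLS))
# plans == [["Interpolate", "ZonalStats"]]
```

The more discriminating the type system, the fewer spurious chains such a search returns, which is the intuition behind measuring the precision gain of CCD annotations over common data types.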
In this article, we critically examine the role of semantic technology in data-driven analysis. We explain why learning from data is more than just analyzing data: it also includes a number of essential synthetic parts, which suggests a revision of George Box’s model of data analysis in statistics. We review arguments from statistical learning under uncertainty, workflow reproducibility, as well as from philosophy of science, and propose an alternative, synthetic learning model that takes into account semantic conflicts, observation, biased model and data selection, as well as interpretation into background knowledge. The model highlights and clarifies the different roles that semantic technology may have in fostering reproduction and reuse of data analysis across communities of practice under conditions of informational uncertainty. We also investigate the role of semantic technology in current analysis and workflow tools, compare it with the requirements of our model, and conclude with a roadmap of 8 challenging research problems that currently seem largely unaddressed.
Maintaining knowledge about the provenance of datasets, that is, about how they were obtained, is crucial for their further use. Contrary to what the overused metaphors of ‘data mining’ and ‘big data’ imply, it is hardly possible to use data in a meaningful way if information about sources and types of conversions is discarded in the process of data gathering. A generative model of spatiotemporal information could not only help automate the description of derivation processes but also assess the scope of a dataset’s future use by exploring possible transformations. Even though there are technical approaches to document data provenance, models for describing how spatiotemporal data are generated are still missing. To fill this gap, we introduce an algebra that models data generation and describes how datasets are derived, in terms of types of reference systems. We illustrate its versatility by applying it to a number of derivation scenarios, ranging from field aggregation to trajectory generation, and discuss its potential for retrieval, analysis support systems, as well as for assessing the space of meaningful computations.
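The flavor of such an algebra can be sketched as typed derivation operations over reference-system descriptions. The notation below is our own simplification for illustration, not the algebra defined in the article: a dataset's type records its reference systems, and each derivation is a function over such types, so that provenance can be tracked and ill-typed derivations rejected.

```python
# Illustrative sketch (assumed notation): datasets typed by their reference
# systems, with derivation operations as typed functions over those types.
from dataclasses import dataclass

@dataclass(frozen=True)
class DataType:
    frame: str    # spatial reference: "point" or "region"
    measure: str  # attribute reference system, e.g. "ratio"

def aggregate(d: DataType) -> DataType:
    """Field aggregation: summarize point-referenced values over regions."""
    if d.frame != "point":
        raise TypeError("field aggregation expects point-referenced data")
    return DataType(frame="region", measure=d.measure)

point_obs = DataType(frame="point", measure="ratio")
regional = aggregate(point_obs)
# regional == DataType(frame="region", measure="ratio")
```

Chaining such typed operations yields exactly the kind of derivation record that documents how a dataset was generated, and the type checks delimit which further transformations remain meaningful.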
The appropriateness of spatial prediction methods such as Kriging, or aggregation methods such as summing observation values over an area, is currently judged by domain experts using their knowledge and expertise. In order for information systems to automatically discourage or propose prediction or aggregation methods for a dataset, this expert knowledge needs to be formalized. This involves, in particular, knowledge about the phenomena represented by data and models, as well as about underlying procedures. In this paper, we introduce a novel notion of meaningfulness of prediction and aggregation. To this end, we present a formal theory about spatio-temporal variable types, observation procedures, as well as interpolation and aggregation procedures relevant in Spatial Statistics. Meaningfulness is defined as correspondence between functions and datasets, the former representing data generation procedures such as observation and prediction. Comparison is based on semantic reference systems, which are types of potential outputs of a procedure. The theory is implemented in higher-order logic (HOL), and theorems about meaningfulness are proved in the semi-automated prover Isabelle. The type system of our theory is available as a Web Ontology Language (OWL) pattern for use in the Semantic Web. In addition, we show how to implement a data-model recommender system in the statistics tool environment R. We consider our theory groundwork for automating semantic interoperability of data and models.
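The practical upshot of such a meaningfulness check can be sketched very simply. The rules below are deliberately toy stand-ins for the HOL theory, not derived from it: a procedure is deemed meaningful for a dataset when the dataset's variable type is among the types the procedure can sensibly consume.

```python
# Illustrative toy sketch (assumed rules, not the formal theory):
# meaningfulness as a correspondence between procedures and variable types.

MEANINGFUL_FOR = {
    "sum_over_area": {"extensive"},                   # e.g. population counts
    "average_over_area": {"extensive", "intensive"},
    "kriging": {"intensive"},                         # e.g. temperature surfaces
}

def meaningful(procedure: str, variable_type: str) -> bool:
    """Return whether applying the procedure to this variable type makes sense."""
    return variable_type in MEANINGFUL_FOR.get(procedure, set())

# Summing temperatures (intensive) over an area is flagged as not meaningful,
# while Kriging them is:
meaningful("sum_over_area", "intensive")   # False
meaningful("kriging", "intensive")         # True
```

A recommender built on such rules, as outlined for R in the paper, would warn an analyst before an inappropriate method is applied rather than after.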