Conference Paper

Topological and scale-related issues in Twitter analyses through superimposed forms of spatial heterogeneity

Abstract

Goodchild’s notion of leveraging ‘humans-as-sensors’ has raised considerable interest in analysing social
media feeds such as Twitter. From a spatial-methodological point of view, however, some issues remain
largely unaddressed. Some of these relate to the uncoordinated acquisition process associated with
Twitter feeds, which introduces a new form of heterogeneity. Users unconsciously encode
their habits and spatial perception abilities when tweeting. This, in turn, leads to spatially overlapping
and commingled regimes, which differ from traditional, spatially exclusive forms of spatial heterogeneity.
This contribution discusses recent results (Westerholt et al. 2015, Westerholt et al. 2016) regarding the
effects that this overlapping form of heterogeneity has on spatial autocorrelation, the underlying
characteristic driving spatial and spatiotemporal patterning. The results affect the assessment of spatial
hotspots and correlation-like structures, two major categories of spatial analysis that conceptually stand
for a range of other methodologies. The findings show an increased risk of propagating small-scale
effects to larger scales of investigation and of type I errors, the intrusion of artificial, disturbing
spatial processes, a reduction in the power of statistical tests, and increasingly chaotic and unpredictable
behaviour of spatial methods as the overlapping effects become more dissimilar. In addition, the
contribution discusses, in a more prospective manner, the potential analysis of an often unstudied aspect
of social media: the so-called “noise.”
References
Westerholt, R., Resch, B., & Zipf, A. (2015). A local scale-sensitive indicator of spatial autocorrelation for
assessing high- and low-value clusters in multiscale datasets. International Journal of
Geographical Information Science, 29(5), 868-887. doi: 10.1080/13658816.2014.1002499.
Westerholt, R., Steiger, E., Resch, B., & Zipf, A. (2016). Abundant Topological Outliers in Social Media
Data and Their Effect on Spatial Analysis. PLOS ONE, 11(9), e0162360. doi: 10.1371/journal.pone.0162360.
Article
Twitter and related social media feeds have become valuable data sources for many fields of research. Numerous researchers have used social media posts for spatial analysis, since many of them contain explicit geographic locations. However, despite their widespread use within applied research, a thorough understanding of the underlying spatial characteristics of these data is still lacking. In this paper, we investigate how topological outliers influence the outcomes of spatial analyses of social media data. These outliers appear when different users contribute heterogeneous information about different phenomena simultaneously from similar locations. As a consequence, various messages representing different spatial phenomena are captured close to each other, and are at risk of being falsely related in a spatial analysis. Our results reveal indications of corresponding spurious effects when analyzing Twitter data. Further, we show how the outliers distort the range of outcomes of spatial analysis methods. This has a significant influence on the power of spatial inferential techniques and, more generally, on the validity and interpretability of spatial analysis results. We further investigate in detail how the issues caused by topological outliers are composed. We find that multiple disturbing effects act simultaneously and that these are related to the geographic scales of the overlapping patterns involved. Our results show that at some scale configurations, the disturbances added through overlap are more severe than at others. Further, their behavior turns into a volatile and almost chaotic fluctuation when the scales of the involved patterns become too different. Overall, our results highlight the critical importance of thoroughly considering the specific characteristics of social media data when analyzing them spatially.
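The effect described above, where observations of different phenomena sit close together and get related to each other, can be illustrated with a small synthetic sketch. It is not the experimental setup of the paper. It assumes a global Moran's I with binary distance-band weights, and the point layout, the values, the band width and the function name `moran_i` are all hypothetical choices made for illustration.

```python
import math


def moran_i(points, values, band):
    """Global Moran's I with binary distance-band weights (illustrative helper)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross = 0.0  # sum of w_ij * dev_i * dev_j over ordered pairs
    s0 = 0       # sum of all weights
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(points[i], points[j]) <= band:
                cross += dev[i] * dev[j]
                s0 += 1
    return (n / s0) * (cross / sum(d * d for d in dev))


# Phenomenon A (assumed): a smooth west-east gradient along a transect.
pts_a = [(float(i), 0.0) for i in range(10)]
vals_a = [float(i) for i in range(10)]

# Phenomenon B (assumed): unrelated values that alternate from point to point,
# recorded almost on top of the A observations.
pts_b = [(float(i), 0.1) for i in range(10)]
vals_b = [0.0 if i % 2 == 0 else 9.0 for i in range(10)]

BAND = 1.05  # hypothetical neighbourhood radius

i_a = moran_i(pts_a, vals_a, BAND)                      # gradient alone
i_b = moran_i(pts_b, vals_b, BAND)                      # alternating pattern alone
i_mix = moran_i(pts_a + pts_b, vals_a + vals_b, BAND)   # both superimposed

print(f"A only: {i_a:.3f}, B only: {i_b:.3f}, A and B mixed: {i_mix:.3f}")
```

In this toy setting the gradient alone is clearly positively autocorrelated, whereas the superimposed dataset is not, because neighbouring observations from the second pattern enter every neighbourhood. The sketch says nothing about the magnitude of such effects in real Twitter data.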
Article
Georeferenced user-generated datasets like those extracted from Twitter are increasingly gaining the interest of spatial analysts. Such datasets often reflect a wide array of real-world phenomena. However, each of these phenomena takes place at a certain spatial scale. User-generated datasets are therefore multi-scale in nature. Such datasets cannot be properly dealt with using the most common analysis methods, because these are typically designed for single-scale datasets in which all observations are expected to reflect one single phenomenon (e.g., crime incidents). In this paper, we focus on the popular local G statistics. We propose a modified, scale-sensitive version of a local G statistic. Furthermore, our approach comprises an alternative neighborhood definition that makes it possible to extract certain scales of interest. We compared our method with the original one on a real-world Twitter dataset. Our experiments show that our approach is better able to detect spatial autocorrelation at specific scales than the original method. Based on the findings of our research, we identified a number of scale-related issues that our approach is able to overcome. Thus, we demonstrate the multi-scale suitability of the proposed solution.
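For orientation, the following is a minimal sketch of the standard local G statistic in its Gi* form with binary distance-band weights, which is the kind of single-scale method the abstract refers to as the original one. It does not reproduce the modified scale-sensitive statistic or the alternative neighborhood definition proposed in the paper. The function name `local_g_star`, the toy transect and the radii are assumptions made for illustration.

```python
import math


def local_g_star(points, values, i, radius):
    """Standardised Getis-Ord Gi* at point i, using binary weights within `radius`.

    Point i counts as its own neighbour (the "star" variant). Returns NaN when
    the statistic is undefined (no variation, or the band covers all points).
    """
    n = len(values)
    mean = sum(values) / n
    variance = max(sum(v * v for v in values) / n - mean * mean, 0.0)
    s = math.sqrt(variance)
    neighbours = [j for j in range(n)
                  if math.dist(points[i], points[j]) <= radius]
    w_sum = len(neighbours)  # binary weights: sum(w) equals sum(w^2)
    local_sum = sum(values[j] for j in neighbours)
    spread = (n * w_sum - w_sum ** 2) / (n - 1)
    if s == 0.0 or spread <= 0.0:
        return float("nan")
    return (local_sum - mean * w_sum) / (s * math.sqrt(spread))


# Hypothetical transect with one small cluster of high values in the middle.
pts = [(float(k), 0.0) for k in range(10)]
vals = [1.0, 1.0, 1.0, 1.0, 10.0, 10.0, 10.0, 1.0, 1.0, 1.0]

g_small = local_g_star(pts, vals, 5, 1.0)  # radius matches the cluster size
g_large = local_g_star(pts, vals, 5, 4.0)  # radius far larger than the cluster

print(f"radius 1: {g_small:.2f}, radius 4: {g_large:.2f}")
```

In this toy example the cluster stands out when the radius matches its extent and is diluted at the larger radius, which illustrates why a single fixed neighbourhood is a poor fit for data that mix phenomena of different scales.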