Article

Abstraction and cartographic generalization of geographic user-generated content: use-case motivated investigations for mobile users


Abstract

On a daily basis, a conventional internet user queries different internet services (available on different platforms) to gather information and make decisions. In most cases, knowingly or not, this user consumes data that has been generated by other internet users about his/her topic of interest (e.g. an ideal holiday destination for a family travelling by van for 10 days). Commercial service providers, such as search engines, travel booking websites, video-on-demand providers, food takeaway mobile apps and the like, have found it useful to rely on the data provided by other users who have commonalities with the querying user. Examples of commonalities are demography, location, interests, internet address, etc. This practice has been in place for more than a decade and helps service providers tailor their results based on the collective experience of the contributors. There has also been interest in different research communities (including GIScience) in analyzing and understanding the data generated by internet users. The research focus of this thesis is on finding answers to real-world problems in which a user interacts with geographic information. The interactions can take the form of exploration, querying, zooming and panning, to name but a few. We have aimed our research at investigating the potential of geographic user-generated content to provide new ways of preparing and visualizing these data. Based on different scenarios that fulfill user needs, we have investigated the potential of finding new visual methods relevant to each scenario. The proposed methods are mainly based on pre-processing and analyzing data offered by data providers (both commercial and non-profit organizations), but in all cases the data were contributed by ordinary internet users in an active way (as opposed to passive data collection by sensors). The main contributions of this thesis are proposals for new ways of abstracting geographic information based on user-generated content. Addressing different use-case scenarios and based on different input parameters, data granularities and, evidently, geographic scales, we have provided proposals for contemporary users (with a focus on users of location-based services, or LBS). The findings are based on methods such as semantic analysis, density analysis and data enrichment. If the findings of this dissertation are put into practice, LBS users will benefit by being able to explore large amounts of geographic information in more abstract and aggregated ways and to get results based on the contributions of other users. The research outcomes can be classified at the intersection of cartography, LBS and GIScience. Based on our first use case, we have proposed the inclusion of an extended semantic measure directly in the classic map generalization process. In our second use case, we have focused on simplifying the depiction of geographic data by reducing the amount of information using a density-triggered method. Finally, the third use case focused on summarizing and visually representing relatively large amounts of information by depicting geographic objects matched to the salient topics that emerged from the data.

Article
Full-text available
Several geospatial studies and applications require comprehensive semantic information from points of interest (POIs). However, this information is frequently dispersed across different collaborative mapping platforms. Surprisingly, there is still a research gap concerning the conflation of POIs from this type of geo-dataset. In this paper, we focus on the matching aspect of POI data conflation by proposing two matching strategies based on a graph whose nodes represent POIs and whose edges represent matching possibilities. We demonstrate how the graph is used for (1) dynamically defining the weights of the different POI similarity measures we consider; (2) tackling the issue that POIs should be left unmatched when they do not have a corresponding POI in the other dataset; and (3) detecting multiple POIs from the same place in the same dataset and jointly matching these to the corresponding POI(s) from the other dataset. The strategies we propose do not require the collection of training samples or extensive parameter tuning. They were statistically compared with a “naive”, though commonly applied, matching approach considering POIs collected from OpenStreetMap and Foursquare for the city of London (England). In our experiments, we sequentially included each of our methodological suggestions in the matching procedure, and each of them led to an increase in accuracy compared to the previous results. Our best matching result achieved an overall accuracy of 91%, which is more than 10% higher than the accuracy achieved by the baseline method.
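To make the matching idea above concrete, the following is a minimal, illustrative sketch (not the authors' graph strategies): candidate POI pairs from two datasets are scored by a simple combination of name similarity and proximity, weak pairs are left unmatched, and a maximum-weight matching is computed with networkx. The weights, thresholds and sample POIs are assumptions for illustration only.

```python
# Minimal sketch: match POIs from two datasets via a weighted graph.
# Candidate pairs are scored by name similarity and proximity; pairs below
# a threshold stay unmatched. Thresholds and weights are illustrative only.
from difflib import SequenceMatcher
from math import hypot
import networkx as nx

osm = [("osm1", "Cafe Nero", (0.0, 0.0)), ("osm2", "Green Park", (5.0, 5.0))]
fsq = [("fsq1", "Caffe Nero", (0.1, 0.0)), ("fsq2", "The Green Park", (5.2, 4.9))]

def similarity(name_a, xy_a, name_b, xy_b, max_dist=1.0):
    name_sim = SequenceMatcher(None, name_a.lower(), name_b.lower()).ratio()
    dist_sim = max(0.0, 1.0 - hypot(xy_a[0] - xy_b[0], xy_a[1] - xy_b[1]) / max_dist)
    return 0.6 * name_sim + 0.4 * dist_sim        # ad-hoc weighting

G = nx.Graph()
for oid, oname, oxy in osm:
    for fid, fname, fxy in fsq:
        s = similarity(oname, oxy, fname, fxy)
        if s > 0.5:                               # leave weak pairs unmatched
            G.add_edge(("osm", oid), ("fsq", fid), weight=s)

matches = nx.max_weight_matching(G, maxcardinality=False)
print(matches)   # e.g. pairs such as (('osm', 'osm1'), ('fsq', 'fsq1'))
```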
Article
Full-text available
In his classic book “The Image of the City” Kevin Lynch used empirical work to show how different elements of the city were perceived, such as paths, landmarks, districts, edges, and nodes. Streets, by providing paths from which cities can be experienced, were argued to be one of the key elements of cities. Despite this long-standing empirical basis, and the importance of Lynch's model in policy-associated areas such as planning, work with user-generated content has largely ignored these ideas. In this paper, we address this gap, using streets to aggregate filtered user-generated content related to more than 1 million images and 60,000 individuals, and explore similarity between more than 3000 streets in London across three dimensions: user behaviour, time and semantics. To perform our study we used two different sources of user-generated content: (1) a collection of metadata attached to Flickr images and (2) the street network of London from OpenStreetMap. We first explore global patterns in the distinctiveness and spatial autocorrelation of similarity using our three dimensions, establishing that the semantic and user dimensions in particular allow us to explore the city in different ways. We then used a Processing tool to interactively explore individual patterns of similarity across these dimensions simultaneously, presenting results here for four selected and contrasting locations in London. Even before drilling into the data for more detailed interpretation, the identified patterns demonstrate that streets are natural units capturing perception of cities not only as paths but also through the emergence of other elements of the city proposed by Lynch, including districts, landmarks and edges. Our approach also demonstrates how user-generated content can be captured, allowing bottom-up perception from citizens to flow into a representation.
Article
Full-text available
Geospatial information plays an indispensable role in various interdisciplinary and spatially informed analyses. However, the use of geospatial information often entails many semantic intricacies relating to, among other issues, data integration and visualization. For the integration of data from different domains, merely using ontologies is inadequate for handling subtle and complex semantic relations raised by the multiple representations of geospatial data, as the domains have different conceptual views for modelling the geographic space. In addition, for geospatial data visualization (one of the most predominant ways of utilizing geospatial information), semantic intricacies arise as the visualization knowledge is difficult to interpret and utilize by non-geospatial experts. In this paper, we propose a knowledge-based approach using semantic technologies (coupling ontologies, semantic constraints, and semantic rules) to facilitate geospatial data integration and visualization. A spatially informed traffic study is developed as a case study: visualizing urban bicycling suitability. In the case study, we complement ontologies with semantic constraints for cross-domain data integration. In addition, we utilize ontologies and semantic rules to formalize geospatial data analysis and visualization knowledge at different abstraction levels, which enables machines to infer visualization means for geospatial data. The results demonstrate that the proposed framework can effectively handle subtle cross-domain semantic relations for data integration, and empower machines to derive satisfactory visualization results. The approach can facilitate the sharing and outreach of geospatial data and knowledge for various spatially informed studies.
Article
Full-text available
User-generated map data is increasingly used by the technology industry for background mapping, navigation and beyond. An example is the integration of OpenStreetMap (OSM) data in widely used smartphone and web applications, such as Pokémon GO (PGO), a popular augmented reality smartphone game. As a result of OSM's increased popularity, the worldwide audience that uses OSM through external applications is directly exposed to malicious edits which represent cartographic vandalism. Multiple reports of obscene and anti-Semitic vandalism in OSM have surfaced in popular media over the years. Such negative news about cartographic vandalism undermines the credibility of collaboratively generated maps. Similarly, commercial map providers (e.g., Google Maps and Waze) are also prone to carto-vandalism through the crowdsourcing mechanisms they may use to keep their map products up to date. Using PGO as an example, this research analyzes harmful edits in OSM that originate from PGO players. More specifically, this paper analyzes the spatial, temporal and semantic characteristics of PGO carto-vandalism and discusses how the mapping community handles it. Our findings indicate that most harmful edits are quickly discovered and that the community becomes faster at detecting and fixing these harmful edits over time. Gaming-related carto-vandalism in OSM was found to be a short-term, sporadic activity by individuals, whereas the task of fixing vandalism is persistently pursued by a dedicated user group within the OSM community. The characteristics of carto-vandalism identified in this research can be used to improve vandalism detection systems in the future.
Article
Full-text available
More than 10 years have passed since the coining of the term volunteered geographic information (VGI) in 2007. This article presents the results of a review of the literature concerning VGI. A total of 346 articles published in 24 international refereed journals in GIScience between 2007 and 2017 have been reviewed. The review has uncovered varying levels of popularity of VGI research over space and time, and varying interests in various sources of VGI (e.g. OpenStreetMap) and VGI-related terms (e.g. user-generated content) that point to the multi-perspective nature of VGI. Content-wise, using latent Dirichlet allocation (LDA), this study has extracted 50 specific research topics pertinent to VGI. The 50 topics have been subsequently clustered into 13 intermediate topics and three overarching themes to allow a hierarchical topic review. The overarching VGI research themes include (1) VGI contributions and contributors, (2) main fields applying VGI, and (3) conceptions and envisions. The review of the articles under the three themes has revealed the progress and the points that demand attention regarding the individual topics. This article also discusses the areas that the existing research has not yet adequately explored and proposes an agenda for potential future research endeavors.
Article
Full-text available
The Space-Scale Cube (SSC) model stores the result of a generalization process and supports smooth scale transitions for map objects. The third dimension is used to describe geometrically the smooth transitions between objects at different levels of detail. Often-used map generalization operators fit into this SSC model. The 3D SSC model can be used to derive 2D maps in a mobile web client, where powerful graphics hardware is available these days. This article shows the steps needed for producing and disseminating SSC data with smooth transitions over the web. Firstly, we explain how SSC data can be obtained and subsequently rendered by making effective use of the GPU. Secondly, we show how we organize data in chunks and how this ‘chunked’ data can be used for efficient communication between client and server. Thirdly, we describe which operations can be used on the client side for deriving maps. Fourthly, the SSC also allows for (a) mixed-abstraction slicing surfaces, useful for highlighting specific regions by showing more detail, and (b) near-intersection blending, which helps to prevent abrupt transitions while the slicing surface is in motion. Finally, we show how animated pan and zoom functionalities may be realized. A set of prototypes allows us to disseminate the data with smooth transitions on the web and, in practice, to judge the effect of continuous generalization and of animating the map image.
Conference Paper
Full-text available
This paper presents a study of how natural language words that designate types of spatial entities (metropolis, city, creek, etc.) can automatically be translated into the entity classification used in OpenStreetMap (OSM), which assigns key-value tags to entities. The problem of identifying key-value pairs for querying OSM occurs in geographic information retrieval based on natural language text and is difficult for three reasons: conceptualisation of entities in natural language text and in OSM often differs; even the classification of a single entity type is subject to variation throughout the OSM database; and language is rich and offers many words to communicate nuances of a single entity type. The contribution of this paper is to analyse the use of semantic word similarity based on WordNet to identify a mapping from natural language to OSM tags. We present a strategy to identify key-value pairs for natural language words using WordNet and analyse its effectiveness.
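As a rough sketch of the core idea, the snippet below uses NLTK's WordNet interface to map a natural-language word to the most similar candidate OSM tag via path similarity between first synsets. The candidate tag list, the choice of the first synset and the similarity measure are simplifications assumed for illustration, not the paper's exact strategy.

```python
# Rough sketch: pick the OSM tag value whose WordNet sense is most similar
# to a natural-language word. Candidate tags and the use of the first synset
# per word are simplifications, not the paper's full strategy.
from nltk.corpus import wordnet as wn   # requires: nltk.download('wordnet')

candidate_tags = {            # a tiny, illustrative subset of OSM key-value pairs
    "city":   ("place", "city"),
    "town":   ("place", "town"),
    "stream": ("waterway", "stream"),
    "river":  ("waterway", "river"),
}

def best_tag(word):
    word_syn = wn.synsets(word)
    if not word_syn:
        return None
    scores = {}
    for value, tag in candidate_tags.items():
        value_syn = wn.synsets(value)
        if value_syn:
            sim = word_syn[0].path_similarity(value_syn[0])
            scores[tag] = sim if sim is not None else 0.0
    return max(scores, key=scores.get)

print(best_tag("metropolis"))   # expected to map to ('place', 'city')
print(best_tag("creek"))        # expected to map to ('waterway', 'stream')
```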
Article
Full-text available
Online representations of places are becoming pivotal in informing our understanding of urban life. Content production on online platforms is grounded in the geography of their users and their digital infrastructure. These constraints shape place representation, that is, the amount, quality, and type of digital information available in a geographic area. In this article, we study the place representation of user-generated content (UGC) in Los Angeles County, relating the spatial distribution of the data to its geo-demographic context. Adopting a comparative and multiplatform approach, this quantitative analysis investigates the spatial relationship between four diverse UGC datasets and their context at the census tract level (about 685,000 geo-located tweets, 9,700 Wikipedia pages, 4M OSM objects, and 180,000 Foursquare venues). The context includes the ethnicity, age, income, education, and deprivation of residents, as well as public infrastructure. An exploratory spatial analysis and regression-based models indicate that the four UGC platforms possess distinct geographies of place representation. To a moderate extent, the presence of Twitter, OpenStreetMap, and Foursquare data is influenced by population density, ethnicity, education, and income. However, each platform responds to different socio-economic factors, and clusters emerge in disparate hotspots. Unexpectedly, Twitter data tends to be located in denser, more deprived areas, and the geography of Wikipedia appears peculiar and harder to explain. These trends are compared with previous findings for the area of Greater London.
Article
Full-text available
With the advent of Web 2.0, many online platforms, such as social networks, online blogs and magazines, result in massive textual data production. This textual data carries information that can be used for the betterment of humanity, hence there is a dire need to extract the potential information from it. This study aims to present an overview of approaches that can be applied to extract and then present these valuable information nuggets residing within text in a brief, clear and concise way. In this regard, two major tasks, automatic keyword extraction and text summarization, are reviewed. To compile the literature, scientific articles were collected from major digital computing research repositories. In light of the acquired literature, the survey covers early approaches all the way to recent advancements using machine learning solutions. The survey finds that annotated benchmark datasets for various textual data generators, such as Twitter and social forums, are not available; this scarcity of datasets has resulted in relatively little progress in many domains. Also, applications of deep learning techniques to the task of automatic keyword extraction are relatively unaddressed; hence, the impact of various deep architectures stands as an open research direction. For the text summarization task, deep learning techniques have been applied since the advent of word vectors and currently govern the state of the art for abstractive summarization. Currently, one of the major challenges in these tasks is the semantic-aware evaluation of generated results.
Article
Full-text available
The tremendous advance in information technology has promoted the rapid development of location-based services (LBSs), which play an indispensable role in people's daily lives. Compared with a traditional LBS based on a Point-Of-Interest (POI), which is an isolated location point, an increasing number of demands have concentrated on Region-Of-Interest (ROI) exploration, i.e., geographic regions that contain many POIs and express rich environmental information. The intention is to search for the geographical regions related to the user's requirements, which contain spatial objects such as POIs and have certain environmental characteristics. In order to achieve effective ROI exploration, we propose an ROI top-k keyword query method that considers the environmental information of the regions. Specifically, the Word2Vec model is introduced to achieve a distributed representation of POIs and capture their environmental semantics, which are then leveraged to describe the environmental characteristics of the candidate ROIs. Given a keyword query, different query patterns are designed to measure the similarities between the query keyword and the candidate ROIs to find the k candidate ROIs that are most relevant to the query. In the verification step, an evaluation criterion is developed to test the effectiveness of the distributed representations of POIs. Finally, after generating high-quality POI vectors, we validated the performance of the proposed ROI top-k query on a large-scale real-life dataset, where the experimental results demonstrated the effectiveness of our proposals.
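A toy sketch of the representation step is given below: the POI types observed in each candidate region are treated as one 'sentence', gensim's Word2Vec learns distributed POI-type vectors, and regions are ranked against a query keyword by cosine similarity of mean vectors. The regions, POI types and parameters are invented for illustration and do not reproduce the paper's query patterns.

```python
# Toy sketch: learn distributed representations of POI types by treating the
# POI types observed in each candidate region as one "sentence". Regions and
# POI types here are invented; gensim's Word2Vec does the embedding.
from gensim.models import Word2Vec
import numpy as np

region_pois = [                       # one list of POI types per candidate region
    ["cafe", "bookshop", "museum", "gallery", "cafe"],
    ["bar", "nightclub", "cafe", "theatre"],
    ["supermarket", "pharmacy", "bakery", "school"],
    ["museum", "gallery", "theatre", "bookshop"],
]

model = Word2Vec(region_pois, vector_size=16, window=3, min_count=1, epochs=200, seed=1)

def region_vector(pois):
    """Describe a region by the mean vector of its POI types."""
    return np.mean([model.wv[p] for p in pois], axis=0)

def rank_regions(query_word, regions):
    q = model.wv[query_word]
    sims = [float(np.dot(q, region_vector(r)) /
                  (np.linalg.norm(q) * np.linalg.norm(region_vector(r))))
            for r in regions]
    return sorted(zip(sims, range(len(regions))), reverse=True)

# Regions whose POI mix resembles the query keyword should rank higher.
print(rank_regions("gallery", region_pois))
```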
Article
Full-text available
Geovisualisation is a knowledge-intensive art in which both providers and users need to possess a wide range of knowledge. Current syntactic approaches to presenting visualisation information lack semantics on the one hand, and on the other hand are too bespoke. Such limitations impede the transfer, interpretation, and reuse of the geovisualisation knowledge. In this paper, we propose a knowledge-based approach to formally represent geovisualisation knowledge in a semantically-enriched and machine-readable manner using Semantic Web technologies. Specifically, we represent knowledge regarding cartographic scale, data portrayal and geometry source, which are three key aspects of geovisualisation in the contemporary web mapping era, coupling ontologies and semantic rules. The knowledge base enables inference for deriving the corresponding geometries and portrayals for visualisation under different conditions. A prototype system is developed in which geospatial linked data are used as underlying data, and some geovisualisation knowledge is formalised into a knowledge base to visualise the data and provide rich semantics to users. The proposed approach can partially form the foundation for the vision of web of knowledge for geovisualisation.
Article
Full-text available
We are now living in a mobile information era, which is fundamentally changing science and society. Location Based Services (LBS), which deliver information depending on the location of the (mobile) device and user, play a key role in this mobile information era. This article first reviews the ongoing evolution and research trends of the scientific field of LBS in the past years. To motivate further LBS research and stimulate collective efforts, this article then presents a series of key research challenges that are essential to advance the development of LBS, setting a research agenda for LBS to ‘positively’ shape the future of our mobile information society. These research challenges cover issues related to the core of LBS development (e.g. positioning, modelling, and communication), evaluation, and analysis of LBS-generated data, as well as social, ethical, and behavioural issues that rise as LBS enter into people’s daily lives.
Article
Full-text available
Predictive hotspot mapping plays a critical role in hotspot policing. Existing methods such as the popular kernel density estimation (KDE) do not consider the temporal dimension of crime. Building upon recent works in related fields, this article proposes a spatio-temporal framework for predictive hotspot mapping and evaluation. Compared to existing work in this scope, the proposed framework has four major features: (1) a spatio-temporal kernel density estimation (STKDE) method is applied to include the temporal component in predictive hotspot mapping, (2) a data-driven optimization technique, the likelihood cross-validation, is used to select the most appropriate bandwidths, (3) a statistical significance test is designed to filter out false positives in the density estimates, and (4) a new metric, the predictive accuracy index (PAI) curve, is proposed to evaluate predictive hotspots at multiple areal scales. The framework is illustrated in a case study of residential burglaries in Baton Rouge, Louisiana in 2011, and the results validate its utility.
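The sketch below illustrates the basic ingredient of such a framework: a separable spatio-temporal kernel density estimate with Gaussian kernels in space and time. The bandwidths are fixed by hand here, whereas the paper selects them by likelihood cross-validation, and the burglary events are synthetic.

```python
# Minimal sketch of a spatio-temporal kernel density estimate: a separable
# Gaussian kernel in space and time. Bandwidths are fixed here, whereas the
# paper selects them by likelihood cross-validation; events are synthetic.
import numpy as np

def stkde(eval_xyt, events_xyt, hs=250.0, ht=7.0):
    """Density at (x, y, t) points given event (x, y, t) coordinates.
    hs: spatial bandwidth (map units), ht: temporal bandwidth (days)."""
    x, y, t = (eval_xyt[:, None, i] for i in range(3))
    ex, ey, et = (events_xyt[None, :, i] for i in range(3))
    k_space = np.exp(-((x - ex) ** 2 + (y - ey) ** 2) / (2 * hs ** 2)) / (2 * np.pi * hs ** 2)
    k_time = np.exp(-((t - et) ** 2) / (2 * ht ** 2)) / (np.sqrt(2 * np.pi) * ht)
    return (k_space * k_time).mean(axis=1)

rng = np.random.default_rng(0)
events = np.column_stack([rng.normal(1000, 200, 50),   # x of 50 synthetic burglaries
                          rng.normal(1000, 200, 50),   # y
                          rng.uniform(0, 60, 50)])     # day of occurrence
grid = np.column_stack([np.repeat(np.linspace(500, 1500, 20), 20),
                        np.tile(np.linspace(500, 1500, 20), 20),
                        np.full(400, 61.0)])           # predict for day 61
density = stkde(grid, events)
print(grid[np.argmax(density)])   # grid cell predicted as the strongest hotspot
```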
Article
Full-text available
Social media is a well-known platform for users to create, share and check new information. The world has become a global village through the use of the internet and social media. The data present on Twitter contain information of great importance, and there is a strong need to extract valuable information from this huge amount of data. A key research challenge in this area is to analyze and process these huge data and detect signals or spikes. Existing work includes sentiment analysis for Twitter, hashtag analysis and event detection, but spike/signal detection from Twitter remains an open research area. Following this line of research, we propose a signal detection approach using sentiment analysis on Twitter data (tweet volume, top hashtags and sentiment analysis). In this paper, we propose three algorithms for signal detection in tweet volume, tweet sentiment and top hashtags: the average moving threshold algorithm, the Gaussian algorithm, and a hybrid algorithm. The hybrid algorithm is a combination of the average moving threshold algorithm and the Gaussian algorithm. The proposed algorithms are tested on real-time data extracted from Twitter and on two large publicly available datasets, the Saudi Aramco dataset and the BP America dataset. Experimental results show that the hybrid algorithm outperforms the Gaussian and average moving threshold algorithms and achieves a precision of 89% on real-time tweet data, 88% on the Saudi Aramco dataset and 81% on the BP America dataset, with a recall of 100%.
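A simple sketch of the moving-average threshold idea is shown below: an hour is flagged as a spike when its tweet volume exceeds the trailing mean by more than k trailing standard deviations. The window size, k and the synthetic volume series are illustrative and not the paper's exact algorithm or parameters.

```python
# Simple sketch of moving-average threshold spike detection on an hourly
# tweet-volume series: flag hours where volume exceeds the trailing mean by
# more than k trailing standard deviations. Window and k are illustrative.
import numpy as np

def detect_spikes(volume, window=24, k=3.0):
    volume = np.asarray(volume, dtype=float)
    spikes = []
    for i in range(window, len(volume)):
        history = volume[i - window:i]
        mu, sigma = history.mean(), history.std()
        if sigma > 0 and volume[i] > mu + k * sigma:
            spikes.append(i)
    return spikes

rng = np.random.default_rng(42)
hourly = rng.poisson(lam=50, size=200).astype(float)
hourly[120:123] += 400           # inject a burst of tweets
print(detect_spikes(hourly))     # expected to report indices around 120-122
```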
Article
Full-text available
Twitter has been at the forefront of political discourse, with politicians choosing it as their platform for disseminating information to their constituents. We seek to explore the effectiveness of social media as a resource for both polling and predicting the election outcome. To this aim, we create a dataset consisting of approximately 3 million tweets from September 22nd to November 8th, 2016. Polling analysis is performed at two levels: national and state. Predicting the election is performed only at the state level due to the electoral college process in the U.S. election system. Two approaches are used for predicting the election: a winner-take-all approach and a shared elector count approach. Twenty-one states are chosen, eleven categorized as swing states and ten as heavily favored states. Two metrics are incorporated for polling and predicting the election outcome: tweet volume per candidate and positive sentiment per candidate. Our approach shows that, when polling at the national level, aggregated sentiment across the election period provides values close to the polls. At the state level, volume is not a good candidate for polling state votes. Sentiment produces values closer to swing-state polls when the election is close.
Article
Full-text available
User-Generated Content (UGC) provides a potential data source which can help us to better describe and understand how places are conceptualized, and in turn better represent the places in Geographic Information Science (GIScience). In this article, we aim at aggregating the shared meanings associated with places and linking these to a conceptual model of place. Our focus is on the metadata of Flickr images, in the form of locations and tags. We use topic modeling to identify regions associated with shared meanings. We choose a grid approach and generate topics associated with one or more cells using Latent Dirichlet Allocation. We analyze the sensitivity of our results to both grid resolution and the chosen number of topics using a range of measures including corpus distance and the coherence value. Using a resolution of 500 m and with 40 topics, we are able to generate meaningful topics which characterize places in London based on 954 unique tags associated with around 300,000 images and more than 7000 individuals.
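The compact sketch below mimics this workflow at toy scale with scikit-learn: the tags of all images falling in a grid cell form one document, and LDA is fitted over the cell documents. The cells, tags and number of topics are invented; the paper uses a 500 m grid, 40 topics and measures such as corpus distance and coherence to tune them.

```python
# Compact sketch: the Flickr tags of all images falling in one grid cell are
# treated as a single document, and LDA is fitted over the cell documents.
# The toy cells, tags and number of topics are illustrative only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

cell_documents = [                       # one tag "document" per grid cell
    "thames bridge sunset river boat",
    "museum art gallery exhibition painting",
    "park picnic trees lake ducks",
    "bridge river thames night lights",
    "gallery sculpture art museum modern",
    "trees park spring blossom lake",
]

vec = CountVectorizer()
X = vec.fit_transform(cell_documents)
lda = LatentDirichletAllocation(n_components=3, random_state=0).fit(X)

vocab = vec.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [vocab[i] for i in weights.argsort()[-4:][::-1]]
    print(f"topic {k}: {top}")    # e.g. a river topic, an art topic, a park topic
```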
Article
Full-text available
Over the last two decades, online communities have become ubiquitous, with millions of people accessing collaborative project websites every day. Among them, the OpenStreetMap project (OSM) has been very successful in collecting/offering volunteered geographic information (VGI). Very different behaviours are observed among OSM participants, which translate into large differences of lifespan, contribution levels (e.g. Nielsen’s 90–9-1 rule) and attitudes towards innovations (e.g. Diffusion of innovation theory or DoIT). So far, the literature has defined phases in the life cycle of contributors only based on the nature of their contributions (e.g. role of participants and edits characteristics). Our study identifies the different phases of their life cycle from a temporal perspective and assesses how these phases relate to the volume and the frequency of the contributions from participants. Survival analyses were performed using both a complementary cumulative distribution function and a Kaplan-Meier estimator to plot survival and hazard curves. The analyses were broken down according to Nielsen and DoIT contributors’ categories to highlight potential explanatory variables. This paper shows that two contribution processes combine with three major participation stages to form six phases in contributors’ life cycle. The volume of edits provided on each active day is driven by the two contribution processes, illustrating the evolution of contributors’ motivation over time. Since contributors’ lifespan is a universal metric, our results may also apply to other collaborative online communities.
Chapter
Full-text available
The role of citizens in mapping has evolved considerably over the last decade. This chapter outlines the background to citizen sensing in mapping and sets the scene for the chapters that follow, which highlight some of the main outcomes of a collaborative programme of work to enhance the role of citizens in mapping.
Article
Full-text available
A critical problem with mapping data is the frequent updating of large data sets. To address this problem, updating small-scale data based on large-scale data is very effective. Various map generalization techniques, such as simplification, displacement, typification, elimination, and aggregation, must therefore be applied. In this study, we focused on the elimination and aggregation of the building layer, for which each building at the large scale was classified as “0-eliminated,” “1-retained,” or “2-aggregated.” Machine-learning classification algorithms were then used to classify the buildings. Data from 1:1000 and 1:25,000 scale digital maps obtained from the National Geographic Information Institute were used. We applied various machine-learning classification algorithms to these data, including naive Bayes (NB), decision tree (DT), k-nearest neighbor (k-NN), and support vector machine (SVM). The overall accuracies of the algorithms were satisfactory: DT, 88.96%; k-NN, 88.27%; SVM, 87.57%; and NB, 79.50%. Although elimination is a direct part of the proposed process, generalization operations, such as simplification and aggregation of polygons, must still be performed for buildings classified as retained and aggregated. Thus, these algorithms can be used for building classification and can serve as preparatory steps for building generalization.
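A toy sketch of the classification step follows, using a scikit-learn decision tree on made-up geometric features (footprint area, distance to the nearest building, compactness) and a synthetic labelling rule; only the three target classes follow the paper, everything else is assumed for illustration.

```python
# Toy sketch of the classification step: predict whether a building should be
# eliminated, retained or aggregated from simple geometric features. The
# features and training data are synthetic; the three labels follow the paper.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
n = 600
area = rng.uniform(20, 2000, n)        # footprint area (m^2)
nearest = rng.uniform(0.5, 60, n)      # distance to nearest building (m)
compact = rng.uniform(0.3, 1.0, n)     # compactness index

# Synthetic labelling rule, just to create a learnable pattern:
# tiny buildings -> eliminated (0), very close neighbours -> aggregated (2),
# everything else -> retained (1).
label = np.where(area < 120, 0, np.where(nearest < 5, 2, 1))

X = np.column_stack([area, nearest, compact])
X_tr, X_te, y_tr, y_te = train_test_split(X, label, test_size=0.3, random_state=7)
clf = DecisionTreeClassifier(max_depth=4, random_state=7).fit(X_tr, y_tr)
print("accuracy:", clf.score(X_te, y_te))
```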
Conference Paper
Full-text available
To enable smooth zooming, we propose a method to continuously generalize buildings from a given start map to a smaller-scale goal map, where there are only built-up area polygons instead of individual building polygons. We name the buildings on the start map original buildings. For an intermediate scale, we aggregate the original buildings that will become too close by adding bridges. We grow (bridged) original buildings based on buffering, and simplify the grown buildings. We take into account the shapes of the buildings on both the previous map and the goal map to make sure that the buildings are always growing. The running time of our method is in O(n³), where n is the number of edges of all the original buildings. The advantages of our method are as follows. First, the buildings grow continuously and, at the same time, are simplified. Second, right angles of buildings are preserved during growing: the merged buildings still look like buildings. Third, the distances between buildings are always larger than a specified threshold. We present a case study to show the performance of our method.
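The snippet below sketches only the buffer-based intuition with Shapely: footprints are grown, those that come to overlap are merged, and the result is shrunk back and simplified. It does not reproduce the paper's bridging, right-angle preservation or continuity guarantees; geometries and distances are illustrative.

```python
# Minimal sketch of the buffer-based intuition with Shapely: grow building
# footprints, merge the ones that now overlap, then shrink back and simplify.
# This does not reproduce the paper's bridging or right-angle preservation.
from shapely.geometry import Polygon
from shapely.ops import unary_union

buildings = [
    Polygon([(0, 0), (10, 0), (10, 8), (0, 8)]),
    Polygon([(14, 0), (22, 0), (22, 8), (14, 8)]),   # 4 m gap to the first one
    Polygon([(50, 50), (58, 50), (58, 58), (50, 58)]),
]

grow = 3.0   # half of the minimum allowed gap between buildings (illustrative)
grown = [b.buffer(grow, join_style=2) for b in buildings]     # mitred corners
merged = unary_union(grown)                                   # close buildings fuse
shrunk = merged.buffer(-grow, join_style=2).simplify(0.5)     # shrink back, simplify

geoms = getattr(shrunk, "geoms", [shrunk])                    # Polygon or MultiPolygon
print(len(list(geoms)), "generalised building blocks")        # expected: 2
```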
Chapter
Full-text available
The most common way to use geographic information is to make maps. With the ever-growing amount of Volunteered Geographic Information (VGI), we have the opportunity to make many maps, but only automatic cartography (generalisation, stylisation, text placement) can handle such an amount of data with very frequent updates. This chapter reviews recent proposals to adapt current techniques for automatic cartography to VGI as the source data, focusing on the production of topographic base maps. The review includes methods to assess quality and the level of detail, which is necessary to handle data heterogeneity. The chapter also describes automatic techniques to generalise, harmonise and render VGI.
Article
Full-text available
With the development of social media (e.g. Twitter, Flickr, Foursquare, Sina Weibo, etc.), a large number of people now use these platforms and post microblogs, messages and multimedia information. This everyday usage of social media results in big, open social media data. The data offer fruitful information and reflect people's social behaviors. There is much visualization and visual analytics research on such data. We collect state-of-the-art research and organize it into three main categories: social networks, spatio-temporal information and text analysis. We further summarize the visual analytics pipeline for social media, combining the above categories and supporting complex tasks. With these techniques, social media analytics can be applied in multiple disciplines. We summarize the applications and public tools to further investigate the challenges and trends.
Article
Full-text available
Data about points of interest (POI) have been widely used in studying urban land use types and for sensing human behavior. However, it is difficult to quantify the correct mix or the spatial relations among different POI types indicative of specific urban functions. In this research, we develop a statistical framework to help discover semantically meaningful topics and functional regions based on the co-occurrence patterns of POI types. The framework applies the latent Dirichlet allocation (LDA) topic modeling technique and incorporates user check-in activities on location-based social networks. Using a large corpus of about 100,000 Foursquare venues and user check-in behavior in the 10 most populated urban areas of the US, we demonstrate the effectiveness of our proposed methodology by identifying distinctive types of latent topics and, further, by extracting urban functional regions using K-means clustering and Delaunay triangulation spatial constraints clustering. We show that a region can support multiple functions but with different probabilities, while the same type of functional region can span multiple geographically non-adjacent locations. Since each region can be modeled as a vector consisting of multinomial topic distributions, similar regions with regard to their thematic topic signatures can be identified. Compared with remote sensing images which mainly uncover the physical landscape of urban environments, our popularity-based POI topic modeling approach can be seen as a complementary social sensing view on urban space based on human activities.
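A small sketch of the final clustering step is given below: regions described by topic-mixture vectors are grouped with K-means. The three-topic mixtures are invented; in the paper the vectors come from LDA over POI-type co-occurrences weighted by check-in activity, and spatially constrained clustering is also used.

```python
# Small sketch of the clustering step: group regions by their topic-mixture
# vectors with K-means. The 3-topic mixtures below are invented; in the paper
# they come from LDA over POI-type co-occurrences weighted by check-ins.
import numpy as np
from sklearn.cluster import KMeans

region_topics = np.array([
    [0.8, 0.1, 0.1],   # mostly "nightlife" topic
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],   # mostly "shopping" topic
    [0.2, 0.7, 0.1],
    [0.1, 0.1, 0.8],   # mostly "residential" topic
    [0.6, 0.1, 0.3],   # mixed-use region: several functions with different probabilities
])

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(region_topics)
print(km.labels_)      # regions with similar thematic signatures share a label
```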
Article
Full-text available
Variable-scale maps have been advocated by several authors in the context of mobile cartography. In the literature on real-time map generalisation, however, corresponding methods that resolve cartographic conflicts by deformation of the underlying map space together with the map foreground are underrepresented. This paper demonstrates how the concept of a malleable space can be applied as part of the generalisation process and incorporated into the overall methodology of point generalisation. Two different algorithms are used: a density-equalising cartogram algorithm and Laplacian smoothing. Both methods work in real time and are data-driven. In addition, they allow for parameterisation in combination with a quadtree data structure, as well as a combination with 'classic' generalisation operators (e.g. selection, aggregation, displacement) based on the quadtree. The quadtree serves both as a spatial index for fast retrieval and search of points, and as a density estimator to inform generalisation operators. The use of the quadtree as a common spatial index provides a tool to combine variable-scale maps with classic generalisation. A combination of the two allows, at small map scales, the maintenance of detail in dense areas and data reduction in sparse areas. Additionally, it facilitates building a modular workflow for real-time map generalisation.
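The condensed sketch below shows how a quadtree-style recursive subdivision can act as a density-aware selection/aggregation operator: sparse cells keep their points, while dense cells at the maximum depth are replaced by their centroid. Capacity, depth and the synthetic point pattern are illustrative; the paper's cartogram and smoothing algorithms are not reproduced.

```python
# Condensed sketch of quadtree-informed point generalisation: subdivide space
# recursively; sparse cells keep their points, dense cells at the maximum
# depth are replaced by a single representative (centroid). Capacities and
# depths are illustrative, not the paper's parameterisation.
import numpy as np

def generalise(points, bounds, capacity=4, depth=0, max_depth=5):
    x0, y0, x1, y1 = bounds
    pts = points[(points[:, 0] >= x0) & (points[:, 0] < x1) &
                 (points[:, 1] >= y0) & (points[:, 1] < y1)]
    if len(pts) == 0:
        return []
    if len(pts) <= capacity:                  # sparse cell: keep original points
        return [tuple(p) for p in pts]
    if depth == max_depth:                    # dense cell: aggregate to centroid
        return [tuple(pts.mean(axis=0))]
    xm, ym = (x0 + x1) / 2, (y0 + y1) / 2     # otherwise subdivide into quadrants
    out = []
    for qb in [(x0, y0, xm, ym), (xm, y0, x1, ym), (x0, ym, xm, y1), (xm, ym, x1, y1)]:
        out += generalise(pts, qb, capacity, depth + 1, max_depth)
    return out

rng = np.random.default_rng(3)
dense = rng.normal([0.2, 0.2], 0.02, (300, 2))    # a crowded city centre
sparse = rng.uniform(0.5, 1.0, (15, 2))           # scattered rural points
pts = np.vstack([dense, sparse])
print(len(generalise(pts, (0.0, 0.0, 1.0, 1.0))), "of", len(pts), "points kept")
```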
Article
Full-text available
Cognitive regions and places are notoriously difficult to represent in geographic information science and systems. The exact delineation of cognitive regions is challenging insofar as borders are vague, membership within the regions varies non-monotonically, and raters cannot be assumed to assess membership consistently and homogeneously. In a study published in this journal in 2014, researchers devised a novel grid-based task in which participants rated the membership of individual cells in a given region and contrasted this approach to a standard boundary-drawing task. Specifically, the authors assessed the vague cognitive regions of Northern California and Southern California. The boundary between these cognitive regions was found to have variable width, and region membership peaked not at the most northern or southern cells but at substantially less extreme latitudes. The authors thus concluded that region membership is about attitude, not just latitude. In the present work, we reproduce this study by approaching it from a computational fourth-paradigm perspective, i.e., by the synthesis of high volumes of heterogeneous data from various sources. We compare the regions which we identify to those from the human-participants study of 2014, identifying differences and commonalities. Our results show a significant positive correlation to those in the original study. Beyond the extracted regions themselves, we compare and contrast the empirical and analytical approaches of these two methods, one a conventional human-participants study and the other an application of increasingly popular data-synthesis-driven research methods in GIScience.
Article
Full-text available
In this paper, we investigate whether microblogging texts (tweets) produced on mobile devices are related to the geographical locations where they were posted. For this purpose, we correlate tweet topics to areas. In doing so, classified points of interest from OpenStreetMap serve as validation points. We adopted the classification and geolocation of these points to correlate with tweet content by means of manual, supervised, and unsupervised machine learning approaches. Evaluation showed the manual classification approach to be highest quality, followed by the supervised method, and that the unsupervised classification was of low quality. We found that the degree to which tweet content is related to nearby points of interest depends upon topic (that is, upon the OpenStreetMap category). A more general synthesis with prior research leads to the conclusion that the strength of the relationship of tweets and their geographic origin also depends upon geographic scale (where smaller scale correlations are more significant than those of larger scale).
Article
Full-text available
The ubiquitous nature of online social media and the ever-expanding usage of short text messages make them a potential source for crowd wisdom extraction, especially in terms of sentiments; therefore, sentiment classification and analysis is a significant task in current research. A major challenge in this area is to tame the data in terms of noise, relevance, emoticons, folksonomies and slang. This work is an effort to see the effect of pre-processing on Twitter data for the strengthening of sentiment classification, especially with respect to slang words. The proposed pre-processing method relies on the bindings of slang words with other co-occurring words to check the significance and sentiment translation of the slang word. We have used n-grams to find the bindings and conditional random fields to check the significance of slang words. Experiments were carried out to observe the effect of the proposed method on sentiment classification, which clearly indicate improvements in classification accuracy.
Article
Full-text available
Until now, road network generalization has mainly been applied to the task of generalizing from one fixed source scale to another fixed target scale. These actions result in large differences in content and representation, e.g., a sudden change of the representation of road segments from areas to lines, which may confuse users. Therefore, we aim at the continuous generalization of a road network for the whole range, from the large scale, where roads are represented as areas, to mid- and small scales, where roads are represented progressively more frequently as lines. As a consequence of this process, there is an intermediate scale range where at the same time some roads will be represented as areas, while others will be represented as lines. We propose a new data model together with a specific data structure where for all map objects, a range of valid map scales is stored. This model is based on the integrated and explicit representation of: (1) a planar area partition; and (2) a linear road network. This enables the generalization process to include the knowledge and understanding of a linear network. This paper further discusses the actual generalization options and algorithms for populating this data structure with high quality vario-scale cartographic content.
Conference Paper
Full-text available
Recently, social media, such as Twitter, has been successfully used as a proxy to gauge the impacts of disasters in real time. However, most previous analyses of social media during disaster response focus on the magnitude and location of social media discussion. In this work, we explore the impact that disasters have on the underlying sentiment of social media streams. During disasters, some people may express negative sentiments discussing lives lost and property damage, while others may post encouraging responses to inspire and spread hope. Our goal is to explore the underlying trends in positive and negative sentiment with respect to disasters and geographically related sentiment. In this paper, we propose a novel visual analytics framework for sentiment visualization of geo-located Twitter data. The proposed framework consists of two components, sentiment modeling and geographic visualization. In particular, we provide an entropy-based metric to model the sentiment contained in social media data. The extracted sentiment is further integrated into a visualization framework to explore the uncertainty of public opinion. We explored an Ebola Twitter dataset to show how visual analytics techniques and sentiment modeling can reveal interesting patterns in disaster scenarios.
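As a tiny worked example of an entropy-style measure, the snippet below computes the Shannon entropy of the positive/neutral/negative label distribution of tweets in one area: it is high when opinions are mixed and low when one sentiment dominates. This is generic Shannon entropy, only inspired by (not identical to) the paper's metric.

```python
# Tiny sketch: Shannon entropy of the sentiment distribution of tweets in one
# area as an uncertainty measure -- high when opinions are mixed, low when one
# sentiment dominates. Generic entropy, only inspired by the paper's metric.
import math
from collections import Counter

def sentiment_entropy(labels):
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

mixed = ["pos", "neg", "pos", "neg", "neu", "pos", "neg", "neu"]
onesided = ["neg"] * 7 + ["pos"]
print(round(sentiment_entropy(mixed), 3))      # close to log2(3) ~ 1.585
print(round(sentiment_entropy(onesided), 3))   # much lower: opinion is one-sided
```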
Article
Despite the substantial economic impact of the restaurant industry, large-scale empirical research on restaurant survival has been sparse. We investigate whether consumer-posted photos can serve as a leading indicator of restaurant survival above and beyond reviews, firm characteristics, competitive landscape, and macroconditions. We employ machine learning techniques to extract features from 755,758 photos and 1,121,069 reviews posted on Yelp between 2004 and 2015 for 17,719 U.S. restaurants. We also collect data on restaurant characteristics (e.g., cuisine type, price level) and competitive landscape as well as entry and exit (if applicable) time from each restaurant’s Yelp/Facebook page, own website, or Google search engine. Using a predictive XGBoost algorithm, we find that consumer-posted photos are strong predictors of restaurant survival. Interestingly, the informativeness of photos (e.g., the proportion of food photos) relates more to restaurant survival than do photographic attributes (e.g., composition, brightness). Additionally, photos carry more predictive power for independent, young or mid-aged, and medium-priced restaurants. Assuming that restaurant owners possess no knowledge about future photos and reviews, photos can predict restaurant survival for up to three years, whereas reviews are only predictive for one year. We further employ causal forests to facilitate the interpretation of our predictive results. Among photo content variables, the proportion of food photos has the largest positive association with restaurant survival, followed by proportions of outside and interior photos. Among others, the proportion of photos with helpful votes also positively relates to restaurant survival. This paper was accepted by Juanjuan Zhang, marketing. Funding: The authors thank Nvidia and Clarifai for supporting this research. Supplemental Material: The online appendix and data are available at https://doi.org/10.1287/mnsc.2022.4359.
Article
Geospatial data conflation is aimed at matching counterpart features from two or more data sources in order to combine and better utilize information in the data. Due to the importance of conflation in spatial analysis, different approaches to the conflation problem have been proposed ranging from simple buffer-based methods to probability and optimization based models. In this paper, I propose a formal framework for conflation that integrates two powerful tools of geospatial computation: optimization and relational databases. I discuss the connection between the relational database theory and conflation, and demonstrate how the conflation process can be formulated and carried out in standard relational databases. I also propose a set of new optimization models that can be used inside relational databases to solve the conflation problem. The optimization models are based on the minimum cost circulation problem in operations research (also known as the network flow problem), which generalizes existing optimal conflation models that are primarily based on the assignment problem. Using comparable datasets, computational experiments show that the proposed conflation method is effective and outperforms existing optimal conflation models by a large margin. Given its generality, the new method may be applicable to other data types and conflation problems.
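The bare-bones sketch below shows how feature matching can be cast as a minimum-cost network flow with networkx: each source feature supplies one unit of flow, each target can absorb at most one, and a high-cost 'unmatched' node lets features opt out. The node layout, costs and coordinates are assumptions for illustration, not the paper's optimization models.

```python
# Bare-bones sketch: cast feature matching as a minimum-cost network flow.
# Each source feature supplies one unit of flow, each target feature can
# absorb at most one, and a high-cost "unmatched" node lets features opt out.
# Costs (scaled distances) and coordinates are illustrative only.
import networkx as nx

sources = {"a": (0, 0), "b": (10, 10)}                   # features from dataset 1
targets = {"x": (0, 1), "y": (11, 10), "z": (50, 50)}    # features from dataset 2

G = nx.DiGraph()
G.add_node("sink", demand=len(sources))          # all flow must end up here
G.add_edge("unmatched", "sink", weight=0)        # leftover flow passes through

for s, (sx, sy) in sources.items():
    G.add_node(("s", s), demand=-1)              # each source sends exactly 1 unit
    G.add_edge(("s", s), "unmatched", capacity=1, weight=200)   # opt-out edge
    for t, (tx, ty) in targets.items():
        cost = int(round(((sx - tx) ** 2 + (sy - ty) ** 2) ** 0.5 * 10))
        G.add_edge(("s", s), ("t", t), capacity=1, weight=cost)

for t in targets:
    G.add_edge(("t", t), "sink", capacity=1, weight=0)

flow = nx.min_cost_flow(G)
for s in sources:
    for t in targets:
        if flow[("s", s)].get(("t", t), 0) == 1:
            print(s, "->", t)                    # expected: a -> x and b -> y
```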
Article
This paper presents a crowd sensing system (CSS) that captures geospatial social media topics and allows the review of results. Using Web-resources derived from social media platforms, the CSS uses a spatially-situated social network graph to harvest user-generated content from selected organizations and members of the public. This allows ‘passively’ contributed social media-based opinions, along with different variables, such as time, location, social interaction, service usage, and human activities to be examined and used to identify trending views and influential citizens. The data model and CSS are used for demonstration purposes to identify geotopics and community interests relevant to municipal affairs in the City of Toronto, Canada.
Article
During the last ten years, a large body of research extracting and analysing geographic data from social media has developed. We analyse 690 papers across 20 social media platforms, focussing particularly on the method used for extraction of location information. We discuss and compare extraction methods, and consider their accuracy and coverage. While much work has adopted location information in the form of coordinates in message metadata, this approach has very limited coverage in most platforms and reports on posting location rather than message location or the location that the message refers to (geofocus). In contrast, a wide array of other approaches have been developed, with methods that extract place names from message text providing the highest accuracy. Methods that use social media connections also provide good results, but all of the methods have limitations. We also present analysis of the range and frequency of use of different social media platforms, and the wide range of application areas that have been addressed. Drawing on this analysis we present a number of future areas of research that warrant attention in order for this field of research to mature.
Article
Deep learning has emerged as a powerful machine learning technique that learns multiple layers of representations or features of the data and produces state-of-the-art prediction results. Along with the success of deep learning in many application domains, deep learning has also been used for sentiment analysis in recent years. This paper gives an overview of deep learning and then provides a comprehensive survey of its current applications in sentiment analysis.
Conference Paper
The evolution of contributor behavior in peer production communities over time has been a subject of substantial interest in the social computing community. In this paper, we extend this literature to the geographic domain, exploring contribution behavior in OpenStreetMap using a spatiotemporal lens. In doing so, we observe a geographic version of a 'born, not made' phenomenon: throughout their lifespans, contributors are relatively consistent in the places and types of places that they edit. We show how these 'born, not made' trends may help explain the urban and socioeconomic coverage biases that have been observed in OpenStreetMap. We also discuss how our findings can help point towards solutions to these biases.
Article
We consider the problem of finding map regions that best match query keywords. This region search problem can be applied in many practical scenarios such as shopping recommendation, searching for tourist attractions, and collision region detection for wireless sensor networks. While conventional map search retrieves isolated locations in a map, users frequently attempt to find regions of interest instead, e.g., detecting regions having too many wireless sensors to avoid collision, or finding shopping areas featuring various merchandise or tourist attractions of different styles. Finding regions of interest in a map is a non-trivial problem, and retrieving regions of arbitrary shapes poses particular challenges. In this paper, we present a novel region search algorithm, dense region search (DRS), and its extensions, to find regions of interest by estimating the density of locations containing the query keywords in the region. Experiments on both synthetic and real-world datasets demonstrate the effectiveness of our algorithm.
Chapter
Integrating contextual information into the process of location-based service delivering is an emerging trend towards more advanced techniques aiming at personalization and intelligence of location-based services in the big data era. This chapter provides a systematic review of current context-aware location-based service systems using big data by analysing the methodological and practical choices that their developers made during the main phases of the context awareness process (i.e. context acquisition, context representation, and context reasoning and adaptation). Specifically, the chapter analyses ten location-based services, developed over the five years 2010–2014, by focusing on (1) context categories, data sources and level of automation of the context acquisition, (2) context models applied for context representation, and (3) adaptation strategies and reasoning methodologies used for context reasoning and adaptation. For each of these steps, a set of research questions and evaluation criteria are extracted that we use to evaluate and compare the surveyed context-aware location-based services. The results of this comparison are used to outline challenges and opportunities for future research in this research field.
Conference Paper
One of the main research trends over the last years has focused on knowledge extraction from social network users. One of the main difficulties of this analysis is the lack of structure of the information and the multiple formats in which it can appear. The present article focuses on the analysis of the information provided by different users in image form. The problem to be solved is the detection of identical images (even when they have undergone minimal transformations, such as the addition of a watermark), which allows links to be established between users who publish the same images. The solution proposed in the article is based on the comparison of hashes, which makes it possible, from a computational point of view, to tolerate certain transformations that may have been applied to an image.
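A generic average-hash sketch is shown below using Pillow and NumPy: images are downscaled to 8×8 grayscale, thresholded at their mean, and compared by Hamming distance, with a small synthetic 'watermark' simulating a minimal transformation. The exact hashing scheme of the article may differ; the threshold and the test images are illustrative.

```python
# Generic average-hash sketch with Pillow and NumPy: downscale to 8x8
# grayscale, threshold at the mean, compare hashes by Hamming distance.
# The article's exact hashing scheme may differ; threshold is illustrative.
import numpy as np
from PIL import Image

def average_hash(img, size=8):
    pixels = np.asarray(img.convert("L").resize((size, size)), dtype=float)
    return (pixels > pixels.mean()).flatten()           # 64-bit boolean hash

def hamming(h1, h2):
    return int(np.count_nonzero(h1 != h2))

rng = np.random.default_rng(0)
base = rng.integers(0, 256, size=(128, 128), dtype=np.uint8)
original = Image.fromarray(base)
watermarked_arr = base.copy()
watermarked_arr[110:126, 90:126] = 255                   # simulate a small watermark
watermarked = Image.fromarray(watermarked_arr)

d = hamming(average_hash(original), average_hash(watermarked))
print("hamming distance:", d)             # small distance suggests the same image
print("treated as duplicates:", d <= 5)   # threshold is illustrative
```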
Article
Map generalization is a core process in cartography and describes how spatial phenomena are abstracted and depicted legibly on maps. The requirements for generalization depend on the chosen display medium and on the context in which the maps are used. Web-based and mobile applications place high demands on generalization, which has to happen in real time. For users of online and mobile applications it is also important that the map content can be adapted flexibly (modularity). With the emergence of new map types that allow content to be composed dynamically from several data sources into so-called "mash-up" maps, point data have gained in importance. Point data are the most basic, but at the same time the most strongly abstracted, form of spatial data. Although point data are increasingly displayed on maps, solutions for their real-time generalization have so far received little research attention. The main focus of this thesis is therefore the analysis and development of automated algorithms that cartographically generalize point data in real time. Particular attention is paid to the specific needs of web-based and mobile maps and their integration into dynamic displays. A distinctive feature of this work is that all proposed methods are based on the quadtree, a hierarchical spatial data structure. Although the quadtree has interesting properties for cartographic generalization, this data structure has so far seen little use in generalization. The main contribution of this work is the development of a comprehensive methodology as well as of a quadtree-based, modular system of operators and algorithms for the real-time generalization of point data. The quadtree, more precisely the point region (PR) quadtree, is used as a hierarchical spatial index and auxiliary data structure. The PR quadtree enables both real-time behaviour and modularity of the algorithms. In addition, it provides supporting information that informs the generalization algorithms and rules. Using the quadtree makes it possible to overcome the dichotomy of the approaches to real-time generalization known from the literature so far. Previous approaches to real-time generalization can be divided into two main groups: simple generalization algorithms with high flexibility in process design but limited cartographic quality, and solutions based on pre-computed hierarchical data structures, of high cartographic quality but restricted flexibility. The following research questions guided the work: What are the necessary properties of an efficient and flexible real-time generalization of point data that fulfils the requirements of web and mobile maps? How can different kinds of point generalization operators be integrated into a modular workflow? How can interaction with map generalization be extended to allow a dynamic exploration of information in web maps? And finally, what are the strengths and weaknesses of quadtree-based algorithms for real-time generalization, and how do they behave?
First, methods for real-time generalization are selected and categorized, and a comprehensive methodology as well as a problem-oriented generalization workflow are developed. The methodology distinguishes between two main principles of point generalization: on the one hand, object-directed generalization operators, which manipulate the map objects directly, and on the other hand, space-directed operators, which deform the map space in order to carry out generalization operations. The prototype developed in this work implements solutions for both object-directed and space-directed generalization operators. The most important object-directed generalization operators for point data are selection, simplification, aggregation and displacement. For these main generalization operators, the thesis proposes quadtree-based generalization algorithms. For space-directed generalization, two approaches using the concept of a malleable space have been implemented: first, density-based cartograms and, second, Laplacian smoothing. In both algorithms the quadtree serves as a scale-dependent density estimator for the spatial deformation.
Chapter
OpenStreetMap data comprise very detailed features (e.g. zebra crossings) alongside quite rough ones (e.g. built-up areas). Making large-scale maps from data with an inconsistent level of detail often impairs map comprehension. This paper explores the automatic harmonization of OpenStreetMap data for large-scale maps, i.e. the process that transforms rough objects to make them consistent with detailed objects. A typology of the new operators that harmonization requires is presented, and six algorithms that implement the operators are described. Experiments with these algorithms raise several research questions about automation, parametrization and the level of abstraction of the transformation, which are discussed in the paper.
Conference Paper
Graphical user interfaces are composed of varying elements (text, images, etc.) whose visual arrangement is relatively well established for rectangular interfaces. The advent of non-rectangular displays calls this knowledge into question. In this paper we study how traditional content layouts can be adapted to fit different non-rectangular displays. We performed a first qualitative study in which graphic designers fitted text and images into different non-rectangular displays. From the analysis of their output we generalize and adapt ten composition principles that have been proposed in the literature for rectangular displays. We evaluate the revised principles through a paired-comparison questionnaire in which 57 participants compared pairs of layouts. Using the Bradley-Terry-Luce model to analyze our data, we show that some results contradict current conventions on visual design for rectangular displays. We then extracted the most interesting cases and conducted a follow-up study with additional shapes to investigate how the principles generalize. From these results we propose a set of guidelines for designing visual content for non-rectangular displays.
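As a rough illustration of the analysis method named above (not the authors' code), the sketch below fits a Bradley-Terry-Luce model to invented paired-comparison counts using the standard minorization-maximization update.

```python
# Sketch of fitting a Bradley-Terry-Luce model to paired-comparison data.
# The win counts and the number of candidate layouts are hypothetical.

import numpy as np

def fit_btl(wins, n_iter=200):
    """wins[i, j] = number of times option i was preferred over option j."""
    n = wins.shape[0]
    comparisons = wins + wins.T            # total comparisons per pair
    strength = np.ones(n)                  # initial ability scores
    for _ in range(n_iter):
        total_wins = wins.sum(axis=1)
        denom = np.zeros(n)
        for i in range(n):
            for j in range(n):
                if i != j and comparisons[i, j] > 0:
                    # MM update denominator: sum of n_ij / (pi_i + pi_j)
                    denom[i] += comparisons[i, j] / (strength[i] + strength[j])
        strength = total_wins / denom
        strength /= strength.sum()         # normalize for identifiability
    return strength

# Hypothetical counts for three candidate layouts.
wins = np.array([[0, 8, 5],
                 [2, 0, 4],
                 [5, 6, 0]])
print(fit_btl(wins))                       # estimated preference strengths
```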
Article
Qualitative geographic information systems (GIS) research has progressed in meaningful ways since early calls for a qualitative GIS in the 1990s. From participatory methods to the invention of the participatory geoweb and finally to geospatial social media sources, the amount of information available to nonquantitative GIScientists has grown tremendously. Recently, researchers have advanced qualitative GIS by taking advantage of new data sources, like Twitter, to illustrate the occurrence of various phenomena in the data set geospatially. At the same time, computer scientists in the field of natural language processing have built increasingly sophisticated methods for digesting and analyzing large text-based data sources. In this article, the authors implement one of these methods, topic modeling, and create a visualization method to illustrate the results in a visually comparative way, directly onto the map canvas. The method is a step toward making the advances in natural language processing available to all GIScientists. The article discusses the ways in which geography plays an important part in understanding the results presented from the model and visualization, including issues of place and space.
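The authors' own pipeline is not reproduced here; as a rough sketch of the topic-modeling step described above (using scikit-learn, with invented example posts), latent topics can be extracted from short geotagged texts, and each post's dominant topic is then what gets symbolized on the map canvas.

```python
# Sketch of the topic-modeling step: build a document-term matrix from
# short posts and fit a small LDA model. The posts are invented examples.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

posts = [
    "traffic jam on the bridge this morning",
    "great coffee and live music downtown",
    "road closed again, terrible commute",
    "street festival with food trucks and bands",
]

vectorizer = CountVectorizer(stop_words="english")
dtm = vectorizer.fit_transform(posts)            # document-term matrix

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(dtm)              # per-post topic proportions

# The dominant topic of each geotagged post is what would be mapped.
print(doc_topics.argmax(axis=1))
```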
Article
Spatial point processes may be analysed at two levels. Quadrat and distance methods were designed for the sampling of a population in the field. In this paper we consider those situations in which a map of a spatial pattern has been produced at some cost and we wish to extract the maximum possible information. We review the stochastic models which have been proposed for spatial point patterns and discuss methods by which the fit of such a model can be tested. Certain models are shown to be the equilibrium distributions of spatial–temporal stochastic processes. The theory is illustrated by several case studies.
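As a minimal worked example of one classical tool in this family, the sketch below runs a chi-square quadrat-count test of complete spatial randomness in the unit square; the grid size and the simulated pattern are illustrative assumptions, not taken from the paper.

```python
# Sketch of a quadrat-count test of complete spatial randomness (CSR):
# bin the points into a regular grid and compare observed cell counts
# against the uniform expectation with a chi-square statistic.

import numpy as np
from scipy.stats import chi2

def quadrat_test(x, y, nx=4, ny=4):
    """Chi-square quadrat test of CSR for points in the unit square."""
    counts, _, _ = np.histogram2d(x, y, bins=[nx, ny], range=[[0, 1], [0, 1]])
    expected = len(x) / (nx * ny)                      # uniform expectation per cell
    statistic = ((counts - expected) ** 2 / expected).sum()
    dof = nx * ny - 1
    p_value = chi2.sf(statistic, dof)
    return statistic, p_value

rng = np.random.default_rng(0)
x, y = rng.random(200), rng.random(200)                # a simulated CSR pattern
print(quadrat_test(x, y))                              # a large p-value is expected
```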
Article
Early detection of unusual events in urban areas is a priority for city management departments, which usually deploy specific, complex video-based infrastructures typically monitored by human staff. However, with the emergence and quick popularity of location-based social networks (LBSNs), detecting an abnormally high or low number of citizens in a specific area at a specific time could be done by an expert system that automatically analyzes public geo-tagged posts. Our approach focuses exclusively on the location information linked to these posts. By applying a density-based clustering algorithm, we obtain the pulse of the city (24 hours, 7 days) in a first training phase, which enables the detection of outliers (unexpected behaviors) on-the-fly in a subsequent test or monitoring phase. This solution means that no specific infrastructure is needed, since the citizens are the ones who buy, maintain and carry the mobile devices and freely disclose their location by proactively sharing posts. Besides, location analysis is lighter than video analysis and can be done automatically. Our approach was validated using a dataset of geo-tagged posts obtained from Instagram in New York City over almost six months, with good results. Not only were all the previously known events detected, but other, unknown events were also discovered during the experiment.
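The paper's full pipeline (temporal binning, the training versus monitoring phases, and their thresholds) is not reproduced here; the sketch below only illustrates the core density-based clustering step on a handful of invented coordinates, using scikit-learn's DBSCAN.

```python
# Sketch of density-based clustering of geotagged posts with DBSCAN.
# Coordinates, eps and min_samples are illustrative assumptions.

import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical (lon, lat) coordinates of geotagged posts in one time slot.
coords = np.array([
    [-73.985, 40.758], [-73.986, 40.759], [-73.984, 40.757],   # dense spot
    [-73.968, 40.785],                                          # isolated post
])

# eps is given in degrees here for brevity; a metric projection or a
# haversine distance would be used in practice.
labels = DBSCAN(eps=0.002, min_samples=3).fit_predict(coords)
print(labels)          # posts labeled -1 are noise / potential outliers
```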
Article
The unprecedented availability of social media data offers substantial opportunities for data owners, system operators, solution providers and end users to explore and understand social dynamics. However, the exponential growth in the volume, velocity, and variability of social media data prevents people from fully utilizing such data. Visual analytics, which is an emerging research direction, has received considerable attention in recent years. Many visual analytics methods have been proposed across disciplines to understand large-scale structured and unstructured social media data. The breadth of this work, however, makes it challenging for researchers to obtain a comprehensive picture of the area, understand the research challenges, and develop new techniques. In this paper, we present a comprehensive survey to characterize this fast-growing area and summarize the state-of-the-art techniques for analyzing social media data. In particular, we classify existing techniques into two categories: gathering information and understanding user behaviors. We aim to provide a clear overview of the research area through the established taxonomy. We then explore the design space and identify the research trends. Finally, we discuss challenges and open questions for future studies.
Article
The field of Geographical Information Systems (GIS) has experienced a rapid and ongoing growth of available sources for geospatial data. This growth has demanded more data integration in order to explore the benefits of these data further. However, many data providers imply many points of view on the same phenomena: geospatial features. Sophisticated procedures are therefore needed to find the correspondences between two vector datasets, a process known as geospatial data matching. Similarity measures are key tools for matching methods, so it is worth reviewing these concepts together. This article provides a survey of 30 years of research into the measures and methods of geospatial data matching. Our survey presents related work and develops a common taxonomy that permits us to compare measures and methods. This study points out relevant issues that may help to discover the potential of these approaches in many applications, like data integration, conflation, quality evaluation, and data management.
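As a small illustration of one geometric similarity measure commonly used by matching methods (not code from the survey itself), the following sketch computes the Hausdorff distance between two invented candidate road geometries with shapely.

```python
# Sketch of one geometric similarity measure for geospatial data matching:
# the Hausdorff distance between two candidate line geometries.

from shapely.geometry import LineString

road_a = LineString([(0, 0), (1, 0.1), (2, 0.0)])      # geometry from dataset A
road_b = LineString([(0, 0.05), (1, 0.15), (2, 0.1)])  # candidate match in dataset B

distance = road_a.hausdorff_distance(road_b)
print(distance)        # small values suggest the two features may correspond
```

A matching method would typically combine such geometric measures with attribute (e.g. name) and topological similarity before accepting a correspondence.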
Article
Nowadays, the various data collected in urban contexts provide unprecedented opportunities for building a smarter city through urban computing. However, due to the heterogeneity, high complexity and large volume of these urban data, analyzing them is not an easy task; it often requires integrating human perception into the analytical process, which has triggered a broad use of visualization. In this survey, we first summarize frequently used data types in urban visual analytics, and then elaborate on existing visualization techniques for time, locations and other properties of urban data. Furthermore, we discuss how visualization can be combined with automated analytical approaches. Existing work on urban visual analytics is categorized into two classes based on the different outputs of such combinations: 1) For data exploration and pattern interpretation, we describe representative visual analytics tools designed for better insight into different types of urban data. 2) For visual learning, we discuss how visualization can help in three major steps of automated analytical approaches (i.e. cohort construction; feature selection and model construction; result evaluation and tuning) for a more effective machine learning or data mining process, leading to artificial-intelligence components such as a classifier, a predictor or a regression model. Finally, we look ahead to the future of urban visual analytics and conclude the survey with potential research directions.