ABSTRACT: Increasing volumes of geo-referenced data are becoming available. This data includes so-called points of interest that describe businesses, tourist attractions, etc. by means of a geo-location and properties such as a textual description or ratings. We propose and study the efficient implementation of a new kind of query on points of interest that takes into account both the locations and properties of the points of interest. The query takes a result cardinality, a spatial range, and property-related preferences as parameters, and it returns a compact set of points of interest with the given cardinality and in the given range that satisfies the preferences. Specifically, the points of interest in the result set cover so-called allying preferences and are located far from points of interest that possess so-called alienating preferences. A unified result rating function integrates the two kinds of preferences with spatial distance to achieve this functionality. We provide efficient exact algorithms for this kind of query. To enable queries on large datasets, we also provide an approximate algorithm that utilizes a nearest-neighbor property to achieve scalable performance. We develop and apply lower and upper bounds that enable search-space pruning and thus improve performance. Finally, we provide a generalization of the above query and extend the algorithms to support it. We report on an experimental evaluation, using real point-of-interest data from Google Places for Business, that offers insight into the performance of the proposed solutions.
ABSTRACT: With the increasing availability of moving-object tracking data, trajectory search and matching is increasingly important. We propose and investigate a novel problem called personalized trajectory matching (PTM). In contrast to conventional trajectory similarity search by spatial distance only, PTM takes into account the significance of each sample point in a query trajectory. A PTM query takes a trajectory with user-specified weights for each sample point in the trajectory as its argument. It returns the trajectory in an argument data set with the highest similarity to the query trajectory. We believe that this type of query may bring significant benefits to users in many popular applications such as route planning, carpooling, friend recommendation, traffic analysis, urban computing, and location-based services in general. PTM query processing faces two challenges: how to prune the search space during the query processing and how to schedule multiple so-called expansion centers effectively. To address these challenges, a novel two-phase search algorithm is proposed that carefully selects a set of expansion centers from the query trajectory and exploits upper and lower bounds to prune the search space in the spatial and temporal domains. An efficiency study reveals that the algorithm explores the minimum search space in both domains. In addition, a heuristic search strategy based on priority ranking is developed to schedule the multiple expansion centers, which can further prune the search space and enhance the query efficiency. The performance of the PTM query is studied in extensive experiments based on real and synthetic trajectory data sets.
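To make the weighted-matching idea concrete, here is a minimal sketch of a point-weighted trajectory similarity. The scoring form (each query point contributes its weight scaled by a decaying function of the distance to its nearest candidate point) is an illustrative assumption, not the paper's actual PTM similarity measure:

```python
import math

def point_distance(p, q):
    """Euclidean distance between two (x, y) sample points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def weighted_similarity(query, weights, candidate):
    """Similarity of a candidate trajectory to a weighted query trajectory.

    Each query point contributes exp(-d) to the score, where d is the
    distance to its nearest candidate point, scaled by the user-supplied
    weight of that query point.  Illustrative scoring function only.
    """
    score = 0.0
    for qp, w in zip(query, weights):
        d = min(point_distance(qp, cp) for cp in candidate)
        score += w * math.exp(-d)
    return score

query = [(0, 0), (1, 0), (2, 0)]
weights = [1.0, 0.1, 1.0]          # endpoints matter most to this user
t1 = [(0, 0), (1, 5), (2, 0)]      # deviates at the unimportant midpoint
t2 = [(0, 5), (1, 0), (2, 5)]      # deviates at the important endpoints
best = max([t1, t2], key=lambda t: weighted_similarity(query, weights, t))
```

Under these weights, `t1` wins even though its midpoint is far from the query, which is exactly the behavior a purely spatial similarity would not produce.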
ABSTRACT: Indoor positioning systems based on fingerprinting techniques generally require costly initialization and maintenance by trained surveyors. Organic positioning systems aim to eliminate these deficiencies by managing their own accuracy and obtaining input from users and other sources. Such systems introduce new challenges, e.g., detection and filtering of erroneous user input, estimation of the positioning accuracy, and means of obtaining user input when necessary. We envision a fully organic indoor positioning system, where all available sources of information are exploited in order to provide room-level accuracy with no active intervention of users. For example, such a system can exploit pre-installed cameras to associate a user's location with a Wi-Fi fingerprint from the user's phone, and it can use a calendar to determine whether a user is in the room reported by the positioning system. Numerous possibilities for integration exist that may provide better indoor positioning.
Proceedings of the Fifth ACM SIGSPATIAL International Workshop on Indoor Spatial Awareness; 11/2013
ABSTRACT: This paper introduces a new class of temporal expression – named temporal expressions – and methods for recognising and interpreting its members. The commonest temporal expressions typically contain date and time words, like April or hours. Research into recognising and interpreting these typical expressions is mature in many languages. However, there is a class of expressions that are less typical, very varied, and difficult to automatically interpret. These indicate dates and times, but are harder to detect because they often do not contain time words and are not used frequently enough to appear in conventional temporally-annotated corpora – for example Michaelmas or Vasant Panchami. Using Wikipedia and linked data, we automatically construct a resource of English named temporal expressions, and use it to extract training examples from a large corpus. These examples are then used to train and evaluate a named temporal expression recogniser. We also introduce and evaluate rules for automatically interpreting these expressions, and we observe that use of the rules improves temporal annotation performance over existing corpora.
Recent Advances in Natural Language Processing, RANLP 2013, Hissar, Bulgaria; 09/2013
ABSTRACT: The web is increasingly being used by mobile users, and it is increasingly possible to accurately geo-position mobile users. In addition, increasing volumes of geo-tagged web content are becoming available. Further, indications are that a substantial fraction of web keyword queries target local content. When combined, these observations suggest that spatial keyword querying is important and indeed gaining in importance. A prototypical spatial keyword query takes a user location and user-supplied keywords as parameters and returns web content that is spatially and textually relevant to these parameters. The paper reviews key concepts related to spatial keyword querying and reviews recent proposals by the author and his colleagues for spatial keyword querying functionality that is easy to use, relevant to users, and can be supported efficiently.
Proceedings of the 7th International Workshop on Ranking in Databases; 08/2013
ABSTRACT: The notion of point-of-interest (PoI) has existed since paper road maps began to include markings of useful places such as gas stations, hotels, and tourist attractions. With the introduction of geopositioned mobile devices such as smartphones and mapping services such as Google Maps, the retrieval of PoIs relevant to a user's intent has become a problem of automated spatio-textual information retrieval. Over the last several years, substantial research has gone into the invention of functionality and efficient implementations for retrieving nearby PoIs. However, with a couple of exceptions, existing proposals retrieve results at single-PoI granularity. We assume that a mobile device user issues queries consisting of keywords and an automatically supplied geo-position, and we target the common case where the user wishes to find nearby groups of PoIs that are relevant to the keywords. Such groups are relevant to users who wish to conveniently explore several options before making a decision such as to purchase a specific product. Specifically, we demonstrate a practical proposal for finding top-k PoI groups in response to a query. We show how problem parameter settings can be mapped to options that are meaningful to users. Further, although this kind of functionality is prone to combinatorial explosion, we will demonstrate that the functionality can be supported efficiently in practical settings.
Proceedings of the VLDB Endowment. 08/2013; 6(12):1226-1229.
ABSTRACT: The monitoring of a system can yield a set of measurements that can be modeled as a collection of time series. These time series are often sparse, due to missing measurements, and spatiotemporally correlated, meaning that spatially close time series exhibit temporal correlation. The analysis of such time series offers insight into the underlying system and enables prediction of system behavior. While the techniques presented in the paper apply more generally, we consider the case of transportation systems and aim to predict travel cost from GPS tracking data from probe vehicles. Specifically, each road segment has an associated travel-cost time series, which is derived from GPS data. We use spatio-temporal hidden Markov models (STHMM) to model correlations among different traffic time series. We provide algorithms that are able to learn the parameters of an STHMM while contending with the sparsity, spatio-temporal correlation, and heterogeneity of the time series. Using the resulting STHMM, near future travel costs in the transportation network, e.g., travel time or greenhouse gas emissions, can be inferred, enabling a variety of routing services, e.g., eco-routing. Empirical studies with a substantial GPS data set offer insight into the design properties of the proposed framework and algorithms, demonstrating the effectiveness and efficiency of travel cost inference.
Proceedings of the VLDB Endowment. 07/2013; 6(9):769-780.
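The core prediction step of a hidden Markov model over discretized travel-cost states can be sketched very simply. This is a plain one-segment Markov chain, not the paper's STHMM (which additionally models spatio-temporal correlation across segments); the states, costs, and transition probabilities below are invented for illustration:

```python
# States: discretized travel-cost levels for one road segment.
states = ["low", "medium", "high"]
cost = {"low": 60.0, "medium": 90.0, "high": 150.0}   # seconds to traverse

# Transition probabilities P(next | current), illustrative values.
trans = {
    "low":    {"low": 0.7,  "medium": 0.25, "high": 0.05},
    "medium": {"low": 0.2,  "medium": 0.6,  "high": 0.2},
    "high":   {"low": 0.05, "medium": 0.35, "high": 0.6},
}

def predict_expected_cost(current_dist, steps=1):
    """Propagate a state distribution `steps` transitions forward and
    return the expected travel cost under the resulting distribution."""
    dist = dict(current_dist)
    for _ in range(steps):
        nxt = {s: 0.0 for s in states}
        for s, p in dist.items():
            for t, pt in trans[s].items():
                nxt[t] += p * pt
        dist = nxt
    return sum(dist[s] * cost[s] for s in states)
```

For example, starting from a known "high" congestion state, `predict_expected_cost({"high": 1.0})` blends the three cost levels by the one-step transition probabilities.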
ABSTRACT: Mobile location-based services constitute a very successful class of services that are used frequently by users with GPS-enabled mobile devices such as smartphones. This paper presents a study of how to exploit GPS trajectory data, which is available in increasing volumes, for the assessment of the quality of one kind of location-based service, namely routing services. Specifically, the paper presents a framework that enables the comparison of the routes provided by routing services with the actual driving behaviors of local drivers. Comparisons include route length, travel time, and also route popularity, which are enabled by common driving behaviors found in available trajectory data. The ability to evaluate the quality of routing services enables service providers to improve the quality of their services and enables users to identify the services that best serve their needs. The paper covers experiments with real vehicle trajectory data and an existing online navigation service. It is found that the availability of information about previous trips enables better prediction of route travel time and makes it possible to provide the users with more popular routes than does a conventional navigation service.
Proceedings of the 2013 IEEE 14th International Conference on Mobile Data Management - Volume 01; 06/2013
ABSTRACT: Reliable indoor positioning is an important foundation for emerging indoor location based services. Most existing indoor positioning proposals rely on a single wireless technology, e.g., Wi-Fi, Bluetooth, or RFID. A hybrid positioning system combines such technologies and achieves better positioning accuracy by exploiting the different capabilities of the different technologies. In a hybrid system based on Wi-Fi and Bluetooth, the former works as the main infrastructure to enable fingerprint based positioning, while the latter (via hotspot devices) partitions the indoor space as well as a large Wi-Fi radio map. As a result, the Wi-Fi based online position estimation is improved in a divide-and-conquer manner. We study three aspects of such a hybrid indoor positioning system. First, to avoid large positioning errors caused by similar reference positions that are hard to distinguish, we design a deployment algorithm that identifies and separates such positions into different smaller radio maps by deploying Bluetooth hotspots at particular positions. Second, we design methods that improve the partition switching that occurs when a user leaves the detection range of a Bluetooth hotspot. Third, we propose three architectural options for placement of the computation workload. We evaluate all proposals using both simulation and walkthrough experiments in two indoor environments of different sizes. The results show that our proposals are effective and efficient in achieving very good indoor positioning performance.
Proceedings of the 2013 IEEE 14th International Conference on Mobile Data Management - Volume 01; 06/2013
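The divide-and-conquer idea can be sketched as nearest-fingerprint matching restricted to the radio-map partition selected by the Bluetooth hotspot currently heard. All identifiers, RSSI values, and the distance function below are assumptions for illustration, not the paper's deployment or switching algorithms:

```python
def rssi_distance(obs, ref):
    """Euclidean distance in signal-strength space over the APs both see."""
    aps = set(obs) & set(ref)
    if not aps:
        return float("inf")
    return (sum((obs[a] - ref[a]) ** 2 for a in aps) / len(aps)) ** 0.5

def locate(obs, partitions, heard_hotspot):
    """Nearest-fingerprint positioning within the partition of the radio
    map associated with the Bluetooth hotspot currently in range."""
    radio_map = partitions[heard_hotspot]
    return min(radio_map, key=lambda pos: rssi_distance(obs, radio_map[pos]))

# Radio map split into two partitions by two Bluetooth hotspots.
partitions = {
    "bt-1": {"room-101": {"ap1": -40, "ap2": -70},
             "room-102": {"ap1": -60, "ap2": -50}},
    "bt-2": {"room-201": {"ap1": -80, "ap2": -45}},
}
pos = locate({"ap1": -42, "ap2": -68}, partitions, "bt-1")
```

Restricting the search to one partition both speeds up matching and removes confusable reference positions that live in other partitions.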
ABSTRACT: With the proliferation of mobile computing, positioning systems are becoming available that enable indoor location-based services. As a result, indoor tracking data is also becoming available. This paper focuses on one use of such data, namely the identification of typical movement patterns among indoor moving objects. Specifically, the paper presents a method for identifying such patterns. Leveraging concepts from sequential pattern mining, the method takes into account the specifics of spatial movement and, in particular, the specifics of tracking data that captures indoor movement. For example, the paper's proposal supports spatial aggregation and utilizes the topology of indoor spaces to achieve better performance. The paper reports on empirical studies with real and synthetic data that offer insights into the functional and computational aspects of its proposal.
Proceedings of the 2013 IEEE 14th International Conference on Mobile Data Management - Volume 01; 06/2013
ABSTRACT: Web users and content are increasingly being geo-positioned. This development gives prominence to spatial keyword queries, which involve both the locations and textual descriptions of content. We study the efficient processing of continuously moving top-k spatial keyword (MkSK) queries over spatial text data. State-of-the-art solutions for moving queries employ safe zones that guarantee the validity of reported results as long as the user remains within the safe zone associated with a result. However, existing safe-zone methods focus solely on spatial locations and ignore text relevancy. We propose two algorithms for computing safe zones that guarantee correct results at any time and that aim to optimize the server-side computation as well as the communication between the server and the client. We exploit tight and conservative approximations of safe zones and aggressive computational space pruning. We present techniques that aim to compute the next safe zone efficiently, and we present two types of conservative safe zones that aim to reduce the communication cost. Empirical studies with real data suggest that the proposals are efficient. To understand the effectiveness of the proposed safe zones, we study analytically the expected area of a safe zone, which indicates on average for how long a safe zone remains valid, and we study the expected number of influence objects needed to define a safe zone, which gives an estimate of the average communication cost. The analytical modeling is validated through empirical studies.
ACM Transactions on Database Systems (TODS). 04/2013; 38(1).
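The client-side protocol around a safe zone is simple to sketch. A circular zone is used here purely as a stand-in (the safe zones above are more general regions shaped by both location and text relevancy); the class and function names are assumptions:

```python
import math

class CircularSafeZone:
    """A conservative circular approximation of a safe zone: while the
    client stays inside the circle, the cached top-k result is known to
    remain valid, so no server round trip is needed."""

    def __init__(self, center, radius, result):
        self.center, self.radius, self.result = center, radius, result

    def contains(self, location):
        return math.dist(self.center, location) <= self.radius

def current_result(location, zone, query_server):
    """Return the valid result for the client's location, contacting the
    server only when the client has left its safe zone."""
    if zone is not None and zone.contains(location):
        return zone.result, zone          # cached result still valid
    return query_server(location)         # recompute result and safe zone
```

A conservative (smaller) zone never reports a stale result; the trade-off discussed above is that a smaller zone triggers server contact more often, shifting cost from communication payload to query frequency.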
ABSTRACT: A wide variety of desktop and mobile Web applications involve geo-tagged content, e.g., photos and (micro-) blog postings. Such content, often called User Generated Geo-Content (UGGC), plays an increasingly important role in many applications. However, a great demand also exists for "core" UGGC where the geo-spatial aspect is not just a tag on other content, but is the primary content, e.g., a city street map with up-to-date road construction data. Along these lines, the iPark system aims to turn volumes of GPS data obtained from vehicles into information about the locations of parking spaces, thus enabling effective parking search applications. In particular, we demonstrate how iPark helps ordinary users annotate an existing digital map with two types of parking, on-street parking and parking zones, based on vehicular tracking data.
Proceedings of the 16th International Conference on Extending Database Technology; 03/2013
[Show abstract][Hide abstract] ABSTRACT: Finding a location for a new facility such that the facility attracts the
maximal number of customers is a challenging problem. Existing studies either
model customers as static sites and thus do not consider customer movement, or
they focus on theoretical aspects and do not provide solutions that are shown
empirically to be scalable. Given a road network, a set of existing facilities,
and a collection of customer route traversals, an optimal segment query returns
the optimal road network segment(s) for a new facility. We propose a practical
framework for computing this query, where each route traversal is assigned a
score that is distributed among the road segments covered by the route
according to a score distribution model. The query returns the road segment(s)
with the highest score. To achieve low latency, it is essential to prune the
very large search space. We propose two algorithms that adopt different
approaches to computing the query. Algorithm AUG uses graph augmentation, and
ITE uses iterative road-network partitioning. Empirical studies with real data
sets demonstrate that the algorithms are capable of offering high performance
in realistic settings.
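The score-distribution step can be sketched in a few lines. Uniform distribution of a traversal's score over its segments is only one of the possible score distribution models mentioned above, and the data below is invented for illustration:

```python
from collections import defaultdict

def optimal_segments(traversals):
    """Each traversal is (score, [segment ids it covers]).  The score is
    distributed uniformly over the covered segments; the segment(s)
    with the highest accumulated score are returned."""
    seg_score = defaultdict(float)
    for score, segments in traversals:
        share = score / len(segments)
        for seg in segments:
            seg_score[seg] += share
    best = max(seg_score.values())
    return sorted(s for s, v in seg_score.items() if v == best)

traversals = [
    (1.0, ["s1", "s2"]),   # a customer route covering segments s1 and s2
    (1.0, ["s2", "s3"]),
    (1.0, ["s3"]),
]
```

Here `s3` wins because it collects a full share from the short third route plus a half share from the second; the pruning algorithms (AUG, ITE) exist precisely because this naive accumulation over all segments does not scale to large networks.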
ABSTRACT: With the increasing availability of terrain data, e.g., from aerial laser scans, the management of such data is attracting increasing attention in both industry and academia. In particular, spatial queries, e.g., k-nearest neighbor and reverse nearest neighbor queries, in Euclidean and spatial network spaces are being extended to terrains. Such queries all rely on an important operation, that of finding shortest surface distances. However, shortest surface distance computation is very time consuming. We propose techniques that enable efficient computation of lower and upper bounds of the shortest surface distance, which enable faster query processing by eliminating expensive distance computations. Empirical studies show that our bounds are much tighter than the best-known bounds in many cases and that they enable speedups of up to 43 times for some well-known spatial queries.
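A sketch of how such bounds eliminate expensive computations: the 3D straight-line distance is always a valid lower bound on the surface distance, so a kNN search can skip the exact computation whenever the bound already exceeds the current k-th best. (The paper's bounds are much tighter than this; the structure of the pruning loop is the point here.)

```python
import math

def euclidean_lb(p, q):
    """3D straight-line distance: a valid lower bound on the shortest
    surface distance between two points on a terrain."""
    return math.dist(p, q)

def knn_with_pruning(query, candidates, k, surface_distance):
    """k-nearest neighbors by surface distance, skipping the exact
    (expensive) computation whenever the cheap lower bound already
    exceeds the current k-th best distance."""
    results = []   # list of (surface distance, point), kept sorted
    # Visiting candidates in increasing lower-bound order improves pruning.
    for c in sorted(candidates, key=lambda c: euclidean_lb(query, c)):
        if len(results) == k and euclidean_lb(query, c) >= results[-1][0]:
            continue                      # pruned: cannot beat current k-th
        d = surface_distance(query, c)    # expensive exact computation
        results.append((d, c))
        results.sort()
        results = results[:k]
    return [c for _, c in results]
```

The tighter the lower bound, the earlier the `continue` branch fires, which is exactly where the reported speedups come from.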
ABSTRACT: Query optimizers rely on statistical models that succinctly describe the underlying data. Models are used to derive cardinality estimates for intermediate relations, which in turn guide the optimizer to choose the best query execution plan. The quality of the resulting plan is highly dependent on the accuracy of the statistical model that represents the data. It is well known that small errors in the model estimates propagate exponentially through joins, and may result in the choice of a highly sub-optimal query execution plan. Most commercial query optimizers make the attribute value independence assumption: all attributes are assumed to be statistically independent. This reduces the statistical model of the data to a collection of one-dimensional synopses (typically in the form of histograms), and it permits the optimizer to estimate the selectivity of a predicate conjunction as the product of the selectivities of the constituent predicates. However, this independence assumption is more often than not wrong, and is considered to be the most common cause of sub-optimal query execution plans chosen by modern query optimizers. We take a step towards a principled and practical approach to performing cardinality estimation without making the independence assumption. By carefully using concepts from the field of graphical models, we are able to factor the joint probability distribution over all the attributes in the database into small, usually two-dimensional distributions, without a significant loss in estimation accuracy. We show how to efficiently construct such a graphical model from the database using only two-way join queries, and we show how to perform selectivity estimation in a highly efficient manner. We integrate our algorithms into the PostgreSQL DBMS. Experimental results indicate that estimation errors can be greatly reduced, leading to orders of magnitude more efficient query execution plans in many cases. Optimization time is kept in the range of tens of milliseconds, making this a practical approach for industrial-strength query optimizers.
The VLDB Journal 02/2013; 22(1).
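The error introduced by the attribute value independence assumption is easy to demonstrate on toy data. Both estimators below operate on raw rows for clarity; the attribute names and values are invented, and a real optimizer would of course use histograms or, per the paper, low-dimensional factors of a graphical model rather than scanning rows:

```python
def selectivity_independent(rows, preds):
    """Estimate selectivity of a conjunction as the product of per-
    attribute selectivities: the attribute value independence assumption."""
    est = 1.0
    for attr, value in preds.items():
        est *= sum(1 for r in rows if r[attr] == value) / len(rows)
    return est

def selectivity_joint(rows, preds):
    """Exact selectivity from the joint distribution, which a graphical
    model approximates with small low-dimensional factors."""
    return sum(1 for r in rows
               if all(r[a] == v for a, v in preds.items())) / len(rows)

# Perfectly correlated attributes: make "Honda" always pairs with "Civic".
rows = [{"make": "Honda", "model": "Civic"}] * 50 + \
       [{"make": "Ford", "model": "F150"}] * 50
preds = {"make": "Honda", "model": "Civic"}
```

On this data the independence estimate is 0.5 × 0.5 = 0.25 while the true selectivity is 0.5, a 2x error from a single correlated pair; through several joins such errors compound exponentially, as noted above.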
ABSTRACT: We are witnessing increasing interest in the effective use of road networks. For example, to enable effective vehicle routing, weighted-graph models of transportation networks are used, where the weight of an edge captures some cost associated with traversing the edge, e.g., greenhouse gas (GHG) emissions or travel time. It is a precondition to using a graph model for routing that all edges have weights. Weights that capture travel times and GHG emissions can be extracted from GPS trajectory data collected from the network. However, GPS trajectory data typically lack the coverage needed to assign weights to all edges. This paper formulates and addresses the problem of annotating all edges in a road network with travel cost based weights from a set of trips in the network that cover only a small fraction of the edges, each with an associated ground-truth travel cost. A general framework is proposed to solve the problem. Specifically, the problem is modeled as a regression problem and solved by minimizing a judiciously designed objective function that takes into account the topology of the road network. In particular, the use of weighted PageRank values of edges is explored for assigning appropriate weights to all edges, and the property of directional adjacency of edges is also taken into account to assign weights. Empirical studies with weights capturing travel time and GHG emissions on two road networks (Skagen, Denmark, and North Jutland, Denmark) offer insight into the design properties of the proposed techniques and offer evidence that the techniques are effective. Index Terms: spatial databases and GIS; correlation and regression analysis.
IEEE Transactions on Knowledge and Data Engineering 01/2013.
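The core intuition, that network topology lets sparse ground-truth weights cover the whole network, can be sketched with a much simpler stand-in for the regression formulation: repeatedly average each unknown edge's weight over its adjacent edges. This omits the weighted-PageRank and directional-adjacency machinery entirely; the graph and values are invented:

```python
def annotate_edges(adjacency, known, iterations=50):
    """Fill in missing edge weights by repeatedly averaging the weights
    of adjacent edges, keeping ground-truth weights fixed.

    adjacency maps each edge id to the ids of its adjacent edges;
    known maps some edge ids to GPS-derived ground-truth weights."""
    weights = dict(known)
    # Initialize unknown edges to the mean of the known weights.
    mean = sum(known.values()) / len(known)
    for e in adjacency:
        weights.setdefault(e, mean)
    for _ in range(iterations):
        for e in adjacency:
            if e in known:
                continue                  # ground truth is kept fixed
            neighbors = adjacency[e]
            weights[e] = sum(weights[n] for n in neighbors) / len(neighbors)
    return weights

# A path of four edges where only the ends have GPS-derived weights.
adjacency = {"e1": ["e2"], "e2": ["e1", "e3"],
             "e3": ["e2", "e4"], "e4": ["e3"]}
known = {"e1": 10.0, "e4": 40.0}
w = annotate_edges(adjacency, known)
```

On this path the interior edges converge to 20 and 30, smoothly interpolating between the two observed costs, which is the kind of topology-aware fill-in the objective function above formalizes.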
ABSTRACT: Geo-textual indices play an important role in spatial keyword querying. The existing geo-textual indices have not been compared systematically under the same experimental framework. This makes it difficult to determine which indexing technique best supports specific functionality. We provide an all-around survey of 12 state-of-the-art geo-textual indices. We propose a benchmark that enables the comparison of the spatial keyword query performance. We also report on the findings obtained when applying the benchmark to the indices, thus uncovering new insights that may guide index selection as well as further research.
Proceedings of the 39th international conference on Very Large Data Bases; 01/2013
ABSTRACT: The use of accurate 3D spatial network models can enable substantial improvements in vehicle routing. Notably, such models enable eco-routing, which reduces the environmental impact of transportation. We propose a novel filtering and lifting framework that augments a standard 2D spatial network model with elevation information extracted from massive aerial laser scan data and thus yields an accurate 3D model. We present a filtering technique that is capable of pruning irrelevant laser scan points in a single pass, but assumes that the 2D network fits in internal memory and that the points are appropriately sorted. We also provide an external-memory filtering technique that makes no such assumptions. During lifting, a triangulated irregular network (TIN) surface is constructed from the remaining points. The 2D network is projected onto the TIN, and a 3D network is constructed by means of interpolation. We report on a large-scale empirical study that offers insight into the accuracy, efficiency, and scalability properties of the framework.
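The interpolation step of the lifting phase is standard geometry: a 2D network vertex that falls inside a TIN triangle receives an elevation blended from the triangle's vertices via barycentric coordinates. The function below is a textbook sketch of that step (the paper may interpolate differently in detail), with invented coordinates:

```python
def barycentric_elevation(p, tri):
    """Elevation of 2D point p interpolated on one TIN triangle.

    tri is three (x, y, z) vertices; the barycentric coordinates of p
    in the triangle's 2D projection are used to blend the vertex
    elevations, lifting the projected point onto the TIN surface."""
    (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = tri
    px, py = p
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    l1 = ((y2 - y3) * (px - x3) + (x3 - x2) * (py - y3)) / det
    l2 = ((y3 - y1) * (px - x3) + (x1 - x3) * (py - y3)) / det
    l3 = 1.0 - l1 - l2
    return l1 * z1 + l2 * z2 + l3 * z3

# The centroid of a triangle gets the mean of the vertex elevations.
z = barycentric_elevation((1.0, 1.0), [(0, 0, 3.0), (3, 0, 6.0), (0, 3, 9.0)])
```

Applying this per projected vertex (and per point where an edge crosses a triangle boundary) yields the 3D network described above.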