ABSTRACT: This paper considers a cloud computing setting in which similarity querying of metric data is outsourced to a service provider. The data is to be revealed only to trusted users, not to the service provider or anyone else. Users query the server for the most similar data objects to a query example. Outsourcing offers the data owner scalability and a low initial investment. The need for privacy may be due to the data being sensitive (e.g., in medicine), valuable (e.g., in astronomy), or otherwise confidential. Given this setting, the paper presents techniques that transform the data prior to supplying it to the service provider for similarity queries on the transformed data. Our techniques provide interesting trade-offs between query cost and accuracy. They are then further extended to offer an intuitive privacy guarantee. Empirical studies with real data demonstrate that the techniques are capable of offering privacy while enabling efficient and accurate processing of similarity queries.
IEEE Transactions on Knowledge and Data Engineering 03/2012; · 1.89 Impact Factor
ABSTRACT: Indoor spaces accommodate large numbers of spatial objects, e.g., points of interest (POIs), and moving populations. A variety of services, e.g., location-based services and security control, are relevant to indoor spaces. Such services can be improved substantially if they are capable of utilizing indoor distances. However, existing indoor space models do not account well for indoor distances. To address this shortcoming, we propose a data management infrastructure that captures indoor distance and facilitates distance-aware query processing. In particular, we propose a distance-aware indoor space model that integrates indoor distance seamlessly. To enable the use of the model as a foundation for query processing, we develop accompanying, efficient algorithms that compute indoor distances for different indoor entities like doors as well as locations. We also propose an indexing framework that accommodates indoor distances that are pre-computed using the proposed algorithms. On top of this foundation, we develop efficient algorithms for typical indoor, distance-aware queries. The results of an extensive experimental evaluation demonstrate the efficacy of the proposals.
ABSTRACT: Community Question Answering (CQA) is a popular type of service where users ask questions and where answers are obtained from other users or from historical question-answer pairs. CQA archives contain large volumes of questions organized into a hierarchy of categories. As an essential function of CQA services, question retrieval in a CQA archive aims to retrieve historical question-answer pairs that are relevant to a query question. This article presents several new approaches to exploiting the category information of questions for improving the performance of question retrieval, and it applies these approaches to existing question retrieval models, including a state-of-the-art question retrieval model. Experiments conducted on real CQA data demonstrate that the proposed techniques are effective and efficient and are capable of outperforming a variety of baseline methods significantly.
ACM Transactions on Information Systems - TOIS. 01/2012;
ABSTRACT: The skyline of a multidimensional point set consists of the points that are not dominated by other points. In a scenario where product features are represented by multidimensional points, the skyline points may be viewed as representing competitive products. A product provider may wish to upgrade uncompetitive products to become competitive, but wants to take into account the upgrading cost. We study the top-k product upgrading problem. Given a set P of competitor products, a set T of products that are candidates for upgrade, and an upgrading cost function f that applies to T, the problem is to return the k products in T that can be upgraded to not be dominated by any products in P at the lowest cost. This problem is non-trivial due not only to the large data set sizes, but also to the many possibilities for upgrading a product. We identify and provide solutions for the different options for upgrading an uncompetitive product, and combine the solutions into a single solution. We also propose a spatial join-based solution that assumes P and T are indexed by an R-tree. Given a set of products in the same R-tree node, we derive three lower bounds on their upgrading costs. These bounds are employed by the join approach to prune upgrade candidates with uncompetitive upgrade costs. Empirical studies with synthetic and real data show that the join approach is efficient and scalable.
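The skyline definition in the abstract above can be illustrated with a minimal sketch. It is not the paper's algorithm (which uses R-tree joins and cost bounds) but a naive quadratic computation. The function names and the convention that smaller values are better in every dimension are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """Return True if p dominates q.

    Assumption: smaller is better in every dimension. p dominates q when
    p is no worse than q everywhere and strictly better somewhere.
    """
    no_worse = all(a <= b for a, b in zip(p, q))
    strictly_better = any(a < b for a, b in zip(p, q))
    return no_worse and strictly_better


def skyline(points: List[Point]) -> List[Point]:
    """Naive O(n^2) skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def is_competitive(candidate: Point, competitors: List[Point]) -> bool:
    """A candidate product is competitive if no competitor dominates it."""
    return not any(dominates(c, candidate) for c in competitors)
```

Under these assumptions, `is_competitive(t, P)` answers whether a product `t` from T would need upgrading at all; choosing the cheapest upgrade is the harder problem the abstract addresses.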
ABSTRACT: Web search is ubiquitous in our daily lives. Caching has been extensively used to reduce the computation time of the search engine and reduce the network traffic beyond a proxy server. Another form of web search, known as online shortest path search, is popular due to advances in geo-positioning. However, existing caching techniques are ineffective for shortest path queries. This is due to several crucial differences between web search results and shortest path results, in relation to query matching, cache item overlapping, and query cost variation. Motivated by this, we identify several properties that are essential to the success of effective caching for shortest path search. Our cache exploits the optimal subpath property, which allows a cached shortest path to answer any query with source and target nodes on the path. We utilize statistics from query logs to estimate the benefit of caching a specific shortest path, and we employ a greedy algorithm for placing beneficial paths in the cache. Also, we design a compact cache structure that supports efficient query matching at runtime. Empirical results on real datasets confirm the effectiveness of our proposed techniques.
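The optimal subpath property mentioned in the abstract above can be sketched as follows: any contiguous piece of a shortest path is itself a shortest path, so a cached path can answer queries between nodes that lie on it. This is a hypothetical linear-scan illustration, not the compact cache structure the paper designs. The function name and the representation of a path as a list of node identifiers are assumptions, and only the cached direction of travel is handled.

```python
from typing import Hashable, List, Optional

Path = List[Hashable]


def answer_from_cache(cache: List[Path], source: Hashable,
                      target: Hashable) -> Optional[Path]:
    """Return a shortest path from source to target if a cached path covers it.

    Assumption: each cached path is a shortest path with distinct nodes,
    stored in travel order. By the optimal subpath property, the slice
    between source and target is then also a shortest path.
    """
    for path in cache:
        if source in path and target in path:
            i, j = path.index(source), path.index(target)
            if i <= j:  # only reuse the path in its cached direction
                return path[i:j + 1]
    return None  # cache miss: the caller must run a full shortest path search
```

A production cache would replace the scan over all paths with an index from nodes to the paths that contain them, which is the kind of efficient query matching the abstract refers to.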
ABSTRACT: Modern processors consist of multiple cores that each support parallel processing by multiple physical threads, and they offer ample main-memory storage. This paper studies the use of such processors for the processing of update-intensive moving-object workloads that contain very frequent updates as well as queries. The non-trivial challenge addressed is that of avoiding contention between long-running queries and frequent updates. Specifically, the paper proposes a grid-based indexing technique. A static grid indexes a near up-to-date snapshot of the data to support queries, while a live grid supports updates. An efficient cloning technique that exploits the memcpy system call is used to maintain the static grid. An empirical study conducted with three modern processors finds that very frequent cloning, on the order of tens of milliseconds, is feasible, that the proposal scales linearly with the number of hardware threads, and that it significantly outperforms the previous state-of-the-art approach in terms of update throughput and query freshness.
ABSTRACT: Location-Based Services (LBSs) constitute one of the most popular classes of mobile services. However, while current LBSs typically target outdoor settings, we lead large parts of our lives indoors. The availability of easy-to-use and low-cost indoor positioning services is essential in also enabling indoor LBSs. Existing indoor positioning services typically use a single technology such as Wi-Fi, RFID or Bluetooth. Wi-Fi based indoor positioning is relatively easy to deploy, but often does not offer good positioning accuracy. In contrast, the use of RFID or Bluetooth for positioning requires considerable investments in equipment in order to ensure good positioning accuracy. Motivated by these observations, we propose a hybrid approach to indoor positioning. In particular, we introduce Bluetooth hotspots into an indoor space with an existing Wi-Fi infrastructure such that better positioning is achieved than what can be achieved by each technology in isolation. We design a flexible and extensible system architecture with an effective online position estimation algorithm for the hybrid system. The system is evaluated empirically in the building of our department. The results show that the hybrid approach improves positioning accuracy markedly.
Mobile Data Management (MDM), 2011 12th IEEE International Conference on; 07/2011
ABSTRACT: An online Route Planning Service (RPS) computes a route from one location to another. Current RPSs such as Google Maps require the use of precise locations. However, some users may not want to disclose their source and destination locations due to privacy concerns. An approach that supplies fake locations to an existing service incurs a substantial loss of quality of service, and the service may well return a result that is not helpful to the user. We propose a solution that is able to return accurate route planning results when source and destination regions are used in order to achieve privacy. The solution re-uses a standard online RPS rather than replicating this functionality, and it needs no trusted third party. The solution is able to compute the exact results without leaking the exact locations to the RPS or untrusted parties. In addition, we provide heuristics that reduce the number of times that the RPS needs to be queried, and we also describe how the accuracy and privacy requirements can be relaxed to achieve better performance. An empirical study offers insight into key properties of the approach.
Mobile Data Management (MDM), 2011 12th IEEE International Conference on; 07/2011
ABSTRACT: Geo-social networks (GeoSNs) provide context-aware services that help associate location with users and content. The proliferation of GeoSNs indicates that they're rapidly attracting users. GeoSNs currently offer different types of services, including photo sharing, friend tracking, and "check-ins." However, this ability to reveal users' locations causes new privacy threats, which in turn call for new privacy-protection methods. The authors study four privacy aspects central to these social networks - location, absence, co-location, and identity privacy - and describe possible means of protecting privacy in these circumstances.
IEEE Internet Computing 07/2011; · 2.04 Impact Factor
ABSTRACT: To facilitate a variety of applications, positioning systems are deployed in indoor settings. For example, Bluetooth and RFID positioning are deployed in airports to support real-time monitoring of delays as well as off-line flow and space usage analyses. Such deployments generate large collections of tracking data. Like in other data management applications, joins are indispensable in this setting. However, joins on indoor tracking data call for novel techniques that take into account the limited capabilities of the positioning systems as well as the specifics of indoor spaces. This paper proposes and studies probabilistic, spatio-temporal joins on historical indoor tracking data. Two meaningful types of join are defined. They return object pairs that satisfy spatial join predicates either at a time point or during a time interval. The predicates considered include “same X,” where X is a semantic region such as a room or hallway. Based on an analysis of the uncertainty inherent to indoor tracking data, effective join probabilities are formalized and evaluated for object pairs. Efficient two-phase hash-based algorithms are proposed for the point and interval joins. In a filter-and-refine framework, an R-tree variant is proposed that facilitates the retrieval of join candidates, and pruning rules are supplied that eliminate candidate pairs that do not qualify. An empirical study on both synthetic and real data shows that the proposed techniques are efficient and scalable.
Data Engineering (ICDE), 2011 IEEE 27th International Conference on; 05/2011
ABSTRACT: Equal access to public information and services for all is an essential part of the United Nations (UN) Declaration of Human Rights. Today, the Web plays an important role in providing information and services to citizens. Unfortunately, many government Web sites are poorly designed and have accessibility barriers that prevent people with disabilities from using them. This article combines current Web accessibility benchmarking methodologies with a sound strategy for comparing Web accessibility among countries and continents. Furthermore, the article presents the first global analysis of the Web accessibility of 192 United Nations Member States made publicly available. The article also identifies common properties of Member States that have accessible and inaccessible Web sites and shows that implementing antidisability discrimination laws is highly beneficial for the accessibility of Web sites, while signing the UN Rights and Dignity of Persons with Disabilities has had no such effect yet. The article demonstrates that, despite the commonly held assumption to the contrary, mature, high-quality Web sites are more accessible than lower quality ones. Moreover, Web accessibility conformance claims by Web site owners are generally exaggerated.
Journal of Information Technology & Politics 01/2011; 8(1):41-67.
ABSTRACT: Web users and content are increasingly being geo-positioned. This development gives prominence to spatial keyword queries, which involve both the locations and textual descriptions of content. We study the efficient processing of continuously moving top-k spatial keyword (MkSK) queries over spatial keyword data. State-of-the-art solutions for moving queries employ safe zones that guarantee the validity of reported results as long as the user remains within a zone. However, existing safe zone methods focus solely on spatial locations and ignore text relevancy. We propose two algorithms for computing safe zones that guarantee correct results at any time and that aim to optimize the computation on the server as well as the communication between the server and the client. We exploit tight and conservative approximations of safe zones and aggressive computational space pruning. Empirical studies with real data suggest that our proposals are efficient.
Proceedings of the 27th International Conference on Data Engineering, ICDE 2011, April 11-16, 2011, Hannover, Germany; 01/2011
ABSTRACT: With the increasing deployment of location-based services, geographic information systems, and ubiquitous computing, technologies and services that target indoor spaces are receiving increasing attention. This development is quite understandable because, as a paper presented at ISA 2010 points out, studies show that we lead most of our lives, 87% to be specific, in indoor settings. Those 87% are the focus of ISA 2010.
ABSTRACT: With the proliferation of geo-positioning and geo-tagging, spatial web objects that possess both a geographical location and a textual description are gaining in prevalence, and spatial keyword queries that exploit both location and textual description are gaining in prominence. However, the queries studied so far generally focus on finding individual objects that each satisfy a query rather than finding groups of objects where the objects in a group collectively satisfy a query. We define the problem of retrieving a group of spatial web objects such that the group's keywords cover the query's keywords and such that objects are nearest to the query location and have the lowest inter-object distances. Specifically, we study two variants of this problem, both of which are NP-complete. We devise exact solutions as well as approximate solutions with provable approximation bounds to the problems. We present empirical studies that offer insight into the efficiency and accuracy of the solutions.
Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2011, Athens, Greece, June 12-16, 2011; 01/2011
ABSTRACT: Given a set of multidimensional points, a skyline query returns the interesting points that are not dominated by other points. It has been observed that the actual cardinality (s) of a skyline query result may differ substantially from the desired result cardinality (k), which has prompted studies on how to reduce s for the case where k < s. Based on these observations, the paper proposes a new approach, called skyline ordering, that forms a skyline-based partitioning of a given data set such that an order exists among the partitions. Then, set-wide maximization techniques may be applied within each partition. Efficient algorithms are developed for skyline ordering and for resolving size constraints using the skyline order. The results of extensive experiments show that skyline ordering yields a flexible framework for the efficient and scalable resolution of arbitrary size constraints on skyline queries. Index Terms: Skyline queries, query processing, database management.
IEEE Transactions on Knowledge and Data Engineering 01/2011; 23:991-1005. · 1.89 Impact Factor
ABSTRACT: Users of mobile services wish to retrieve nearby points of interest without disclosing their locations to the services. This article addresses the challenge of optimizing the query performance while satisfying given location privacy and query accuracy requirements. The article's proposal, SpaceTwist, aims to offer location privacy for k nearest neighbor (kNN) queries at low communication cost without requiring a trusted anonymizer. The solution can be used with a conventional DBMS as well as with a server optimized for location-based services. In particular, we believe that this is the first solution that expresses the server-side functionality in a single SQL statement. In its basic form, SpaceTwist utilizes well-known incremental NN query processing on the server. When augmented with a server-side granular search technique, SpaceTwist is capable of exploiting relaxed query accuracy guarantees for obtaining better performance. We extend SpaceTwist with so-called ring ranking, which improves the communication cost, delayed termination, which improves the privacy afforded the user, and the ability to function in spatial networks in addition to Euclidean space. We report on analytical and empirical studies that offer insight into the properties of SpaceTwist and suggest that our proposal is indeed capable of offering privacy with very good performance in realistic settings.
ABSTRACT: In step with the rapidly growing volumes of available moving-object trajectory data, there is also an increasing need for techniques that enable the analysis of trajectories. Such functionality may benefit a range of application areas and services, including transportation, the sciences, sports, and prediction-based and social services, to name but a few. The chapter first provides an overview of trajectory patterns and a categorization of trajectory patterns from the literature. Next, it examines relative motion patterns, which serve as fundamental background for the chapter's subsequent discussions. Relative patterns enable the specification of patterns to be identified in the data that refer to the relationships of motion attributes among moving objects. The chapter then studies disc-based and density-based patterns, which address some of the limitations of relative motion patterns. The chapter also reviews indexing structures and algorithms for trajectory pattern mining.