Xiaoyang Sean Wang, Fudan University · School of Computer Science
PHD
243
Publications
15,555
Reads
6,682
Citations
Additional affiliations
September 2003 - August 2011
Publications (243)
Microphones have attracted a lot of attention from attackers due to the sensitivity of voice data: attackers may control devices by abusing their microphones, fingerprint devices by measuring their microphones, or directly monitor the microphone readings to steal users’ private data. Nevertheless, OS developers failed to address the severe consequ...
At the same time the European Union is implementing new strict data protection regulations, China’s data trading and sharing markets are booming. Here, we survey the status of these developing markets driven by growing demand from artificial intelligence (AI)-related industries, covering government encouragement as well as critical concerns and res...
Sensors are widely used in modern mobile devices (e.g., smartphones, watches) and may gather information, including photos, sounds and locations, from environments as well as about users. However, the powerful sensing abilities provide opportunities for attackers to steal both personal sensitive data and commercial secrets. Unfortunately, the curre...
With the rapid development of mobile applications and online social networks, users often encounter a frustrating challenge to set privacy and security policies (i.e., permission requests) of various applications correctly. For instance, in an Android system, it is hard for users, even programmers, to identify malicious permission requests (policie...
Data security is one of the leading concerns and primary challenges for cloud computing. This issue is getting more and more serious with the development of cloud computing. However, the existing privacy-preserving data sharing techniques either fail to prevent the leakage of privacy or incur huge amounts of information loss. In this paper, we prop...
Dynamic revocation of permissions of installed Android applications has been gaining popularity, because of the increasing concern of security and privacy in the Android platform. However, applications often crash or misbehave when their permissions are revoked, rendering applications completely unusable. Even though Google has officially introduce...
In recent years, MapReduce has become a popular computing framework for big data analysis. Join is a major query type for data analysis and various algorithms have been designed to process join queries on top of Hadoop. Since the efficiency of different algorithms differs depending on the join task at hand, to achieve a good performance, users need to selec...
Recently, Hadoop has become a common programming framework for big data analysis on a cluster of commodity machines. To optimize queries on a large amount of data managed by the Hadoop Distributed File System (HDFS), it is particularly important to optimize the reading of the data. Previous works either designed file formats to cluster data belongi...
A well-studied query type on moving objects is the continuous range query. An interesting and practical situation is that instead of being continuously evaluated, the query may be evaluated at different degrees of continuity, e.g., every 2 seconds (close to continuous), every 10 minutes or at irregular time intervals (close to snapshot). Furthermor...
The Android platform adopts permissions to protect sensitive resources from untrusted apps. However, after permissions are granted by users at install time, apps can use these permissions (sensitive resources) with no further restrictions. Thus, recent years have witnessed an explosion of undesirable behaviors in Android apps. An important part in t...
Android phones often carry personal information, attracting malicious developers to embed code in Android applications to steal sensitive data. With known techniques in the literature, one may easily determine if sensitive data is being transmitted out of an Android phone. However, transmission of sensitive data in itself does not necessarily indic...
In many application domains, occurrences of related spatial features may exhibit co-location patterns. For example, some diseases may occur in spatial proximity to certain types of pollution. This paper studies the problem of regional co-locations with arbitrary shapes. Regional co-locations represent regions in which two spatial features exhibit stronge...
Data is increasingly available in a digital form, and data about us is being continuously collected. Such data has made possible many interesting and useful applications, and in essence made it possible for the Web to exist in the current form. Sharing this data makes a lot of sense for many reasons. However, personal privacy has become a concern....
Algorithms and concepts for maintaining uniform random samples of streaming data and stream joins. These algorithms and concepts are used in systems and methods, such as wireless sensor networks and methods for implementing such networks, that generate and handle such streaming data and/or stream joins. The algorithms and concepts directed to strea...
The mean occupancy rate of personal vehicle trips in the United States is only 1.6 persons per vehicle mile. Urban traffic gridlock is a familiar scene. Ridesharing has the potential to solve many environmental, congestion, and energy problems. In this paper, we introduce the problem of large scale real-time ridesharing with service guarantee on r...
The term "indoor" here refers generally to enclosed space partitioned into subspaces with connecting doors or gates. Examples include the inside of office buildings, amusement parks, and indoor shopping malls. In many applications, it is desirable to keep track of the distribution of people within the enclosed space. These applications range from s...
Decision optimization is widely used in many decision guidance and support systems (DGSS) to support business decisions such as procurement, scheduling, and planning. In spite of rapid changes in users' requirements, the implementation of DGSS is typically rigid, expensive, and not easily extensible, which is in stark contrast to the agile implemen...
In cognitive sensor networks, achieving consensus among the sensor nodes without requiring centralized control is an important attribute that can enable quick and reliable network decisions. Decentralized consensus building can be achieved through iterative information exchange among sensor nodes. While much of the literature has concentrated on de...
This talk gives a personal perspective on the topic area of this new conference on data and application security and privacy, the difficult nature of the challenge we are confronting and possible research thrusts that may help us progress to an effective ...
Decision optimization is used in many applications such as those for finding the best course of action in emergencies. However, optimization solutions require considerable mathematical expertise and effort to generate effective models. On the other hand, reporting applications over databases are more intuitive and have long been established using t...
A growing number of applications and services include among their parameters a spatial and temporal characterization of some of the involved entities. These applications are enabled by positioning technologies that can provide real-time data about the position of moving objects. Depending on the considered attack model, location data management f...
A major feature of the emerging geo-social networks is the ability to notify a user when any of his friends (also called buddies) happens to be geographically in proximity. This proximity service is usually offered by the network itself or by a third party service provider (SP) using location data acquired from the users. This paper provides a ri...
A major feature of the emerging geo-social networks is the ability to notify a user when one of his friends (also called buddies) happens to be geographically in proximity with the user. This proximity service is usually offered by the network itself or by a third party service provider (SP) using location data acquired from the users. This paper p...
Spatial cloaking has been proposed and studied to protect mobile user privacy when using location based services (LBS). Traditional spatial cloaking methods are carried out by a trusted proxy known as location trusted server (LTS) to generate a region that contains at least k users for every request. The LTS is assumed to know the location of all u...
Location information is necessarily uncertain when objects are constantly moving. The cost can be high to maintain precise locations at the application server for all the objects while many applications may not need all the costly precision that is technically possible. An interesting question is how to reduce the cost associated with obtaining pre...
In very large-scale dense sensor network applications, more sensor nodes may be deployed than are required to provide the initial desired spatial resolution. Such over-deployment can extend network life, improve robustness and accommodate network dynamics. To enable large deployments, tiered and clustered network structures may be adopted for scal...
The demonstrated, high-level decisions query language DQL combines the decision optimization capability of mathematical programming and the data manipulation capability of traditional database query languages. DQL benefits application developers in two aspects. First, it avoids a conceptual impedance mismatch between mathematical programming and da...
Proximity based services are location based services (LBS) in which the service adaptation depends on the comparison between a given threshold value and the distance between a user and other (possibly moving) entities. While privacy preservation in LBS has lately received much attention, very limited work has been done on privacy-aware proximity ba...
A privacy violation occurs when the association between an individual identity and data considered private by that individual is obtained by an unauthorized party. Uncertainty and indistinguishability are two independent aspects that characterize the degree of this association being revealed. Indistinguishability refers to the property that the...
One of the privacy threats recognized in the use of LBS is represented by an adversary having information about the presence of individuals in certain locations, and using this information together with an (anonymous) LBS request to re-identify the issuer of the request associating her to the requested service. Several papers have proposed techniqu...
Protecting privacy of mobile users of location-based services is a currently interesting research problem. Most protection techniques can be categorized into either those providing location privacy or those guaranteeing k-anonymity. A mobile user (i) has location privacy if, when he makes an LBS request, adversaries cannot tell his location precise...
The problem of protecting users’ privacy in Location-Based Services (LBS) has been extensively studied recently and several defense techniques have been proposed. In this contribution, we first present a categorization of privacy attacks and related defenses. Then, we consider the class of defense techniques that aim at providing privacy through an...
The concept of quasi-ID (QI) is fundamental to the notion of k-anonymity that has gained popularity recently as a privacy-preserving method in microdata publication. This paper shows that it is important to provide QI with a formal underpinning, which, surprisingly, has been generally absent in the literature. The study presented in this paper prov...
Trust management systems are frameworks for authorization in modern distributed systems, allowing remotely accessible resources to be protected by providers. By allowing providers to specify policy, and access requesters to possess certain access rights, trust management automates the process of determining whether access should be allowed on the b...
We consider the problem of efficiently computing distributed geographical k-NN queries in an unstructured peer-to-peer (P2P) system, in which each peer is managed by an individual organization and can only communicate with its logical neighboring peers. Such queries are based on local filter query statistics, and require as less communication cost...
In the information age and with explosively increasing volumes of remote sensing, model and other Earth Science data available, scientists are now facing challenges to find and to access interesting data sets effectively and efficiently through the Internet. In this paper, we first discuss the DIstributed MEtadata Server (DIMES) prototype system. D...
Significant research has been devoted to aggregation in sensor networks with a view to optimize its performance. Existing research has mostly concentrated on maximizing network lifetime within a user-given error bound. In general, the greater the error bound, the longer the lifetime. However, in some situations, it may not be realistic for...
Recently, periodic pattern mining from time series data has been studied extensively. However, an interesting type of periodic pattern, called partial periodic (PP) correlation in this paper, has not been investigated. An example of PP correlation is that power consumption is high either on Monday or Tuesday but not on both days. In general, a PP c...
Many state-of-the-art selectivity estimation methods use query feedback to maintain histogram buckets, thereby using the limited memory efficiently. However, they are “reactive” in nature, that is, they update the histogram based on queries that have come to the system in the past for evaluation. In some applications, future occurrences of certain...
A decision guidance management system (DGMS) is a productivity platform for fast development of applications that require a closed-loop data acquisition, learning, prediction, and decision optimization. This paper introduces the DGMS concept, and the first DGMS data model with its query language, DG-SQL. The DGMS data model is an extension of the r...
The evaluation of privacy-preserving techniques for LBS is often based on simulations of mostly random user movements that only partially capture real deployment scenarios. We claim that benchmarks tailored to specific scenarios are needed, and we report preliminary results on how they may be generated through an agent-based context-aware simulator...
We consider the problem of efficiently computing distributed geographical k-NN queries in an unstructured peer-to-peer (P2P) system, in which each peer is managed by an individual organization and can only communicate with its logical neighboring peers. Such queries are based on local filter query statistics, and require as less communication cost...
Copyright © 2008 for the individual papers by the papers' authors. Copying permitted for private and academic purposes. Re-publication of material from this volume requires permission by the copyright owners. Preface: Location based applications in travel, logistics, health care, social networks and other industries already exist and are poised to...
This book constitutes the proceedings of the 9th International Conference on Web Information Systems Engineering, WISE 2008, held in Auckland, New Zealand, in September 2008.
The 17 revised full papers and 14 revised short papers presented together with two keynote talks were carefully reviewed and selected from around 110 submissions. The papers a...
The adoption of location-based services (LBS) brings new privacy threats to users. The user location information revealed in LBS requests may be used by attackers to associate sensitive information of the user with her identity. This contribution focuses on privacy protection through anonymity, i.e., keeping individual users indistinguishable in a...
Spatial generalisation has been recently proposed as a technique for the anonymisation of requests in location based services. This article provides a formal characterisation of a privacy attack that has been informally described in previous work, and presents a new generalisation algorithm that is proved to be a safe defense against that attack. T...
In stream join processing with limited memory, uniform random sampling is useful for approximate query evaluation. In this paper, we address the problem of reservoir sampling over memory-limited stream joins. We present two sampling algorithms, reservoir join-sampling (RJS) and progressive reservoir join-sampling (PRJS). RJS is designed straightfor...
Reservoir sampling is a well-known technique for sequential random sampling over data streams. Conventional reservoir sampling assumes a fixed-size reservoir. There are situations, however, in which it is necessary and/or advantageous to adaptively adjust the size of a reservoir in the middle of sampling due to changes in data characteristics and/o...
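For readers unfamiliar with the baseline that the entry above builds on, here is a minimal sketch of conventional fixed-size reservoir sampling (the classic single-pass scheme often called Algorithm R). It is not the adaptive-size method from the paper, whose details are elided in the truncated abstract. The function name `reservoir_sample` and its parameters are illustrative assumptions.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(
    stream: Iterable[T], k: int, rng: Optional[random.Random] = None
) -> List[T]:
    """Return a uniform random sample of up to k items from a stream.

    Hypothetical helper for illustration: classic fixed-size reservoir
    sampling, not the adaptive-size variant the abstract refers to.
    """
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-based) is kept with probability k / (i + 1);
            # if kept, it replaces a uniformly chosen reservoir slot.
            j = rng.randint(0, i)  # both endpoints inclusive
            if j < k:
                reservoir[j] = item
    return reservoir


sample = reservoir_sample(range(1000), 10, random.Random(0))
short = reservoir_sample(range(3), 10)
```

The reservoir size `k` is fixed for the whole pass, which is exactly the assumption the entry says may need to be relaxed when data characteristics change mid-stream.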
Nowadays, wireless sensor networks have been widely used in many monitoring applications. Due to the low quality of sensors and random effects of the environments, however, it is well known that the collected sensor data are noisy. Therefore, it is very critical to clean the sensor data before using them to answer queries or conduct data analysis....
Distributed authorization takes into account several elements, including certificates that may be provided by non-local actors. While most trust management systems treat all assertions as equally valid up to certificate authentication, realistic considerations may associate risk with some of these elements; for example, some actors may be less tru...
A general consensus is that the proliferation of location-aware devices will result in a diffusion of location-based services. Privacy preservation is a challenging research issue for this kind of service. A possible solution consists of ensuring users' anonymity, i.e., ensuring that the user issuing a request is indistinguishable, among a group o...
In recent years several research efforts have focused on the concept of time granularity and its applications. A first stream of research investigated the mathematical models behind the notion of granularity and the algorithms to manage temporal data based on those models. A second stream of research investigated symbolic formalisms providing a...
The processing of k-NN queries has been studied extensively both in a centralized computing environment and in a structured P2P environment. However, the problem over an unstructured P2P system is not well studied despite the popularity of such systems. Communication-efficient processing of k-NN queries in such an environment is a unique challenge due to the...
In many data systems, it is important to protect individual privacy while satisfying application requirements. To provide such protection, privacy disclosure must be measured in some quantitative manner, as absolute privacy is usually not a practical proposition. Privacy measurement metrics have appeared in the literature, but they are either for s...
Recent advances in wireless, sensing, and embedded systems technologies are poised to enable the use of thousands of low-cost nodes to form a wireless sensor network. These networks promise monitoring of environments that are remote or inaccessible, providing users with critical information with unprecedented spatial and temporal resolution. One ma...
The concept of k-anonymity, used in the literature to formally evaluate the privacy preservation of published tables, was introduced based on the notion of quasi-identifiers (or QI for short). The process of obtaining k-anonymity for a given private table is first to recognize the QIs in the table, and then to anonymize the QI values, the latter be...
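As a rough illustration of the standard k-anonymity condition mentioned in the entry above, the sketch below checks whether every combination of quasi-identifier values in a table is shared by at least k rows. It does not reproduce the paper's own treatment of QIs, which the truncated abstract does not show. The function name `is_k_anonymous`, the column names, and the sample rows are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence


def is_k_anonymous(
    rows: List[Dict[str, str]], quasi_identifiers: Sequence[str], k: int
) -> bool:
    """Check the usual k-anonymity condition on a table.

    Hypothetical helper: groups rows by their quasi-identifier values and
    requires every group to contain at least k rows.
    """
    groups = Counter(tuple(row[attr] for attr in quasi_identifiers) for row in rows)
    return all(count >= k for count in groups.values())


# Made-up table with already generalized ZIP code and age range.
table = [
    {"zip": "220**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "220**", "age": "30-39", "diagnosis": "asthma"},
    {"zip": "221**", "age": "40-49", "diagnosis": "flu"},
    {"zip": "221**", "age": "40-49", "diagnosis": "diabetes"},
]

two_anonymous = is_k_anonymous(table, ["zip", "age"], 2)
three_anonymous = is_k_anonymous(table, ["zip", "age"], 3)
```

With the made-up rows, each (zip, age) group has exactly two members, so the table passes for k = 2 and fails for k = 3. Which attributes count as quasi-identifiers is an input here, and choosing them is the step the entry above singles out.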
Key predistribution in wireless sensor networks refers to the problem of distributing secret keys among sensors prior to deployment. Solutions that have appeared in the literature can be classified into two categories: basic schemes that achieve fixed probability of sharing a key between any pair of sensors in a network and location-aware schemes that use a...
Uncertainty and indistinguishability are two independent aspects of privacy. Uncertainty refers to the property that the attacker cannot tell which private value, among a group of values, an individual actually has, and indistinguishability refers to the property that the attacker cannot see the difference among a group of individuals. While...
Given d input time series, an aggregated series can be formed by aggregating the d values at each time position. It is often useful to find the time positions whose aggregated values are the greatest. Instead of looking for individual top-k time positions, this paper gives two algorithms for finding the time interval (called the plateau) in which...
A significant property of temporal data is their richness of semantics. Although several temporal data models and query languages have been designed specifically to handle the temporal data, users must still deal with much of the implicit temporal information, which can be automatically derived from the stored data in certain situations. We propose...
Advances in sensor technology and image processing have made it possible to equip unmanned aerial vehicles (UAVs) with economical, high-resolution, energy-efficient sensors. Despite the improvements, current UAVs lack autonomous and collaborative operation capabilities, due to limited bandwidth and limited on-board image processing abilities. The s...
This paper studies the problem of mining frequent itemsets along with their temporal patterns from large transaction sets. A model is proposed in which users define a large set of temporal patterns that are interesting or meaningful to them. A temporal pattern defines the set of time points where the user expects a discovered itemset to be frequent...
In this paper we extend the notion of k-anonymity in the context of databases with timestamped information in order to naturally define k-anonymous views of temporal data. We also investigate the problem of obtaining these views. We show that known generalization techniques, despite being applicable under certain conditions, have some limitations,...
The processing of k-NN queries has been studied extensively both in a centralized computing environment and in a structured P2P environment. However, the problem over an unstructured P2P system is not well studied despite the popularity of such systems. Communication-efficient processing of k-NN queries in such an environment is a unique challenge due to the...
A stream classifier is a decision model that assigns a class label to a data stream, based on its arriving data. Various features of the stream can be used in the classifier, each of which may have different relevance to the classification task and different cost in obtaining its value. As time passes by, some less costly features may become more r...