Stéphane Bressan

National University of Singapore, Tumasik, Singapore

Publications (112) · 7.96 Total Impact

  • ABSTRACT: The pervasiveness of location-acquisition technologies has made it possible to collect the movement data of individuals and vehicles. However, such data must be carefully managed to ensure that there is no privacy breach. In this paper, we investigate the problem of publishing trajectory data under the differential privacy model. A straightforward solution is to add noise to a trajectory, either to each coordinate of a position, to each position of the trajectory, or to the whole trajectory. However, such naive approaches result in trajectories with zigzag shapes and many crossings, making the published trajectories of little practical use. We introduce a mechanism called SDD (Sampling Distance and Direction), which is ε-differentially private. SDD samples a suitable direction and distance at each position to publish the next possible position. Numerical experiments conducted on real ship trajectories demonstrate that our proposed mechanism can deliver ship trajectories of good practical utility.
    Proceedings of the 25th International Conference on Scientific and Statistical Database Management; 07/2013
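The abstract above describes publishing each next position by sampling a distance and a direction. A minimal illustrative sketch of that idea, using plain Laplace noise; this is not the paper's exact SDD mechanism or its privacy accounting, and all names are hypothetical:

```python
import math
import random

def laplace(scale):
    # Draw from a zero-mean Laplace distribution (inverse-CDF method).
    u = random.random() - 0.5
    return -scale * math.copysign(math.log(1 - 2 * abs(u)), u)

def sdd_like_publish(trajectory, epsilon):
    """Publish a noisy trajectory: at each step, sample a noisy distance
    and direction towards the next true position (toy sketch only)."""
    published = [trajectory[0]]
    x, y = trajectory[0]
    for nx, ny in trajectory[1:]:
        dist = math.hypot(nx - x, ny - y)
        angle = math.atan2(ny - y, nx - x)
        noisy_dist = max(0.0, dist + laplace(1.0 / epsilon))
        noisy_angle = angle + laplace(1.0 / epsilon)
        x = x + noisy_dist * math.cos(noisy_angle)
        y = y + noisy_dist * math.sin(noisy_angle)
        published.append((x, y))
    return published
```

Because each step perturbs the heading rather than the coordinates independently, the published trajectory stays locally smooth instead of zigzagging.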
  • ABSTRACT: Efficient spatial joins are pivotal for many applications, and particularly important for geographical information systems and for the simulation sciences, where scientists work with spatial models. Past research has primarily focused on disk-based spatial joins; efficient in-memory approaches, however, are important for two reasons: a) main memory has grown so large that many datasets fit in it, and b) the in-memory join is a very time-consuming part of all disk-based spatial joins. In this paper we develop TOUCH, a novel in-memory spatial join algorithm that uses hierarchical data-oriented space partitioning, thereby keeping both its memory footprint and the number of comparisons low. Our results show that TOUCH outperforms known in-memory spatial-join algorithms as well as in-memory implementations of disk-based join approaches. In particular, it has an order-of-magnitude advantage over the memory-demanding state of the art in terms of the number of comparisons (i.e., pairwise object comparisons) as well as execution time, and is two orders of magnitude faster than approaches with a similar memory footprint. Furthermore, TOUCH is more scalable than competing approaches as data density grows.
    Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data; 06/2013
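TOUCH itself uses hierarchical data-oriented partitioning; as a point of comparison, here is a minimal sketch of the simpler fixed-grid in-memory spatial join that such work improves upon (function and parameter names are hypothetical):

```python
from collections import defaultdict

def grid_spatial_join(boxes_a, boxes_b, cell=1.0):
    """Join two sets of axis-aligned boxes (x1, y1, x2, y2) whose
    rectangles intersect, using a uniform grid to prune comparisons."""
    grid = defaultdict(list)
    for i, (x1, y1, x2, y2) in enumerate(boxes_a):
        for cx in range(int(x1 // cell), int(x2 // cell) + 1):
            for cy in range(int(y1 // cell), int(y2 // cell) + 1):
                grid[(cx, cy)].append(i)
    seen, result = set(), []
    for j, (x1, y1, x2, y2) in enumerate(boxes_b):
        for cx in range(int(x1 // cell), int(x2 // cell) + 1):
            for cy in range(int(y1 // cell), int(y2 // cell) + 1):
                for i in grid.get((cx, cy), ()):
                    if (i, j) in seen:
                        continue  # each candidate pair is tested once
                    seen.add((i, j))
                    a = boxes_a[i]
                    if a[0] <= x2 and x1 <= a[2] and a[1] <= y2 and y1 <= a[3]:
                        result.append((i, j))
    return result
```

A fixed grid is sensitive to skewed data, which is one motivation for data-oriented partitioning schemes such as the one the abstract describes.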
  • ABSTRACT: The popularity of similarity search expanded with the increased interest in multimedia databases, bioinformatics, or social networks, and with the growing number of users trying to find information in huge collections of unstructured data. During the ...
    ACM SIGMOD Record 06/2013; 42(2):46-51. · 0.46 Impact Factor
  • ABSTRACT: The widespread usage of random graphs has been highlighted in the context of database applications for several years. This is because such data structures turn out to be very useful in a large family of database applications, ranging from simulation to sampling, and from the analysis of complex networks to the study of randomized algorithms. Amongst others, Erdős–Rényi Γv,p is the most popular model for obtaining and manipulating random graphs. Unfortunately, it has been demonstrated that classical algorithms for generating Erdős–Rényi random graphs do not scale well to large instances and, in addition, fail to make use of the parallel processing capabilities of modern hardware. Motivated by this, in this paper we propose and experimentally assess PPreZER, a novel parallel algorithm for generating random graphs under the Erdős–Rényi model, designed and implemented on a Graphics Processing Unit (GPU). We demonstrate the benefits of our solution via a succession of several intermediary algorithms, both sequential and parallel, which show the limitations of classical approaches and the gains due to the PPreZER algorithm. Finally, our comprehensive experimental assessment shows a significant average speedup of PPreZER over baseline algorithms.
    Journal of Parallel and Distributed Computing 03/2013; 73(3):303–316. · 1.12 Impact Factor
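The scalability issue mentioned above can be seen by contrasting the classical per-pair generator with the skip-based idea underlying the (P)PreZER family, where the gap between consecutive edges follows a geometric distribution so that non-edges are never examined. A sequential sketch under these assumptions (the GPU implementation is not reproduced here):

```python
import math
import random

def er_naive(v, p):
    """Classical Erdos-Renyi G(v, p): one Bernoulli trial per pair."""
    return [(i, j) for i in range(v) for j in range(i + 1, v)
            if random.random() < p]

def er_skip(v, p):
    """Skip-based generation: the gap before the next edge follows a
    geometric distribution, so only actual edges are visited."""
    edges, k, total = [], -1, v * (v - 1) // 2
    log1p = math.log(1.0 - p)  # requires 0 < p < 1
    while True:
        # Number of skipped non-edges ~ Geometric(p).
        k += 1 + int(math.log(1.0 - random.random()) / log1p)
        if k >= total:
            return edges
        # Map the linear pair index k back to (i, j) with i < j.
        i = int((2 * v - 1 - math.sqrt((2 * v - 1) ** 2 - 8 * k)) / 2)
        j = k - i * (2 * v - i - 1) // 2 + i + 1
        edges.append((i, j))
```

The naive generator performs Θ(v²) trials regardless of p, while the skip-based one performs work proportional to the number of edges produced.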
  • ABSTRACT: The proliferation of online social networks has created intense interest in studying their nature and revealing information of interest to the end user. At the same time, such revelation raises privacy concerns. Existing research addresses this problem following an approach popular in the database community: a model of data privacy is defined, and the data is rendered in a form that satisfies the constraints of that model while aiming to maximize some utility measure. Still, there is no consensus on a clear and quantifiable utility measure over graph data. In this paper, we take a different approach: we define a utility guarantee, in terms of certain graph properties being preserved, that should be respected when releasing data, while otherwise distorting the graph to an extent desired for the sake of confidentiality. We propose a form of data release which builds on current practice in social network platforms: a user may want to see a subgraph of the network graph in which that user as well as her connections and affiliates participate. Such a snapshot should not allow malicious users to gain private information, yet should provide useful information for benevolent users. We propose a mechanism to prepare data for user view under this setting. In an experimental study with real data, we demonstrate that our method preserves several properties of interest more successfully than methods that randomly distort the graph to an equal extent, while withstanding structural attacks proposed in the literature.
    Proceedings of the 21st ACM international conference on Information and knowledge management; 10/2012
  • ABSTRACT: Shipping plays a vital role as a trade facilitator in providing cost-efficient transportation. The International Maritime Organisation (IMO) reports that over 90% of the world trade volume is carried by merchant ships. The analysis of shipping networks can therefore create invaluable insight into global trade. In this paper we study the appropriateness of various graph centrality measures to rate, compare and rank ports from various perspectives of global shipping networks. In particular, we illustrate the potential of such analysis on the example of shipping networks constructed from the schedules, readily available on the World Wide Web, of six shipping companies that transport 35-40% of the total volume traded (in TEUs) worldwide.
    International Journal of Business Intelligence and Data Mining 10/2012; 7(3):186-202.
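As a small illustration of centrality analysis on a shipping network, here is a sketch of normalized degree centrality over port-to-port links (toy data; the paper studies several centrality measures, not only degree):

```python
from collections import defaultdict

def degree_centrality(edges):
    """Degree centrality of a port graph given (port_a, port_b) links,
    normalized by the maximum possible degree n - 1."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    n = len(adj)
    return {port: len(nbrs) / (n - 1) for port, nbrs in adj.items()}
```

A port connected to every other port in the network scores 1.0; a port with a single connection scores 1/(n - 1).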
  • ABSTRACT: The proliferation of data in graph form calls for the development of scalable graph algorithms that exploit parallel processing environments. One such problem is the computation of a graph's minimum spanning forest (MSF). Past research has proposed several parallel algorithms for this problem, yet none of them scales to large, high-density graphs. In this paper we propose a novel, scalable, parallel MSF algorithm for undirected weighted graphs. Our algorithm leverages Prim's algorithm in a parallel fashion, concurrently expanding several subsets of the computed MSF. Our effort focuses on minimizing the communication among different processors without constraining the local growth of a processor's computed subtree. In effect, we achieve a scalability that previous approaches lacked. We implement our algorithm in CUDA, running on a GPU, and study its performance using real and synthetic, sparse as well as dense, structured and unstructured graph data. Our experimental study demonstrates that our algorithm outperforms the previous state-of-the-art GPU-based MSF algorithm, while being several orders of magnitude faster than sequential CPU-based algorithms.
    Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP 2012, New Orleans, LA, USA, February 25-29, 2012; 09/2012
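The paper parallelizes Prim's algorithm; as background, here is a sketch of the sequential building block: Prim's algorithm run from every unvisited vertex, which yields a minimum spanning forest for graphs that may be disconnected.

```python
import heapq

def prim_msf(n, edges):
    """Minimum spanning forest of an undirected weighted graph,
    given as n vertices and (u, v, weight) triples."""
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((w, v))
        adj[v].append((w, u))
    visited = [False] * n
    forest = []
    for root in range(n):
        if visited[root]:
            continue
        visited[root] = True
        heap = [(w, root, v) for w, v in adj[root]]
        heapq.heapify(heap)
        while heap:
            w, u, v = heapq.heappop(heap)  # lightest frontier edge
            if visited[v]:
                continue
            visited[v] = True
            forest.append((u, v, w))
            for w2, x in adj[v]:
                if not visited[x]:
                    heapq.heappush(heap, (w2, v, x))
    return forest
```

The parallel version described in the abstract grows several such subtrees concurrently, which this sequential sketch does not attempt.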
  • Xuesong Lu, Stéphane Bressan
    ABSTRACT: A recurrent challenge for modern applications is the processing of large graphs. The ability to generate representative samples of smaller size is useful not only to circumvent scalability issues but also, per se, for statistical analysis and other data mining tasks. For such purposes adequate sampling techniques must be devised. We are interested, in this paper, in the uniform random sampling of a connected subgraph from a graph. We require that the sample contains a prescribed number of vertices. The sampled graph is the corresponding induced graph. We devise, present and discuss several algorithms that leverage three different techniques: Rejection Sampling, Random Walk and Markov Chain Monte Carlo. We empirically evaluate and compare the performance of the algorithms. We show that they are effective and efficient but that there is a trade-off, which depends on the density of the graphs and the sample size. We propose one novel algorithm, which we call Neighbour Reservoir Sampling (NRS), that very successfully realizes the trade-off between effectiveness and efficiency.
    Proceedings of the 24th international conference on Scientific and Statistical Database Management; 06/2012
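One of the three techniques compared above is the random walk; here is a minimal baseline sketch (not NRS itself) that grows a connected vertex set by walking and returns the induced subgraph, assuming a connected input graph:

```python
import random

def random_walk_sample(adj, k, start=None):
    """Collect k connected vertices by random walk over an adjacency
    dict, then return the induced subgraph (baseline sketch only)."""
    current = start if start is not None else random.choice(list(adj))
    sample = {current}
    while len(sample) < k:
        current = random.choice(adj[current])  # step to a neighbour
        sample.add(current)
    # Induced subgraph: keep only edges between sampled vertices.
    return {u: [v for v in adj[u] if v in sample] for u in sample}
```

Note that, unlike the uniform sampling the paper targets, a plain random walk is biased towards high-degree vertices; correcting that bias is precisely what the rejection-sampling and MCMC variants address.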
  • Baljeet Malhotra, John A. Gamon, Stéphane Bressan
    ABSTRACT: Field spectrometry is emerging as an important tool in the study of the dynamics of the biosphere and atmosphere. Large amounts of data are now collected from spectrometers mounted on towers, robotic trams and other platforms. These data are crucial for verifying not only the optical data captured by satellites and airborne systems but also to validate the flux measurements that track ecosystem-atmosphere gas exchanges, the "breathing of the planet" critical to regulating our atmosphere and climate. There is a need for readily available systems for the management, processing and analysis of field spectrometry data. In this paper we present SALSA, a software system for the management, processing and analysis of field spectrometry data that also provides a platform for linking optical data to flux measurements. SALSA is demonstrated using real data collected from multiple research sites.
    Proceedings of the 24th international conference on Scientific and Statistical Database Management; 06/2012
  • Yi Song, Panagiotis Karras, Qian Xiao, Stéphane Bressan
    ABSTRACT: This paper is motivated by the recognition of the need for finer-grain and more personalized privacy in the data publication of social networks. We propose a privacy protection scheme that prevents not only the disclosure of users' identities but also the disclosure of selected features in users' profiles. An individual user can select which features of her profile she wishes to conceal. The social networks are modeled as graphs in which users are nodes and features are labels. Labels are denoted either as sensitive or as non-sensitive. We treat node labels both as background knowledge an adversary may possess and as sensitive information that has to be protected. We present privacy protection algorithms that allow graph data to be published in a form such that an adversary who possesses information about a node's neighborhood cannot safely infer its identity and its sensitive labels. To this end, the algorithms transform the original graph into a graph in which nodes are sufficiently indistinguishable, while losing as little information and preserving as much utility as possible. We evaluate empirically the extent to which the algorithms preserve the original graph's structure and properties. We show that our solution is effective, efficient and scalable while offering stronger privacy guarantees than those in previous research.
    Proceedings of the 24th international conference on Scientific and Statistical Database Management; 06/2012
  • ABSTRACT: The Internet, and social media in particular, are frequently credited in public discourse with being instrumental in the development and coordination of various contemporary social movements. We examine the evolution of Facebook activity in relation to the movement of the Greek Indignados of 2011, by collecting the electronic traces of their public communications on Facebook pages over a period of 8 months. We analyze the resulting bipartite graphs, consisting of users posting to pages, using social network analysis. We reveal some of the dynamics of the structural properties of the network over time and explain what these mean for the configuration of networked publics on social network sites. We conclude that the very early stages of activity are essential in determining this configuration, because users converge quickly and exclusively on a small number of pages. When activity gradually declines, the reduction is strongest among the most active users and the most popular pages, but when activity resumes, users return to the same pages. We discuss implications for the organization of collective action on social network sites.
    Proceedings of the 3rd Annual ACM Web Science Conference; 06/2012
  • Ruiming Tang, Huayu Wu, Stéphane Bressan
    ABSTRACT: XML is semi-structured. It can be used to annotate unstructured data, to represent structured data, and almost anything in between. Yet, it is unclear how to formally characterize, let alone quantify, the structuredness of XML. In this paper we propose and evaluate entropy-based metrics for XML structuredness. The metrics measure the structural uniformity of paths and subtrees, respectively. We empirically study the correlation of these metrics using real and synthetic data sets.
    Proceedings of the 2011 international conference on Web-Age Information Management; 09/2011
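As an illustration of an entropy-based structuredness metric in the spirit of the abstract (not necessarily the paper's exact definition), here is the Shannon entropy of the root-to-node label paths of a document:

```python
import math
from collections import Counter
from xml.etree import ElementTree

def path_entropy(xml_text):
    """Shannon entropy of the distribution of root-to-node label paths;
    illustrative structuredness measure, all names hypothetical."""
    root = ElementTree.fromstring(xml_text)
    paths = Counter()

    def walk(node, prefix):
        path = prefix + "/" + node.tag
        paths[path] += 1
        for child in node:
            walk(child, path)

    walk(root, "")
    total = sum(paths.values())
    return -sum((c / total) * math.log2(c / total)
                for c in paths.values())
```

Documents whose nodes repeat a small set of paths (more uniform structure) score lower than documents with many distinct paths.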
  • ABSTRACT: The International Maritime Organization (IMO) requires a majority of cargo and passenger ships to use the Automatic Identification System (AIS) for navigation safety and traffic control. Distributing live AIS data on the Internet can offer a global view based on ships' status for both operational and analytical purposes to port authorities, shipping and insurance companies, cargo owners and ship captains and other stakeholders. Yet, uncontrolled, this distribution can seriously undermine navigation safety and security and the privacy of the various stakeholders. In this paper we present ASSIST, a system prototype based on our recently proposed access control framework, to protect data streams from unauthorized access. We demonstrate the effectiveness of the system in a real scenario with real AIS data streams.
    19th ACM SIGSPATIAL International Symposium on Advances in Geographic Information Systems, ACM-GIS 2011, November 1-4, 2011, Chicago, IL, USA, Proceedings; 01/2011
  • ABSTRACT: Broadcasting is an elementary problem in wireless networks. Energy-efficient broadcasting is important, e.g., to coordinate distributed computing operations by sending periodic messages in a network of Automatic Identification System (AIS) stations installed on energy-constrained maritime lighthouses. To that end, logical tree topologies based on Connected Dominating Sets have been widely proposed in the literature. In this paper we present the Biased Shortest Path Tree (BISPT), a new logical tree topology for efficient broadcasting in wireless networks. In simulations we find that BISPT outperforms state-of-the-art solutions.
    01/2011;
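BISPT biases a shortest-path tree; as background, here is a sketch of the plain BFS shortest-path tree rooted at the broadcast source (the biasing itself is not reproduced here):

```python
from collections import deque

def shortest_path_tree(adj, root):
    """BFS shortest-path tree from the broadcast source: each node's
    parent is recorded, giving a tree along which messages are relayed."""
    parent = {root: None}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:  # first visit = shortest hop distance
                parent[v] = u
                queue.append(v)
    return parent
```

Every node is reached over a minimum-hop path from the root, which bounds the relay chain a broadcast message must traverse.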
  • Deepen Doshi, Baljeet Malhotra, Stéphane Bressan
    ABSTRACT: 90% of the world trade is reportedly carried by sea. The analysis of shipping networks therefore can create invaluable insight into global trade. In this paper we study the appropriateness of various graph centrality measures to rate, compare and rank ports from various perspectives of a shipping network. In particular, we illustrate the potential of such analysis on the example of a shipping network constructed from the schedules, readily available on the World Wide Web, of one arbitrarily chosen shipping company.
    iiWAS'2011 - The 13th International Conference on Information Integration and Web-based Applications and Services, 5-7 December 2011, Ho Chi Minh City, Vietnam; 01/2011
  • Ruiming Tang, Huayu Wu, Sadegh Nobari, Stéphane Bressan
    ABSTRACT: Probabilistic XML is a hierarchical data model capturing uncertainty of both value and structure. The ability to compute the similarity between an XML document and a probabilistic XML document is a building block of many applications involving querying, comparison, alignment and classification, for instance. The new challenge in efficiently computing such similarity is the multiplicity of the possible worlds represented by a probabilistic XML document. We devise and discuss an algorithm for the efficient computation of the similarity between an XML document and a probabilistic XML document. We empirically and comparatively evaluate the performance of the algorithm and its variants.
    Database and Expert Systems Applications - 22nd International Conference, DEXA 2011, Toulouse, France, August 29 - September 2, 2011. Proceedings, Part I; 01/2011
  • ABSTRACT: The rapidly increasing number of sensors and devices, as well as the coming of age of cloud computing, are fuelling the need for real-time stream data management tools. In a world increasingly submerged in data, data analytics and higher forms of data exploitation are fundamental to most decision-making processes. However, enabling such technology requires several factors related to security, privacy, performance and cost to be addressed. In this paper, we discuss these challenges in greater detail and propose our vision of a Privacy-aware Data Stream Cloud architecture that enables secure, privacy-preserving data analytics services.
    IJWGS. 01/2011; 7:246-267.
  • Xuesong Lu, Stéphane Bressan
    ABSTRACT: The graphs that arise from concrete applications seem to correspond to models with prescribed degree sequences. We present two algorithms for the uniform random generation of graphic sequences, prove their correctness, and empirically evaluate their performance. To our knowledge, these are the first non-trivial algorithms proposed for this task. The algorithms we propose are Markov chain Monte Carlo algorithms. Our contribution is the original design of the Markov chain and the empirical evaluation of its mixing time.
    Database Systems for Advanced Applications - 16th International Conference, DASFAA 2011, Hong Kong, China, April 22-25, 2011, Proceedings, Part I; 01/2011
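A prerequisite for working with graphic sequences is testing graphicality; here is a sketch of the standard Erdős–Gallai test (background material, not the paper's MCMC chain itself):

```python
def is_graphic(seq):
    """Erdos-Gallai test: can this degree sequence be realized by a
    simple undirected graph?"""
    d = sorted(seq, reverse=True)
    if sum(d) % 2:          # total degree must be even
        return False
    n = len(d)
    for k in range(1, n + 1):
        lhs = sum(d[:k])
        rhs = k * (k - 1) + sum(min(x, k) for x in d[k:])
        if lhs > rhs:       # the k largest degrees demand too many edges
            return False
    return True
```

For example, [3, 3, 3, 3] is graphic (it is realized by K4), while [3, 3, 1, 1] is not: the two degree-3 vertices cannot find enough distinct neighbours.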
  • ABSTRACT: Today, several database applications call for the generation of random graphs. A fundamental, versatile random graph model adopted for that purpose is the Erdős–Rényi Γv,p model. This model can be used for directed, undirected, and multipartite graphs, with and without self-loops; it induces algorithms for both graph generation and sampling, hence is useful not only in applications necessitating the generation of random structures but also for simulation, sampling, and in randomized algorithms. However, the commonly advocated algorithm for random graph generation under this model performs poorly when generating large graphs, and fails to make use of the parallel processing capabilities of modern hardware. In this paper, we propose PPreZER, an alternative, data-parallel algorithm for random graph generation under the Erdős–Rényi model, designed and implemented on a graphics processing unit (GPU). We arrive at this contribution via a succession of seven intermediary algorithms, both sequential and parallel. Our extensive experimental study shows an average speedup of 19 for PPreZER with respect to the baseline algorithm.
    EDBT 2011, 14th International Conference on Extending Database Technology, Uppsala, Sweden, March 21-24, 2011, Proceedings; 01/2011
  • ABSTRACT: You are on Facebook or you are out. Of course, this assessment is controversial and its rationale arguable. It is nevertheless not far, for many of us, from the reason behind our joining social media and publishing and sharing details of our professional and private lives. Not only the personal details we may reveal but also the very structure of the networks themselves are sources of invaluable information for any organization wanting to understand and learn about social groups, their dynamics and their members. These organizations may or may not be benevolent. It is therefore important to devise, design and evaluate solutions that guarantee some privacy. One approach that attempts to reconcile the different stakeholders' requirements is the publication of a modified graph. The perturbation is hoped to be sufficient to protect members' privacy while maintaining sufficient utility for analysts wanting to study the social medium as a whole. It is necessarily a compromise. In this paper we empirically quantify the inevitable trade-off between utility and privacy. We do so for one state-of-the-art graph anonymization algorithm that protects against most structural attacks, the k-automorphism algorithm. We measure several metrics for a series of real graphs from various social media before and after their anonymization under various settings.
    iiWAS'2011 - The 13th International Conference on Information Integration and Web-based Applications and Services, 5-7 December 2011, Ho Chi Minh City, Vietnam; 01/2011

Publication Stats

802 Citations
7.96 Total Impact Points

Institutions

  • 2004–2013
    • National University of Singapore
      • School of Computing
      Tumasik, Singapore
  • 2009
    • University of Science Malaysia
      Nibong Tebal, Pulau Pinang, Malaysia
  • 2005
    • Arizona State University
      Phoenix, Arizona, United States
  • 1997–2004
    • Massachusetts Institute of Technology
      • MIT Sloan School of Management
      Cambridge, Massachusetts, United States