Stéphane Bressan

National University of Singapore, Tumasik, Singapore


Publications (131) · 17 Total Impact Points

  • Debabrota Basu · Pierre Senellart · Stephane Bressan · Qian Lin · Weidong Chen · Zihong Yuan
    ABSTRACT: In this paper, we propose a learning approach to adaptive performance tuning of database applications. The objective is to validate the opportunity to devise a tuning strategy that does not need prior knowledge of a cost model. Instead, the cost model is learned through reinforcement learning. We instantiate our approach to the use case of index tuning. We model the execution of queries and updates as a Markov decision process whose states are database configurations, actions are configuration changes, and rewards are functions of the cost of configuration change and query and update evaluation. During the reinforcement learning process, we face two important challenges: not only the unavailability of a cost model, but also the size of the state space. To address the latter, we devise strategies to prune the state space, both in the general case and for the use case of index tuning. We empirically and comparatively evaluate our approach on a standard OLTP dataset. We show that our approach is competitive with state-of-the-art adaptive index tuning, which is dependent on a cost model.
    26th International Conference on Database and Expert Systems Applications, DEXA 2015, Valencia, Spain; 09/2015
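The learning loop this abstract describes can be sketched as tabular Q-learning over index configurations. Everything below is an illustrative assumption, not the paper's implementation: the helper `observe_cost`, the add/drop action set, and the negative-cost reward are stand-ins for the MDP the authors define.

```python
# Sketch: cost-model-free index tuning as reinforcement learning.
# States are sets of indexes, actions add or drop one index, and the
# reward is the negative *observed* cost (no cost model is assumed).
import random
from collections import defaultdict

def q_learning_tuner(observe_cost, candidates, episodes=200,
                     alpha=0.1, gamma=0.9, epsilon=0.2):
    """Learn a good index configuration from observed costs alone."""
    q = defaultdict(float)                       # Q[(state, action)] values
    state, visited = frozenset(), {frozenset()}
    for _ in range(episodes):
        actions = [("add", i) for i in candidates - state] + \
                  [("drop", i) for i in state]
        if random.random() < epsilon:            # explore a random change
            action = random.choice(actions)
        else:                                    # exploit the learned values
            action = max(actions, key=lambda a: q[(state, a)])
        kind, idx = action
        nxt = state | {idx} if kind == "add" else state - {idx}
        reward = -observe_cost(nxt)              # cost is observed, not modelled
        nxt_actions = [("add", i) for i in candidates - nxt] + \
                      [("drop", i) for i in nxt]
        best_next = max((q[(nxt, a)] for a in nxt_actions), default=0.0)
        q[(state, action)] += alpha * (reward + gamma * best_next
                                       - q[(state, action)])
        state = nxt
        visited.add(nxt)
    return min(visited, key=observe_cost)        # cheapest configuration seen
```

Pruning the state space, as the paper discusses, would amount to restricting `candidates` and the generated action lists.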
  • Tian Huang · Yongxin Zhu · Yafei Wu · Stéphane Bressan · Gillian Dobbie
    ABSTRACT: Virtual machines (VMs) offer simple and practical mechanisms to address many of the manageability problems of leveraging heterogeneous computing resources. VM live migration is an important feature of virtualization in cloud computing: it allows administrators to transparently tune the performance of the computing infrastructure. However, VM live migration may open the door to security threats. Classic anomaly detection schemes such as Local Outlier Factors (LOF) fail to detect anomalies in the process of VM live migration. To tackle such critical security issues, we propose an adaptive scheme that mines data from the cloud infrastructure in order to detect abnormal statistics when VMs are migrated to new hosts. In our scheme, we extend the classic Local Outlier Factors (LOF) approach by defining novel dimension reasoning (DR) rules, as DR-LOF, to figure out the possible sources of anomalies. We also incorporate Symbolic Aggregate approXimation (SAX) to enable the exploration of timing information that LOF ignores. In addition, we implement our scheme with an adaptive procedure to reduce the chances of performance instability. Compared with LOF, which fails to detect anomalies in the process of VM live migration, our scheme is able not only to detect anomalies but also to identify their possible sources, giving cloud computing operators important clues to pinpoint and clear the anomalies. Our scheme further outperforms other classic clustering tools in WEKA (Waikato Environment for Knowledge Analysis) with higher detection rates and lower false alarm rates. Our scheme can serve as a novel anomaly detection tool to improve the security framework in VM management for cloud computing.
    Future Generation Computer Systems 06/2015; DOI:10.1016/j.future.2015.06.005 · 2.79 Impact Factor
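The SAX step the abstract incorporates is a standard discretisation: z-normalise the series, average it piecewise, and map each segment average to a symbol via Gaussian breakpoints. The sketch below illustrates that standard technique only; the breakpoints shown are the usual ones for a 4-symbol alphabet, not values from the paper.

```python
# Sketch of Symbolic Aggregate approXimation (SAX): z-normalisation,
# piecewise aggregate approximation, then symbol lookup per segment.
import statistics

def sax(series, segments, breakpoints=(-0.674, 0.0, 0.674), alphabet="abcd"):
    """Convert a numeric series into a SAX word of `segments` symbols."""
    mean = sum(series) / len(series)
    std = statistics.pstdev(series) or 1.0       # guard a constant series
    z = [(x - mean) / std for x in series]       # z-normalise
    n = len(z)
    word = []
    for s in range(segments):                    # piecewise aggregate approx.
        chunk = z[s * n // segments:(s + 1) * n // segments]
        rank = sum(sum(chunk) / len(chunk) > b for b in breakpoints)
        word.append(alphabet[rank])
    return "".join(word)
```

A low-then-high series such as `[1, 1, 1, 1, 9, 9, 9, 9]` with two segments maps to the word `"ad"`.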
  • Yi Song · Stéphane Bressan · Gillian Dobbie
    ABSTRACT: We propose algorithms for the detection of disjoint and overlapping communities in networks. The algorithms exploit both the degree and clustering coefficient of vertices as these metrics characterize dense connections, which we hypothesize as being indicative of communities. Each vertex independently seeks the community to which it belongs, by visiting its neighboring vertices and choosing its peers on the basis of their degrees and clustering coefficients. The algorithms are intrinsically data parallel. We devise a version for Graphics Processing Unit (GPU). We empirically evaluate the performance of our methods. We measure and compare their efficiency and effectiveness to several state-of-the-art community detection algorithms. Effectiveness is quantified by metrics, namely, modularity, conductance, internal density, cut ratio, weighted community clustering and normalized mutual information. Additionally, average community size and community size distribution are measured. Efficiency is measured by the running time. We show that our methods are both effective and efficient. Meanwhile, the opportunity to parallelize our algorithm yields an efficient solution to the community detection problem.
    Transactions on Large-Scale Data- and Knowledge-Centered Systems XVIII, 01/2015: pages 153-179;
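The core idea — each vertex independently follows a neighbour chosen by degree and clustering coefficient — can be sketched as below. This is an illustrative reading of the abstract, not the authors' algorithm: the scoring rule and the leader-following rule are assumptions.

```python
# Sketch: vertices point to their best-scoring neighbour (score combines
# degree and clustering coefficient); pointer chains end at local leaders,
# which label the communities. Adjacency is a dict of vertex -> set.
def clustering_coefficient(adj, v):
    """Fraction of v's neighbour pairs that are themselves connected."""
    nbrs = sorted(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if nbrs[j] in adj[nbrs[i]])
    return 2.0 * links / (k * (k - 1))

def detect_communities(adj):
    """Assign each vertex the leader it reaches by following best neighbours."""
    score = {v: len(adj[v]) * clustering_coefficient(adj, v) for v in adj}
    parent = {v: max(sorted(adj[v] | {v}), key=lambda u: score[u]) for v in adj}
    def leader(v):
        while parent[v] != v:        # chains strictly ascend in (score, -id),
            v = parent[v]            # so they terminate at a self-loop
        return v
    return {v: leader(v) for v in adj}
```

On two 4-cliques joined by a single bridge edge, each clique collapses onto one leader, giving two communities. The per-vertex independence is what makes the authors' GPU data-parallel version natural.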
  • Ruiming Tang · Antoine Amarilli · Pierre Senellart · Stéphane Bressan

  • Kaifeng Jiang · Dongxu Shao · Stéphane Bressan · Thomas Kister · Kian-Lee Tan
    ABSTRACT: The pervasiveness of location-acquisition technologies has made it possible to collect the movement data of individuals or vehicles. However, it has to be carefully managed to ensure that there is no privacy breach. In this paper, we investigate the problem of publishing trajectory data under the differential privacy model. A straightforward solution is to add noise to a trajectory - this can be done either by adding noise to each coordinate of the position, to each position of the trajectory, or to the whole trajectory. However, such naive approaches result in trajectories with zigzag shapes and many crossings, making the published trajectories of little practical use. We introduce a mechanism called SDD (Sampling Distance and Direction), which is ε-differentially private. SDD samples a suitable direction and distance at each position to publish the next possible position. Numerical experiments conducted on real ship trajectories demonstrate that our proposed mechanism can deliver ship trajectories that are of good practical utility.
    Proceedings of the 25th International Conference on Scientific and Statistical Database Management; 07/2013
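A simplified, hypothetical sketch in the spirit of SDD: publish each next position by sampling a heading near the true one from an exponential-mechanism-style distribution. The cosine utility, the discretisation into 72 headings, and the fixed step length are all illustrative assumptions, not the paper's exact mechanism or its privacy analysis.

```python
# Sketch: noisy-direction trajectory publication. Each published step keeps
# roughly the true heading, so the output avoids the zigzag shapes that
# naive per-coordinate noise produces.
import math
import random

def sample_direction(true_angle, epsilon, candidates=72):
    """Sample one of `candidates` discretised headings, weighting each by
    exp(epsilon * utility / 2), with utility the cosine of its deviation."""
    angles = [2 * math.pi * k / candidates for k in range(candidates)]
    weights = [math.exp(epsilon * math.cos(a - true_angle) / 2) for a in angles]
    return random.choices(angles, weights=weights)[0]

def publish_trajectory(points, epsilon, step=1.0):
    """Rebuild a trajectory from noisy headings between consecutive points."""
    out = [points[0]]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        a = sample_direction(math.atan2(y1 - y0, x1 - x0), epsilon)
        px, py = out[-1]
        out.append((px + step * math.cos(a), py + step * math.sin(a)))
    return out
```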
  •
    ABSTRACT: Efficient spatial joins are pivotal for many applications and particularly important for geographical information systems or for the simulation sciences where scientists work with spatial models. Past research has primarily focused on disk-based spatial joins; efficient in-memory approaches, however, are important for two reasons: a) main memory has grown so large that many datasets fit in it and b) the in-memory join is a very time-consuming part of all disk-based spatial joins. In this paper we develop TOUCH, a novel in-memory spatial join algorithm that uses hierarchical data-oriented space partitioning, thereby keeping both its memory footprint and the number of comparisons low. Our results show that TOUCH outperforms known in-memory spatial-join algorithms as well as in-memory implementations of disk-based join approaches. In particular, it has a one order of magnitude advantage over the memory-demanding state of the art in terms of number of comparisons (i.e., pairwise object comparisons), as well as execution time, while it is two orders of magnitude faster when compared to approaches with a similar memory footprint. Furthermore, TOUCH is more scalable than competing approaches as data density grows.
    Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data; 06/2013
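To ground why partitioning keeps the number of comparisons low: a minimal in-memory distance join over a uniform grid, where only neighbouring cells need to be probed. TOUCH itself uses hierarchical data-oriented partitioning over spatial objects; this point-based grid sketch is only an illustration of the principle.

```python
# Sketch: in-memory spatial join of two point sets within distance eps,
# using a uniform grid so each probe touches at most 9 cells.
from collections import defaultdict

def grid_join(points_a, points_b, eps):
    """Return pairs (a, b) with Euclidean distance <= eps."""
    cells = defaultdict(list)
    for b in points_b:                           # index one side by cell
        cells[(int(b[0] // eps), int(b[1] // eps))].append(b)
    out = []
    for a in points_a:
        cx, cy = int(a[0] // eps), int(a[1] // eps)
        for dx in (-1, 0, 1):                    # only neighbouring cells
            for dy in (-1, 0, 1):                # can contain a match
                for b in cells[(cx + dx, cy + dy)]:
                    if (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2 <= eps * eps:
                        out.append((a, b))
    return out
```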
  •
    ABSTRACT: The database group at the National University of Singapore (NUS) has worked on a wide range of research, from traditional database technology to more advanced database technology and novel database utilities. The database group has been developing efficient cloud computing platforms for large-scale services, and Big Data management and analytics using commodity hardware. One of its goals is to allow users of MapReduce-based systems to keep the programming model of the MapReduce framework, while empowering them with data management functionalities at an acceptable performance. The database group has also developed a query processing engine under the MapReduce framework. The group's proposed MapReduce-based similarity (kNN) join exploits the Voronoi diagram to minimize the number of objects to be sent to the reducer node, reducing computation and communication overheads.
    ACM SIGMOD Record 06/2013; 42(2):46-51. DOI:10.1145/2503792.2503803 · 1.05 Impact Factor
  •
    ABSTRACT: The widespread usage of random graphs has been highlighted in the context of database applications for several years. This is because such data structures turn out to be very useful in a large family of database applications ranging from simulation to sampling, from the analysis of complex networks to the study of randomized algorithms, and so forth. Amongst others, Erdős–Rényi Γ(v, p) is the most popular model used to obtain and manipulate random graphs. Unfortunately, it has been demonstrated that classical algorithms for generating Erdős–Rényi random graphs do not scale well to large instances and, in addition, fail to make use of the parallel processing capabilities of modern hardware. Motivated by this observation, in this paper we propose and experimentally assess PPreZER, a novel parallel algorithm for generating random graphs under the Erdős–Rényi model, designed and implemented for a Graphics Processing Unit (GPU). We arrive at our solution via a succession of several intermediary algorithms, both sequential and parallel, which show the limitations of classical approaches and the benefits of the PPreZER algorithm. Finally, our comprehensive experimental assessment and analysis shows a significant average speedup of PPreZER over the baseline algorithms.
    Journal of Parallel and Distributed Computing 03/2013; 73(3):303–316. DOI:10.1016/j.jpdc.2012.09.010 · 1.18 Impact Factor
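The key trick behind the skip-based family of generators the paper builds on can be sketched sequentially: instead of flipping a biased coin for each of the v(v-1)/2 potential edges, sample geometrically distributed gaps between successive edges. This illustrates the skip idea only; PPreZER adds precomputation of skip probabilities and GPU parallelism.

```python
# Sketch: Erdős–Rényi G(v, p) generation with geometric skips (requires
# 0 < p < 1). Potential edges (u, w) with u > w are numbered
# i = u*(u-1)/2 + w, and we jump directly from one kept edge to the next.
import math
import random

def gilbert_skip(v, p):
    """Return an Erdős–Rényi G(v, p) edge list using geometric skips."""
    edges = []
    total = v * (v - 1) // 2
    i = -1
    while True:
        # number of skipped non-edges follows a geometric law with mean (1-p)/p
        i += 1 + int(math.log(1.0 - random.random()) / math.log(1.0 - p))
        if i >= total:
            return edges
        u = int((1 + math.sqrt(8 * i + 1)) / 2)   # decode the linear index
        edges.append((u, i - u * (u - 1) // 2))
```

Each loop iteration emits one edge, so the running time is proportional to the number of edges rather than to the number of potential edges.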
  • Antoine Veillard · Stephane Bressan · Daniel Racoceanu
    ABSTRACT: The extraction of nuclei from Haematoxylin and Eosin (H&E) stained biopsies presents a particularly steep challenge, in part due to the irregularity of high-grade (most malignant) tumors. To the best of our knowledge, although some existing solutions perform adequately with relatively predictable low-grade cancers, solutions for the problematic high-grade cancers have yet to be proposed. In this paper, we propose a method for the extraction of cell nuclei from H&E stained biopsies robust enough to deal with the full range of histological grades observed in daily clinical practice. The robustness is achieved by combining a wide range of information including color, texture, scale and geometry in a multi-stage, Support Vector Machine (SVM) based framework to replace the original image with a new, probabilistic image modality with stable characteristics. The actual extraction of the nuclei is performed from the new image using Marked Point Processes (MPP), a state-of-the-art stochastic method. An empirical evaluation on clinical data provided and annotated by pathologists shows that our method greatly improves detection and extraction results, and provides a reliable solution with high-grade cancers. Moreover, our method, based on machine learning, can easily adapt to specific clinical conditions. In many respects, our method contributes to bridging the gap between computer vision technologies and their actual clinical use for breast cancer grading.
    Proceedings of the 2012 11th International Conference on Machine Learning and Applications - Volume 01; 12/2012
  • Antoine Veillard · Daniel Racoceanu · Stephane Bressan
    ABSTRACT: The incorporation of prior knowledge into support vector machines (SVM) in order to compensate for inadequate training data has been the focus of previous research work, and many found a kernel-based approach to be the most appropriate. However, such approaches are more adapted to broad domain knowledge (e.g. "sets are invariant to permutations of the elements") than to task-specific properties (e.g. "the weight of a person is cubically related to her height"). In this paper, we present the partially RBF (pRBF) kernels, our original framework for the incorporation of prior knowledge about correlation patterns between specific features and the output label. pRBF kernels are based upon the tensor-product combination of the standard radial basis function (RBF) kernel with more specialized kernels and provide a natural way to incorporate a commonly available type of prior knowledge. In addition to a theoretical validation of our framework, we propose a detailed empirical evaluation on real-life biological data which illustrates its ease of use and effectiveness. Not only were pRBF kernels able to improve the learning results in general, but they also proved to perform particularly well when the training data set was very small or strongly biased, significantly broadening the field of application of SVMs.
    2012 Eleventh International Conference on Machine Learning and Applications (ICMLA); 12/2012
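The tensor-product construction described above can be sketched directly: an RBF kernel over all features except one designated feature, multiplied by a specialized kernel on that feature. The polynomial choice for the special feature is an illustrative assumption, not necessarily the kernel used in the paper.

```python
# Sketch of a pRBF-style kernel: pointwise product of an RBF kernel on
# most features with a polynomial kernel on one designated feature.
import math

def prbf_kernel(x, y, special=0, gamma=1.0, degree=3):
    """RBF on all features except `special`, polynomial on `special`."""
    sq = sum((a - b) ** 2
             for i, (a, b) in enumerate(zip(x, y)) if i != special)
    rbf = math.exp(-gamma * sq)                      # standard RBF part
    poly = (1.0 + x[special] * y[special]) ** degree  # specialized part
    return rbf * poly
```

Since a product of positive semi-definite kernels is itself a valid kernel, such a combination can be dropped into any kernel-based SVM.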
  •
    ABSTRACT: The proliferation of online social networks has created intense interest in studying their nature and revealing information of interest to the end user. At the same time, such revelation raises privacy concerns. Existing research addresses this problem following an approach popular in the database community: a model of data privacy is defined, and the data is rendered in a form that satisfies the constraints of that model while aiming to maximize some utility measure. Still, there is no consensus on a clear and quantifiable utility measure over graph data. In this paper, we take a different approach: we define a utility guarantee, in terms of certain graph properties being preserved, that should be respected when releasing data, while otherwise distorting the graph to an extent desired for the sake of confidentiality. We propose a form of data release which builds on current practice in social network platforms: a user may want to see a subgraph of the network graph, in which that user as well as connections and affiliates participate. Such a snapshot should not allow malicious users to gain private information, yet provide useful information for benevolent users. We propose a mechanism to prepare data for user view under this setting. In an experimental study with real data, we demonstrate that our method preserves several properties of interest more successfully than methods that randomly distort the graph to an equal extent, while withstanding structural attacks proposed in the literature.
    Proceedings of the 21st ACM international conference on Information and knowledge management; 10/2012
  • Deepen Doshi · Baljeet Malhotra · Stéphane Bressan · Jasmine Siu Lee Lam
    ABSTRACT: Shipping plays a vital role as trade facilitator in providing cost-efficient transportation. The International Maritime Organisation (IMO) reports that over 90% of the world trade volume is carried by merchant ships. The analysis of shipping networks therefore can create invaluable insight into global trade. In this paper we study the appropriateness of various graph centrality measures to rate, compare and rank ports from various perspectives of global shipping networks. In particular, we illustrate the potential of such analysis on the example of shipping networks constructed from the schedules, readily available on the World Wide Web, of six shipping companies that transport 35-40% of the total volume traded (in TEUs) worldwide.
    International Journal of Business Intelligence and Data Mining 10/2012; 7(3):186-202. DOI:10.1504/IJBIDM.2012.049554
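One of the simplest centrality measures such a study compares is degree centrality; the sketch below computes it over a toy directed shipping network whose port codes and legs are hypothetical, purely for illustration.

```python
# Sketch: normalised degree centrality for a directed edge list, where
# each edge is a (origin_port, destination_port) leg of a schedule.
def degree_centrality(edges):
    """Return in+out degree per port, normalised by the number of
    other ports in the network."""
    ports = {p for e in edges for p in e}
    deg = {p: 0 for p in ports}
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    n = max(len(ports) - 1, 1)          # normalisation factor
    return {p: d / n for p, d in deg.items()}
```

Ranking ports by such scores is what allows the comparisons across perspectives that the paper describes; betweenness or closeness centrality would be computed over the same edge list.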
  • Sadegh Nobari · Thanh-Tung Cao · Panagiotis Karras · Stéphane Bressan
    ABSTRACT: The proliferation of data in graph form calls for the development of scalable graph algorithms that exploit parallel processing environments. One such problem is the computation of a graph's minimum spanning forest (MSF). Past research has proposed several parallel algorithms for this problem, yet none of them scales to large, high-density graphs. In this paper we propose a novel, scalable, parallel MSF algorithm for undirected weighted graphs. Our algorithm leverages Prim's algorithm in a parallel fashion, concurrently expanding several subsets of the computed MSF. Our effort focuses on minimizing the communication among different processors without constraining the local growth of a processor's computed subtree. In effect, we achieve a scalability that previous approaches lacked. We implement our algorithm in CUDA, running on a GPU and study its performance using real and synthetic, sparse as well as dense, structured and unstructured graph data. Our experimental study demonstrates that our algorithm outperforms the previous state-of-the-art GPU-based MSF algorithm, while being several orders of magnitude faster than sequential CPU-based algorithms.
    Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP 2012, New Orleans, LA, USA, February 25-29, 2012; 09/2012
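The sequential building block the paper parallelises is Prim's algorithm growing one tree from a seed. The heap-based sketch below is that classic algorithm, not the paper's CUDA version, which grows several such trees concurrently and resolves conflicts between them.

```python
# Sketch: Prim's algorithm for the minimum spanning tree of one
# component. `adj` maps a vertex to a list of (weight, neighbour) pairs.
import heapq

def prim(adj, seed):
    """Return MST edges of `seed`'s component as (u, v, weight) triples."""
    visited = {seed}
    frontier = [(w, seed, v) for w, v in adj[seed]]
    heapq.heapify(frontier)                  # cheapest crossing edge on top
    tree = []
    while frontier:
        w, u, v = heapq.heappop(frontier)
        if v in visited:                     # stale entry, edge now internal
            continue
        visited.add(v)
        tree.append((u, v, w))
        for w2, nbr in adj[v]:
            if nbr not in visited:
                heapq.heappush(frontier, (w2, v, nbr))
    return tree
```

Running this from several seeds at once, one per processor, is the intuition behind the paper's approach; the hard part it addresses is merging subtrees without costly communication.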
  • Baljeet Malhotra · John A. Gamon · Stéphane Bressan
    ABSTRACT: Field spectrometry is emerging as an important tool in the study of the dynamics of the biosphere and atmosphere. Large amounts of data are now collected from spectrometers mounted on towers, robotic trams and other platforms. These data are crucial for verifying not only the optical data captured by satellites and airborne systems but also to validate the flux measurements that track ecosystem-atmosphere gas exchanges, the "breathing of the planet" critical to regulating our atmosphere and climate. There is a need for readily available systems for the management, processing and analysis of field spectrometry data. In this paper we present SALSA, a software system for the management, processing and analysis of field spectrometry data that also provides a platform for linking optical data to flux measurements. SALSA is demonstrated using real data collected from multiple research sites.
    Proceedings of the 24th international conference on Scientific and Statistical Database Management; 06/2012
  • Xuesong Lu · Stéphane Bressan
    ABSTRACT: A recurrent challenge for modern applications is the processing of large graphs. The ability to generate representative samples of smaller size is useful not only to circumvent scalability issues but also, per se, for statistical analysis and other data mining tasks. For such purposes adequate sampling techniques must be devised. We are interested, in this paper, in the uniform random sampling of a connected subgraph from a graph. We require that the sample contains a prescribed number of vertices. The sampled graph is the corresponding induced graph. We devise, present and discuss several algorithms that leverage three different techniques: Rejection Sampling, Random Walk and Markov Chain Monte Carlo. We empirically evaluate and compare the performance of the algorithms. We show that they are effective and efficient but that there is a trade-off, which depends on the density of the graphs and the sample size. We propose one novel algorithm, which we call Neighbour Reservoir Sampling (NRS), that very successfully realizes the trade-off between effectiveness and efficiency.
    Proceedings of the 24th international conference on Scientific and Statistical Database Management; 06/2012
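The random-walk flavour of connected-subgraph sampling the paper studies can be sketched as follows: walk until a prescribed number of distinct vertices has been seen, then return the induced subgraph. This illustrates the baseline technique only; it is not the authors' NRS algorithm, and its samples are biased towards high-degree regions rather than uniform.

```python
# Sketch: connected induced-subgraph sampling by random walk.
# `adj` maps a vertex to the set of its neighbours; the graph must have
# at least `n` reachable vertices from `start`.
import random

def random_walk_sample(adj, n, start):
    """Collect n distinct vertices by random walk from `start`; return
    (vertices, induced edges). The sample is connected by construction."""
    seen = {start}
    v = start
    while len(seen) < n:
        v = random.choice(sorted(adj[v]))    # move to a random neighbour
        seen.add(v)
    edges = {(a, b) for a in seen for b in adj[a] if b in seen and a < b}
    return seen, edges
```

Rejection sampling and Markov chain Monte Carlo, the other two families the paper compares, correct for exactly this non-uniformity at the cost of extra work.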
  • Yi Song · Panagiotis Karras · Qian Xiao · Stéphane Bressan
    ABSTRACT: This paper is motivated by the recognition of the need for a finer grain and more personalized privacy in data publication of social networks. We propose a privacy protection scheme that not only prevents the disclosure of identity of users but also the disclosure of selected features in users' profiles. An individual user can select which features of her profile she wishes to conceal. The social networks are modeled as graphs in which users are nodes and features are labels. Labels are denoted either as sensitive or as non-sensitive. We treat node labels both as background knowledge an adversary may possess, and as sensitive information that has to be protected. We present privacy protection algorithms that allow for graph data to be published in a form such that an adversary who possesses information about a node's neighborhood cannot safely infer its identity and its sensitive labels. To this aim, the algorithms transform the original graph into a graph in which nodes are sufficiently indistinguishable. The algorithms are designed to do so while losing as little information and while preserving as much utility as possible. We evaluate empirically the extent to which the algorithms preserve the original graph's structure and properties. We show that our solution is effective, efficient and scalable while offering stronger privacy guarantees than those in previous research.
    Proceedings of the 24th international conference on Scientific and Statistical Database Management; 06/2012
  • Xuesong Lu · Giorgos Cheliotis · Xiyue Cao · Yi Song · Stéphane Bressan
    ABSTRACT: The Internet, and social media in particular, are frequently credited in public discourse with being instrumental in the development and coordination of various contemporary social movements. We examine the evolution of Facebook activity in relation to the movement of the Greek Indignados of 2011, by collecting the electronic traces of their public communications on Facebook pages over a period of 8 months. We analyze the resulting bipartite graphs, consisting of users posting to pages, using social network analysis. We reveal some of the dynamics of the structural properties of the network over time and explain what these mean for the configuration of networked publics on social network sites. We conclude that the very early stages of activity are essential in determining this configuration, because users converge quickly and exclusively on a small number of pages. When activity gradually declines, the decline is strongest among the most active users and the most popular pages, but when activity resumes, users return to the same pages. We discuss implications for the organization of collective action on social network sites.
    Proceedings of the 3rd Annual ACM Web Science Conference; 06/2012
  • Xuesong Lu · Yi Song · Stéphane Bressan
    ABSTRACT: Liu and Terzi proposed the notion of k-degree anonymity to address the problem of identity anonymization in graphs. A graph is k-degree anonymous if and only if each of its vertices has the same degree as that of, at least, k-1 other vertices. The anonymization problem is to transform a non-k-degree anonymous graph into a k-degree anonymous graph by adding or deleting a minimum number of edges. Liu and Terzi proposed an algorithm that remains a reference for k-degree anonymization. The algorithm consists of two phases. The first phase anonymizes the degree sequence of the original graph. The second phase constructs a k-degree anonymous graph with the anonymized degree sequence by adding edges to the original graph. In this work, we propose a greedy algorithm that anonymizes the original graph by simultaneously adding edges to the original graph and anonymizing its degree sequence. We thereby avoid testing the realizability of the degree sequence, which is a time consuming operation. We empirically and comparatively evaluate our new algorithm. The experimental results show that our algorithm is indeed more efficient and more effective than the algorithm proposed by Liu and Terzi on large real graphs.
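The degree-sequence phase that the abstract refers to can be sketched with a simple greedy: sort degrees in descending order and raise each group of at least k entries to the group's maximum. This is an illustrative greedy for the first phase only, not Liu and Terzi's dynamic program and not the simultaneous edge-adding algorithm the paper proposes.

```python
# Sketch: make a degree sequence k-anonymous by only increasing degrees,
# grouping at least k consecutive entries of the sorted sequence.
from collections import Counter

def anonymise_degrees(degrees, k):
    """Return a k-anonymous degree sequence dominating the input."""
    ds = sorted(degrees, reverse=True)
    out, i = [], 0
    while i < len(ds):
        j = min(i + k, len(ds))
        if len(ds) - j < k:              # leftover would be too small a group
            j = len(ds)
        out.extend([ds[i]] * (j - i))    # raise the group to its maximum
        i = j
    return out

def is_k_anonymous(degrees, k):
    """True iff every degree value occurs at least k times."""
    return all(c >= k for c in Counter(degrees).values())
```

A realizable anonymized sequence must then be turned back into a graph by edge additions, which is exactly the step the paper's algorithm folds into the sequence anonymization.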
  •
    ABSTRACT: Broadcasting is an elementary problem in wireless networks. Energy-efficient broadcasting is important, e.g., to coordinate distributed computing operations by sending periodic messages in a network of Automatic Identification Systems installed on energy-constrained maritime lighthouses. To that end, logical tree topologies based on Connected Dominating Sets have been widely proposed in the literature. In this paper we present the Biased Shortest Path Tree (BISPT), a new logical tree topology for efficient broadcasting in wireless networks. In simulations we find that BISPT outperforms state-of-the-art solutions.
    11/2011; DOI:10.1109/PCCC.2011.6108114
  • Wee Siong Ng · Markus Kirchberg · Stéphane Bressan · Kian-Lee Tan
    ABSTRACT: The rapidly increasing number of sensors and devices as well as the coming of age of cloud computing are fuelling the need for real-time stream data management tools. In a world that is highly submerged in data, data analytics and higher forms of data exploitation are fundamental to most decision making processes. However, enabling such technology requires several factors related to security, privacy, performance and costing to be addressed. In this paper, we discuss the various challenges in greater detail and propose our vision of a Privacy-aware Data Stream Cloud architecture that enables secure, privacy-preserving data analytics services.
    International Journal of Web and Grid Services 11/2011; 7(3):246-267. DOI:10.1504/IJWGS.2011.043530 · 0.76 Impact Factor

Publication Stats

973 Citations
17.00 Total Impact Points


  • 2001-2014
    • National University of Singapore
      • Department of Computer Science
      • School of Computing
      Tumasik, Singapore
  • 2009
    • University of Science Malaysia
      Nibong Tepal, Pulau Pinang, Malaysia
  • 1997-2004
    • Massachusetts Institute of Technology
      • MIT Sloan School of Management
      Cambridge, Massachusetts, United States