Ettore Merlo

Centre hospitalier de l'Université de Montréal (CHUM), Montréal, Quebec, Canada

Publications (130) · 45.52 Total impact

  • ABSTRACT: Software clone detection techniques identify fragments of code that share some level of syntactic similarity. In this study, we investigate security-sensitive clone clusters: clusters of syntactically similar fragments of code that are protected by some privileges. From a security perspective, security-sensitive clone clusters can help reason about the implemented security model: given syntactically similar fragments of code, it is expected that they are protected by similar privileges. We hypothesize that clones that violate this assumption, defined as security-discordant clones, are likely to reveal weaknesses and flaws in access control models. In order to characterize security-discordant clones, we investigated two of the largest and most popular open-source PHP applications: Joomla! and Moodle, with sizes ranging from hundreds of thousands to more than a million lines of code. Investigation of security-discordant clone clusters in these systems revealed several previously undocumented, recurring, and application-independent security weaknesses. Moreover, security-discordant clones also revealed four previously unreported security flaws. Results also show how these flaws were revealed through the investigation of as little as 2% of the code base. Distribution of weaknesses and flaws between the two systems is investigated and discussed. Potential extensions to this exploratory work are also presented.
    Proceedings of the 29th Annual Computer Security Applications Conference; 12/2013
  • F. Gauthier, E. Merlo
    ABSTRACT: Access control models implement mechanisms to restrict access to sensitive data from unprivileged users. Access controls typically check privileges that capture the semantics of the operations they protect. Semantic smells and errors in access control models stem from privileges that are partially or totally unrelated to the action they protect. This paper presents a novel approach, partly based on static analysis and information retrieval techniques, for the automatic detection of semantic smells and errors in access control models. Investigation of the case study application revealed 31 smells and 2 errors. Errors were reported to developers who quickly confirmed their relevance and took actions to correct them. Based on the obtained results, we also propose three categories of semantic smells and errors to lay the foundations for further research on access control smells in other systems and domains.
    Software Engineering (ICSE), 2013 35th International Conference on; 01/2013
  • T. Lavoie, E. Merlo
    ABSTRACT: This paper focuses on the applicability of clone detectors for system evolution understanding. Specifically, it is a case study of Firefox, whose development release cycle changed from a slow release cycle to a fast release cycle two years ago. Since the transition of the release cycle, three times more versions of the software were deployed. To understand whether or not the changes between the newer versions are as significant as the changes in the older versions, we measured the similarity between consecutive versions. We analyzed 82 MLOC of C/C++ code to compute the overall change distribution between all existing major versions of Firefox. The results indicate a significant decrease in the overall difference between many versions in the fast release cycle. We discuss the results and highlight how differently the versions have evolved in their respective release cycles. We also relate our results with other results assessing potential changes in the quality of Firefox. We conclude the paper by raising questions on the impact of a fast release cycle.
    Software Clones (IWSC), 2013 7th International Workshop on; 01/2013
  • ABSTRACT: This paper presents results from an experience of transferring the CLAN clone detection technology into a telecommunication industrial setting. Eleven proprietary systems have been analyzed for a total of about 94 MLOC of C/C++ and Java source code. The characteristics of the analyzed systems, together with a description of the Web portal that is used as an interface to the clone analysis environment, are described. Reported results include figures and diagrams about clone frequencies, types, and similarity distributions. Processing times including parsing, clone clustering, and Dynamic Programming visualisation are presented. A discussion about lessons learned and future research work is also presented from an industrial point of view for real life practical applications of clone detection.
    Software Clones (IWSC), 2013 7th International Workshop on; 01/2013
  • ABSTRACT: Blood pressure (BP) is a dynamic phenotype that varies rapidly to adjust to changing environmental conditions. Standing upright is a recent evolutionary trait, and genetic factors that influence postural adaptations may contribute to BP variability. We studied the effect of posture on the genetics of BP and intermediate BP phenotypes. Methods and results: 384 sib-pairs in 64 sib-ships from families ascertained by early-onset hypertension and dyslipidemia were included. Blood pressure, three hemodynamic and seven neuroendocrine intermediate BP phenotypes were measured while lying supine and standing upright. The effect of posture on estimates of heritability and genetic covariance was investigated in full pedigrees. Linkage was conducted on 196 candidate genes by sib-pair analyses, and empirical estimates of significance were obtained. A permutation algorithm was implemented to study the postural effect on linkage. ADRA1A, APO, CAST, CORIN, CRHR1, EDNRB, FGF2, GC, GJA1, KCNB2, MMP3, NPY, NR3C2, PLN, TGFBR2, TNFRSF6 and TRHR showed evidence of linkage with any phenotype in the supine position and not upon standing, whereas AKR1B1, CD36, EDNRA, F5, MMP9, PKD2, PON1, PPARG, PPARGC1A, PRKCA and RET were specifically linked to standing phenotypes. Genetic profiling was undertaken to show genetic interactions among intermediate BP phenotypes and genes specific to each posture. Conclusions: Important genetic components of BP are missed by performing genetic studies exclusively in a single posture. Supine and standing BPs have distinct genetic signatures. Standardized maneuvers influence the results of genetic investigations into BP, thus reflecting its dynamic regulation.
    Physiological Genomics 12/2012; · 2.81 Impact Factor
  • ABSTRACT: Automatic query generators have been shown to be effective tools for software testing. For the most part, they have been used in system testing for the database as a whole or to generate specific queries to test specific features with not much randomness. In this work we explore the problems encountered when using a genetic algorithm to generate SQL for testing a large database system. General random SQL generation that tests the database system as a whole using genetic algorithms is relatively simple. One would need to generate millions of test cases to have a reasonable chance of hitting specific combinations of features. In order to optimize the testing, one needs to generate targeted SQL queries that narrow the testing to specific feature areas and feature combinations yet preserve a certain amount of randomness and exploit the strength of a genetic algorithm. To do this effectively, the test generator needs to be guided so that it does not stray too much from the goals of the more targeted test requirement. In this work we explore a genetic algorithm approach to generate test queries that exercise target sub-sequences of features. Genetic algorithm parameters such as genome representation, reproduction, fitness evaluation, and selection are described. Preliminary results obtained comparing the presented approach with a random query generator are presented and discussed. We further present the DB2 SQL Query Optimizer, the application which we are using as a case study and target queries that go through certain optimization rule sequences. This application is larger and more complex in terms of code size and data input complexity than software previously used for studying test data generation.
    05/2012;
  • Thierry Lavoie, Ettore Merlo
    ABSTRACT: This paper presents an original clone detection technique which is an accurate approximation of the Levenshtein distance. It uses groups of tokens extracted from source code called windowed-tokens. From these, frequency vectors are then constructed and compared with the Manhattan distance in a metric tree. The goal of this new technique is to provide a very high precision clone detection technique while keeping a high recall. Precision and recall measurement is done with respect to the Levenshtein distance. The testbench is a large scale open source software. The collected results proved the technique to be fast, simple, and accurate. Finally, this article presents further research opportunities.
    01/2012;
  • ABSTRACT: During the re-engineering of legacy software systems, a good knowledge of the history of past modifications on the system is important to recover the design of the system and transfer its functionalities. In the absence of a reliable revision history, development teams often rely on system experts to identify hidden history and recover software design. In this paper, we propose a new technique to infer the history of repository file modifications of a software system using only past released versions of the system. The proposed technique relies on nearest-neighbor clone detection using the Manhattan distance. We performed an empirical evaluation of the technique using Tomcat, JHotDraw and Adempiere SVN information as our oracle of file operations, and obtained an average precision of 97% and an average recall of 98%. Our evaluation also highlighted the phenomenon of implicit Moves, that is, Moves between a system's versions that are not recorded in the SVN repository. In the absence of revision history and software experts, development teams can make use of the proposed technique during the re-engineering of their legacy systems.
    Reverse Engineering (WCRE), 2012 19th Working Conference on; 01/2012
  • F. Gauthier, E. Merlo
    ABSTRACT: Access control vulnerabilities in web applications are on the rise. In its 2010 "Top 10 Most Critical Web Applications Security Risks", the OWASP reported that the prevalence of access control vulnerabilities in web applications increased compared to 2007. However, in contrast to SQL injection and cross-site scripting flaws, access control vulnerabilities comparatively received much less attention from the research community. This paper presents ACMA (Access Control Model Analyzer), a model checking-based tool for the detection of access control vulnerabilities in PHP applications. The core of ACMA uses a lightweight model checker to detect the privileges that are enforced at each statement of an application. Based on this information, ACMA can detect several types of access control vulnerabilities: from forced browsing vulnerabilities to faulty access controls. We show how, when compared to the state of the art, ACMA achieves advantageously comparable results with accelerations up to 890 times faster. Moreover, contrary to the state of the art, ACMA scales up to medium-large applications with large access control models, as shown by the analysis of Moodle, a 400,000+ LOC application counting more than 200 distinct privileges. Results show that ACMA is fast, precise and scalable making it a practical tool for the detection of access control vulnerabilities in real-world applications. A discussion about further extensions to ACMA is also presented.
    Reverse Engineering (WCRE), 2012 19th Working Conference on; 01/2012
  • F. Gauthier, E. Merlo
    ABSTRACT: Web applications manage increasingly large amounts of sensitive information and often need to implement access control (AC) models. However, documentation about the implemented AC model is often sparse and few, if any, tools exist to support AC model investigation. Based on the results of a previous study, we show how formal concept analysis (FCA) can support the understanding and visualization of reverse-engineered AC models. Results of applying FCA to Moodle, a medium-sized (625 473 LOC) Web application, are presented and discussed. We show how FCA enhances the overall comprehension of reverse-engineered AC models and sheds light on previously unknown features of Moodle's AC model.
    Software Maintenance and Reengineering (CSMR), 2012 16th European Conference on; 01/2012
  • F. Gauthier, E. Merlo
    ABSTRACT: In this paper, we present novel algorithms for the propagation of pattern-based properties in PHP applications. Intuitively, pattern-based properties designate those properties that are intrinsically associated to syntactic patterns in the source code. Security checks in access control models are an example of pattern-based properties. At the source code level, permissions are typically verified with stereotyped constructs, called security checks, that can be detected with syntactic patterns. Depending on the program, pattern-based properties can be aliased to variables that are propagated through the application. In that context, support from data-flow approaches is needed to track the propagation of patterns through the application. In the context of this paper, we focus on the alias-aware propagation of security checks through PHP applications. Specifically, we investigated the propagation of security checks in 8 PHP applications that implement access control models. We show how, using the Datalog language, one can implement conceptually complex data-flow algorithms in an incremental, intuitive and compact manner. From the results perspective, we show how our algorithm identifies security checks and security check aliased variables in a precise way. The reported false positive rate varies between 0% and 4% for the investigated applications.
    Source Code Analysis and Manipulation (SCAM), 2012 IEEE 12th International Working Conference on; 01/2012
  • D. Letarte, F. Gauthier, E. Merlo
    ABSTRACT: Web sites are often a mixture of static sites and programs that integrate relational databases as a back-end. As they evolve to meet ever-changing user needs, new versions of programs, interactions and functionalities may be added and existing ones may be removed or modified. Web sites require configuration and programming attention to assure security, confidentiality, and trust of the published information. During evolution of Web software, from one version to the next one, security properties may change and possible changes may include new flaws or corrections. Changes to security properties, including access control privileges, can be monitored by observing and analyzing changes between security models extracted from different versions of an application. This paper defines Property Satisfaction Profiles (PSP) as the satisfaction values of properties computed on the extracted models. This paper also presents an investigation of the evolution of the changes in the PSP computed on security models of different versions of a Web application. Model extraction and PSP computation can be performed in linear time on one version. Comparison between two versions is also linear and practical performance is fast. This paper reports results about experiments performed on 31 versions of phpBB, a publicly available bulletin board written in PHP. Versions 1.0.0 (9547 LOC) through 2.0.22 (40663 LOC) have been considered as a case study. Results show that the proposed approach can be used to observe and monitor the evolution of PSP in successive versions of the same software package. Suggestions for further research are also presented.
    Software Testing, Verification and Validation (ICST), 2011 IEEE Fourth International Conference on; 04/2011
  • Ninth Annual Conference on Privacy, Security and Trust, PST 2011, 19-21 July, 2011, Montreal, Québec, Canada; 01/2011
  • Thierry Lavoie, Ettore Merlo
    ABSTRACT: Evaluating the quality and performance of clone detection techniques requires a system along with its clone oracle, that is, a reference database of all accepted clones in the investigated system. Many challenges, including finding an adequate clone definition and scalability to industrial size systems, must be overcome to create good oracles. This paper presents an original method to construct clone oracles based on the Levenshtein metric. Although other oracles exist, this is the largest known oracle for type-3 clones that was created by an automated process on massive data sets. The method behind the creation of the oracle as well as actual oracle characteristics are presented. Discussion of the results in relation to other ways of building oracles is also provided along with future research possibilities.
    Proceeding of the 5th ICSE International Workshop on Software Clones, IWSC 2011, Waikiki, Honolulu, HI, USA, May 23, 2011; 01/2011
  • ABSTRACT: Clone detection and re-factoring have grown in importance over the past 10 years. In this talk, we will briefly review the WCRE 2000 work and discuss the advances in the field. The WCRE 2000 paper presented a computer assisted clone re-factoring approach. The process was based on metric-based clone analysis that produced clone clusters. Clones in the same cluster were then compared using token-based dynamic programming (DP) matching. Token-based clone differences, which included insertions, deletions, and substitutions, were then projected on to the ASTs corresponding to clones. Re-factoring opportunities were evaluated using: (1) classification of differences involving superficial differences, signature changes, and type changes, (2) number of differences, and (3) size of candidate clones. Selected clones were automatically re-factored using “strategy” and “template” design patterns. Experimental evaluation was performed on JDK1.1.5 from Sun Microsystems. Significant work followed over the next years addressing problems that include matching algorithms, scalability, and integration of clone detection in software engineering activities such as maintenance, evolution, and re-factoring. Several interesting surveys can be found in the literature together with a list of problems, many of which remain open today. In particular, recent work on software clones included new approaches to clone detection based on prefix and suffix trees, approaches to detection involving source code analysis based on latent semantic analysis, and clone identification techniques using analysis of program dependence graphs. In other works, a canonical representation of clones was developed and used for matching and comparison; interesting discussions about harmfulness of clones have also been reported; and empirical studies and evaluations of clone detection approaches can be found in several research papers. Evolution aspects have been taken into consideration, both in terms of evolution of clones and their lifetime over several versions of a system and in terms of software evolution by computing various similarity measures between versions. Clone research has also touched upon several interesting applications: intellectual property issues such as license infringement and plagiarism of source code have been addressed using software similarity concepts; incremental approaches to clone detection have been investigated; clones and similarity between structured software artifacts such as trees and graphs have been introduced; detection of bugs caused by inconsistent modifications between clones in a system and between fragments in several software releases has been investigated; domain specific clones have been studied; and approaches for clone visualization have been proposed. Finally, new specialized workshops and conferences on clones and on mining software repositories have been organized. There are many open problems that remain and possible areas for future work in the CLAN (CLone Analysis) toolset, including the definition of clones; addressing type III (similar) and simple type IV (semantic) clones; performance and scalability aspects; taxonomies of clones; clone classification and statistics including frequent patterns of similarity in large systems; inconsistent modifications of clones in one version of a system and inconsistent source code changes over several versions of a system leading to a taxonomy of identifiable bugs; clone matching by parallelizing and implementing it on a Graphical Processing Unit (GPU); intellectual property and plagiarism detection using spectral clone analysis; increasing recall while maintaining precision; clone maximality issues under thresholds; and more.
    01/2010;
  • ABSTRACT: Graphics Processing Units (GPUs) have been around for a while. Although they are primarily used for high-end 3D graphics processing, their use is now acknowledged for general massive parallel computing. This paper presents an original technique based on [10] to compute many instances of the longest common subsequence problem on a generic GPU architecture using classic DP-matching [7]. Application of this algorithm has been found useful to address the problem of filtering false positives produced by metrics-based clone detection methods. Experimental results of this application are presented along with a discussion of possibilities of using GPUs for other cloning related problems.
    Proceeding of the 4th ICSE International Workshop on Software Clones, IWSC 2010, Cape Town, South Africa, May 2010; 01/2010
  • Ettore Merlo, Thierry Lavoie
    ABSTRACT: A clone classification scheme is presented based on the structure of the abstract syntax tree (AST) of a system and on the similarity measures between syntactic blocks of source code. Syntactic blocks in a system may represent classes, methods, statement blocks, and so on. An inclusion relation may exist between the source code lines of some of these blocks, depending on the syntactic structure of the source code. For example, a block corresponding to a method body may contain several possibly nested statement blocks. This paper introduces an algorithm to identify different types of clone relations between blocks that are either method bodies or statement blocks. Clone relation types between these blocks are interesting because they indicate properties of the structural relation of these clones and may give hints on re-factoring opportunities. The proposed structural type clone classification scheme has been investigated on two open source Java systems, Tomcat and Eclipse. Experimental results are presented. Execution time performance of clone classification has been measured and reported. Results and further proposed research are discussed.
    16th Working Conference on Reverse Engineering, WCRE 2009, 13-16 October 2009, Lille, France; 01/2009
  • Dominic Letarte, Ettore Merlo
    ABSTRACT: Web based applications may suffer from role privilege violations due to vulnerabilities in the source code. This paper presents an original algorithm to extract simple Boolean role privilege models from an inter-procedural perspective of PHP source code. Extracted models can be verified against role privilege violations, using model checkers. The proposed extraction approach has been preliminarily evaluated on a small PHP open source system, phpBB, that implements a bulletin board. Role privilege properties have been verified on the extracted models. Simple Boolean security models can be extracted and verified in linear time using the presented algorithms, while general approaches for inter-procedural model checking show a higher computational complexity due to their generality. Results have been successfully compared with those previously obtained from the corresponding inter-procedural data-flow vulnerability analysis. Results and execution time performance of the proposed model extraction and of the validation processes are presented and discussed. Further research, possible extensions, and conclusions are reported.
    16th Working Conference on Reverse Engineering, WCRE 2009, 13-16 October 2009, Lille, France; 01/2009
  • ABSTRACT: Buffer overflows cause serious problems in various categories of software systems. In critical systems, such as health-care, nuclear or aerospace software applications, a buffer overflow may cause severe threats to humans or severe economic losses. If they occur in network or security applications, they can be exploited to gain administrator privileges, perform system attacks, access unauthorized data, or misuse the system. This paper proposes a combination of genetic algorithms, linear programming, evolutionary testing, and static and dynamic information to detect buffer overflows. The newly proposed test input generation process avoids the need for human intervention to define and tune genetic algorithm weights and therefore it becomes completely automated. The process that guides the genetic search towards the detection of buffer overflow relies on a fitness function that takes into account static and dynamic information. Reported results of our case studies, consisting of two sets of open-source programs, show that the new process and fitness function outperform previously published approaches.
    Computers & Operations Research 10/2008; 35:3125-3143. · 1.91 Impact Factor
  • ABSTRACT: The sexual dimorphism of cardiovascular traits, as well as susceptibility to a variety of related diseases, has long been recognized, yet their sex-specific genomic determinants are largely unknown. We systematically assessed the sex-specific heritability and linkage of 539 hemodynamic, metabolic, anthropometric, and humoral traits in 120 French-Canadian families from the Saguenay-Lac-St-Jean region of Quebec, Canada. We performed multipoint linkage analysis using microsatellite markers followed by peak-wide linkage scan based on Affymetrix Human Mapping 50K Array Xba240 single nucleotide polymorphism genotypes in 3 settings, including the entire sample and then separately in men and women. Nearly one half of the traits were age and sex independent, one quarter were both age and sex dependent, and one eighth were exclusively age or sex dependent. Sex-specific phenotypes are most frequent in heart rate and blood pressure categories, whereas sex- and age-independent determinants are predominant among humoral and biochemical parameters. Twenty sex-specific loci passing multiple testing criteria were corroborated by 2-point single nucleotide polymorphism linkage. Several resting systolic blood pressure measurements showed significant genotype-by-sex interaction, e.g., male-specific locus at chromosome 12 (male-female logarithm of odds difference: 4.16; interaction P=0.0002), which was undetectable in the entire population, even after adjustment for sex. Detailed interrogation of this locus revealed a 220-kb block overlapping parts of TAO-kinase 3 and SUDS3 genes. In summary, a large number of complex cardiovascular traits display significant sexual dimorphism, for which we have demonstrated genomic determinants at the haplotype level. Many of these would have been missed in a traditional, sex-adjusted setting.
    Hypertension 05/2008; 51(4):1156-62. · 6.87 Impact Factor

Publication Stats

3k Citations
45.52 Total Impact Points

Institutions

  • 2008–2012
    • Centre hospitalier de l'Université de Montréal (CHUM)
      Montréal, Quebec, Canada
  • 1994–2012
    • Polytechnique Montréal
      • Department of Computer Science and Software Engineering
      • Département de génie électrique
      Montréal, Quebec, Canada
  • 2007
    • Intes GmbH Stuttgart
      Stuttgart, Baden-Württemberg, Germany
  • 1999–2005
    • Università degli Studi del Sannio
      • Department of Engineering (DING)
      Benevento, Campania, Italy
  • 1988–1995
    • McGill University
      • School of Computer Science
      Montréal, Quebec, Canada
  • 1992–1993
    • Centre de recherche du diabète de Montréal
      Montréal, Quebec, Canada
  • 1986
    • Concordia University Montreal
      Montréal, Quebec, Canada
  • 196
    • Museo delle Scienze, Trento, Italy
      Trient, Trentino-Alto Adige, Italy
    • École Polytechnique
      Palaiseau, Île-de-France, France