Jérôme Darmont
Research skills
-
Technical: SQL, PL/SQL, PHP, UNIX...
-
IT: Databases (design, administration), Data warehousing (performance, autoadmin, XML DWs and XML OLAP), Data quality and security, Cloud Business Intelligence, Health/medical applications
Research interests
-
Interests: Security, Data Warehousing, Performance, Data Security, Data Quality, XML, Database, Complex data, Autoadmin, Cloud BI, Database Management
Research experience
-
Teaching: Web design & programming
-
Teaching: Databases
-
Sep 2009–Jul 2010
Research: DPC-CLO
Université Lumière Lyon 2 · Université Lyon 2 · ERIC
Detection of complex phenomena in oral linguistic corpora
-
Sep 2007–Jul 2008
Research: TAPEO
Université Lumière Lyon 2 · Université Lyon 2 · ERIC
Stock exchange collective intelligence
-
Sep 2004–Jul 2007
Research: FoDoMuSt
Université Lumière Lyon 2 · Université Lyon 2 · ERIC
Multistrategy Data Mining
-
Sep 2003–Jul 2004
Research: MAP
Université Lumière Lyon 2 · Université Lyon 2 · ERIC
Personalized, anticipative medicine
-
Sep 2002–Jul 2005
Research: CLAPI
Université Lumière Lyon 2 · Université Lyon 2 · ERIC
Online Spoken Language Corpus
Education
-
Nov 2006–Nov 2006
Université Lumière Lyon 2
HDR (Computer Science), France
-
Sep 1996–Jan 1999
Université Blaise Pascal - Clermont-Ferrand II
PhD (Computer Science), France
-
Sep 1993–Sep 1994
Université Blaise Pascal - Clermont-Ferrand II
MSc (Computer Science), France
-
Sep 1991–Sep 1994
ISI-CUST
Engineering Degree (Computer Science), France
Other
-
Languages: French (native), English (fluent), German (should practice), Russian (learning)
-
Scientific Memberships: ACM SIGMOD, IEEE Computer Society, SPECIF, ExQI
-
Journal Referee: JIIS, JDS, DKE, TKDE, DAPD, IJBIDM
-
Other Interests: IEEE Transactions on Knowledge and Data Engineering; ACM SIGMOD Record; Journal of Database Management; VLDB Journal; International Journal on Data Warehousing and Mining; Jim Gray: The Benchmark Handbook for Database and Transaction Systems (2nd Edition), Morgan Kaufmann 1993; Ralph Kimball & Margy Ross: The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling (2nd Edition), Wiley 2002; Bulletin of Mathematics, Informatics, Physics
International Journal of Data Mining, Modelling and Management (IJDMMM)
International Journal of Biomedical Engineering and Technology (IJBET)
Steering committee, domestic conference EDA
Publications
-
Cost Models for View Materialization in the Cloud
Workshop on Data Analytics in the Cloud (EDBT-ICDT/DanaC 12); 03/2012
-
A Survey of XML Tree Patterns
IEEE Transactions on Knowledge and Data Engineering. 01/2012;
With XML becoming a ubiquitous language for data interoperability purposes in various domains, efficiently querying XML data is a critical issue. This has led to the design of algebraic frameworks based on tree-shaped patterns akin to the tree-structured data model of XML. Tree patterns are graphic representations of queries over data trees. They are actually matched against an input data tree to answer a query. Since the turn of the twenty-first century, an astounding research effort has been focusing on tree pattern models and matching optimization (a primordial issue). This paper is a comprehensive survey of these topics, in which we outline and compare the various features of tree patterns. We also review and discuss the two main families of approaches for optimizing tree pattern matching, namely pattern tree minimization and holistic matching. We finally present actual tree pattern-based developments, to provide a global overview of this significant research topic.
-
An Active XML-based Framework for Integrating Complex Data
27th Annual ACM Symposium On Applied Computing (SAC 12), Riva del Garda (Trento), Italy; 01/2012
Data integration is a critical problem in data warehousing and decision-support systems. Traditional data integration systems are very successful in integrating structured data, but structured data represent only a small subset of interesting data that could be warehoused by many enterprises. Current data integration systems also lack self-managing capabilities. Therefore, we propose a data integration framework for integrating complex data from heterogeneous and distributed data sources reactively. The purpose of our framework is twofold. Firstly, it integrates complex data using Web standards into an Active XML (AXML) repository. Secondly, it exploits active rules and framework events mining to self-manage, automate and reactivate different data integration tasks. Finally, we have implemented a prototype framework as a web application.
-
Cost Models for View Materialization in the Cloud
Workshop on Data Analytics in the Cloud (EDBT-ICDT/DanaC 12), Berlin, Germany; 01/2012
In classical databases, query performance is casually achieved through physical data structures such as caches, indexes and materialized views. In this context, many cost models help select a best set of such data structures. However, this selection task becomes more complex in the cloud. The criterion to optimize is indeed at least two-dimensional, with the monetary cost of using the cloud balancing query response time. Thus, we define in this paper new cost models that fit into the pay-as-you-go paradigm of cloud computing. These cost models help achieve a multi-criteria optimization of the view materialization vs. CPU power augmentation problem, under budget constraints. Finally, we present experimental results that provide a first validation of our contribution and show that cloud view materialization is always desirable.
-
Confidentialité et disponibilité des données entreposées dans les nuages
9ème atelier Fouille de données complexes (FDC 12), Bordeaux, France; 01/2012
-
Efficient Incremental Breadth-Depth XML Event Mining
10/2011;
Many applications log a large amount of events continuously. Extracting interesting knowledge from logged events is an emerging active research area in data mining. In this context, we propose an approach for mining frequent events and association rules from logged events in XML format. This approach is composed of two main phases: I) constructing a novel tree structure called Frequency XML-based Tree (FXT), which contains the frequency of events to be mined; II) querying the constructed FXT using XQuery to discover frequent itemsets and association rules. The FXT is constructed with a single pass over logged data. We implement the proposed algorithm and study various performance issues. The performance study shows that the algorithm is efficient, for both constructing the FXT and discovering association rules.
-
An Efficient Fuzzy Clustering-Based Approach for Intrusion Detection
10/2011;
The need to increase accuracy in detecting sophisticated cyber attacks poses a great challenge not only to the research community but also to corporations. So far, many approaches have been proposed to cope with this threat. Among them, data mining has brought on remarkable contributions to the intrusion detection problem. However, the generalization ability of data mining-based methods remains limited, and hence detecting sophisticated attacks remains a tough task. In this thread, we present a novel method based on both clustering and classification for developing an efficient intrusion detection system (IDS). The key idea is to take useful information exploited from fuzzy clustering into account for the process of building an IDS. To this aim, we first present cornerstones to construct additional cluster features for a training set. Then, we come up with an algorithm to generate an IDS based on such cluster features and the original input features. Finally, we experimentally prove that our method outperforms several well-known methods.
-
Pattern tree-based XOLAP rollup operator for XML complex hierarchies
02/2011;
With the rise of XML as a standard for representing business data, XML data warehousing appears as a suitable solution for decision-support applications. In this context, it is necessary to allow OLAP analyses on XML data cubes. Thus, XQuery extensions are needed. To define a formal framework and allow much-needed performance optimizations on analytical queries expressed in XQuery, defining an algebra is desirable. However, XML-OLAP (XOLAP) algebras from the literature still largely rely on the relational model. Hence, we propose in this paper a rollup operator based on a pattern tree in order to handle multidimensional XML data expressed within complex hierarchies.
-
XWeB: the XML Warehouse Benchmark
02/2011;
With the emergence of XML as a standard for representing business data, new decision support applications are being developed. These XML data warehouses aim at supporting On-Line Analytical Processing (OLAP) operations that manipulate irregular XML data. To ensure feasibility of these new tools, important performance issues must be addressed. Performance is customarily assessed with the help of benchmarks. However, decision support benchmarks do not currently support XML features. In this paper, we introduce the XML Warehouse Benchmark (XWeB), which aims at filling this gap. XWeB derives from the relational decision support benchmark TPC-H. It is mainly composed of a test data warehouse that is based on a unified reference model for XML warehouses and that features XML-specific structures, and its associate XQuery decision support workload. XWeB's usage is illustrated by experiments on several XML database management systems.
-
Business Intelligence for Small and Middle-Sized Enterprises
02/2011;
Data warehouses are the core of decision support systems, which nowadays are used by all kinds of enterprises in the entire world. Although many studies have been conducted on the need of decision support systems (DSSs) for small businesses, most of them adopt existing solutions and approaches, which are appropriate for large-scaled enterprises, but are inadequate for small and middle-sized enterprises. Small enterprises require cheap, lightweight architectures and tools (hardware and software) providing online data analysis. In order to ensure these features, we review web-based business intelligence approaches. For real-time analysis, the traditional OLAP architecture is cumbersome and storage-costly; therefore, we also review in-memory processing. Consequently, this paper discusses the existing approaches and tools working in main memory and/or with web interfaces (including freeware tools), relevant for small and middle-sized enterprises in decision making.
-
An Efficient Local Region and Clustering-Based Ensemble System for Intrusion Detection
15th International Database Engineering and Applications Symposium (IDEAS 11), Lisbon, Portugal; 01/2011
The dramatic proliferation of sophisticated cyber attacks, in conjunction with the ever growing use of Internet-based services and applications, is nowadays becoming a great concern in any organization. Among many efficient security solutions proposed in the literature to deal with this evolving threat, ensemble approaches, a particular family of data mining, have proven very successful in designing high performance intrusion detection systems (IDSs) resting on the mutual combination of multiple classifiers. However, the strength of ensemble systems depends heavily on the methods to generate and combine individual classifiers (ensemble members). In this thread, we propose a novel design method to generate a robust ensemble-based IDS. In our approach, individual classifiers are built using both the input feature space and additional features exploited from k-means clustering. In addition, the ensemble combination is calculated based on the classification ability of individual classifiers on different local data regions defined in the form of k-means clustering. Experimental results prove that our solution is superior to several state-of-the-art methods.
-
Attribute Weighting with Adaptive NBTree for Reducing False Positives in Intrusion Detection
International Journal of Computer Science and Information Security. 01/2010;
In this paper, we introduce new learning algorithms for reducing false positives in intrusion detection. They are based on decision tree-based attribute weighting with an adaptive naïve Bayesian tree, which not only reduces the false positives (FP) to an acceptable level, but also scales up the detection rates (DR) for different types of network intrusions. Due to the tremendous growth of network-based services, intrusion detection has emerged as an important technique for network security. Recently, data mining algorithms have been applied to network-based traffic data and host-based program behaviors to detect intrusions or misuse patterns, but there exist some issues in current intrusion detection algorithms such as unbalanced detection rates, large numbers of false positives, and redundant attributes that lead to complexity of the detection model and degradation of detection accuracy. The purpose of this study is to identify important input attributes for building an intrusion detection system (IDS) that is computationally efficient and effective. Experimental results obtained using the KDD99 benchmark network intrusion detection dataset indicate that the proposed approach can significantly reduce the number and percentage of false positives and scale up the balance detection rates for different types of network intrusions.
-
Business Intelligence for Small and Middle-Sized Enterprises
SIGMOD Record. 01/2010; 39(2):39-50.
Data warehouses are the core of decision support systems, which nowadays are used by all kinds of enterprises in the entire world. Although many studies have been conducted on the need of decision support systems (DSSs) for small businesses, most of them adopt existing solutions and approaches, which are appropriate for large-scaled enterprises, but are inadequate for small and middle-sized enterprises. Small enterprises require cheap, lightweight architectures and tools (hardware and software) providing online data analysis. In order to ensure these features, we review web-based business intelligence approaches. For real-time analysis, the traditional OLAP architecture is heavy and storage-costly; therefore, we also review in-memory processing. Consequently, this paper discusses the existing approaches and tools working in main memory and/or with web interfaces (including freeware tools), particularly aimed at aiding small and middle-sized enterprises in decision making.
-
Aggregation of data quality metrics using the Choquet integral
8th International Workshop on Quality in Databases (VLDB/QDB 10), Singapore; 01/2010
In the context of multi-source databases, data fusion is a tricky task, and resolving inconsistency problems when merging duplicate information is one of the most intricate issues as it is generally resolved through subjective approaches. Using data quality dimensions may help sort out such a question impartially. Quality metrics are the objective criteria that justify the preference of a value v1 over a value v2; where v1 and v2 are both referring to the same real world entity but issue from different sources. However, this technique is fairly complicated when the v1 criteria are not all better than the v2 ones; when we have to choose, for instance, between a highly fresh but inconsistent data, and a consistent old one. Hence, we need a global qualifying score to facilitate the comparison. In this perspective, aggregation of data quality metrics can be the solution for computing a global and objective data quality score. In this paper, we introduce a solution that uses the Choquet integral as a means of aggregating data quality metrics.
-
Enhancing XML Data Warehouse Query Performance by Fragmentation
08/2009;
XML data warehouses form an interesting basis for decision-support applications that exploit heterogeneous data from multiple sources. However, XML-native database systems currently suffer from limited performance in terms of manageable data volume and response time for complex analytical queries. Fragmenting and distributing XML data warehouses (e.g., on data grids) makes it possible to address both these issues. In this paper, we work on XML warehouse fragmentation. In relational data warehouses, several studies recommend the use of derived horizontal fragmentation. Hence, we propose to adapt it to the XML context. We particularly focus on the initial horizontal fragmentation of dimensions' XML documents and exploit two alternative algorithms. We experimentally validate our proposal and compare these alternatives with respect to a unified XML warehouse model we advocate for.