Jérôme Darmont

Université de Lyon, Lyon 2, France · UR ERIC

Eng, PhD, HDR

About

246
Publications
55,091
Reads
1,665
Citations
Additional affiliations
September 2008 - present
Université Lumière Lyon 2
Position
  • Professor (Full)
Description
  • Teacher and researcher in data management
September 1999 - August 2008
Université Lumière Lyon 2
Position
  • Professor (Associate)
Description
  • Teacher and researcher in data management
September 1996 - July 1999
Université Clermont Auvergne
Position
  • PhD Student
Education
November 2006 - November 2006
September 1996 - January 1999
September 1993 - September 1994

Publications (246)
Preprint
Full-text available
With the increased need for data confidentiality in various applications of our daily life, homomorphic encryption (HE) has emerged as a promising cryptographic topic. HE enables computations to be performed directly on encrypted data (ciphertexts) without prior decryption. Since the results of calculations remain encrypted and can only be decrypte...
Chapter
Data Warehouses (DWs) are core components of Business Intelligence (BI). Missing data in DWs have a great impact on data analyses. Therefore, missing data need to be completed. Unlike other existing data imputation methods mainly adapted for facts, we propose a new imputation method for dimensions. This method contains two steps: 1) a hierarchical...
Article
Full-text available
The proliferation of rumors on social media has become a major concern due to its ability to create a devastating impact. Manually assessing the veracity of social media messages is a very time-consuming task that can be greatly aided by machine learning. Most message veracity verification methods only exploit textual contents and metadata. Very few...
Article
Diversity and Inclusion (D&I) are core to fostering innovative thinking. Existing theories demonstrate that to facilitate inclusion, multiple types of exclusionary dynamics, such as self-segregation, communication apprehension, and stereotyping and stigmatizing, must be overcome [11]. A diverse group of people tends to surface different perspective...
Preprint
Known for their low-cost, easily accessible, real-time and valuable information, social media also widely spread unverified or fake news. Rumors can notably cause severe damage to individuals and society. Therefore, rumor detection on social media has recently attracted tremendous attention. Most rumor detection approaches focus on rumor feature ana...
Preprint
Missing values commonly occur in multidimensional data warehouses. They may reduce the usefulness of the data, since analyses on a multidimensional data warehouse are performed along dimensions with hierarchies, where we can roll up or drill down to the different analysis parameters. Therefore, it is essential to complete t...
Preprint
In the last few years, the concept of data lake has become trendy for data storage and analysis. Thus, several design alternatives have been proposed to build data lake systems. However, these proposals are difficult to evaluate as there are no commonly shared criteria for comparing data lake systems. Thus, we introduce DLBench, a benchmark to eval...
Chapter
Known for their low-cost, easily accessible, real-time and valuable information, social media also widely spread unverified or fake news. Rumors can notably cause severe damage to individuals and society. Therefore, rumor detection on social media has recently attracted tremendous attention. Most rumor detection approaches focus on rumor feature ana...
Preprint
Users of social networks tend to post and share content with little restraint. Hence, rumors and fake news can quickly spread on a huge scale. This may pose a threat to the credibility of social media and can cause serious consequences in real life. Therefore, the task of rumor detection and verification has become extremely important. Assessing th...
Chapter
In the last few years, the concept of data lake has become trendy for data storage and analysis. Thus, several approaches have been proposed to build data lake systems. However, these proposals are difficult to evaluate as there are no commonly shared criteria for comparing data lake systems. Thus, we introduce DLBench, a benchmark to evaluate and...
Preprint
In 2010, the concept of data lake emerged as an alternative to data warehouses for big data management. Data lakes follow a schema-on-read approach to provide rich and flexible analyses. However, although trendy in both the industry and academia, the concept of data lake is still maturing, and there are still few methodological approaches to data l...
Chapter
Missing data occur commonly in data warehouses and may generate data usefulness problems. Thus, it is essential to address missing data to carry out a better analysis. There exist data imputation methods for missing data in fact tables, but not for dimension tables. Hence, we propose in this paper a data imputation method for data warehouse dimens...
Chapter
Users of social networks tend to post and share content with little restraint. Hence, rumors and fake news can quickly spread on a huge scale. This may pose a threat to the credibility of social media and can cause serious consequences in real life. Therefore, the task of rumor detection and verification has become extremely important. Assessing th...
Chapter
In 2010, the concept of data lake emerged as an alternative to data warehouses for big data management. Data lakes follow a schema-on-read approach to provide rich and flexible analyses. However, although trendy in both the industry and academia, the concept of data lake is still maturing, and there are still few methodological approaches to data l...
Preprint
Full-text available
Extracting top-k keywords and documents using weighting schemes is a popular technique employed in text mining and machine learning for different analysis and retrieval tasks. The weights are usually computed in the data preprocessing step, as it is costly to update them and to keep track of all the modifications performed on the dataset. Furthermore, c...
Preprint
Using data warehouses to analyse multidimensional data is a significant task in company decision-making. The data warehouse merging process is composed of two steps: matching multidimensional components and then merging them. Current approaches do not take all the particularities of multidimensional data warehouses into account, e.g., only merging s...
Preprint
Over the past two decades, we have witnessed an exponential increase of data production in the world. So-called big data generally come from transactional systems, and even more so from the Internet of Things and social media. They are mainly characterized by volume, velocity, variety and veracity issues. Big data-related issues strongly challenge...
Preprint
With new emerging technologies, such as satellites and drones, archaeologists collect data over large areas. However, it becomes difficult to process such data in time. Archaeological data also have many different formats (images, texts, sensor data) and can be structured, semi-structured and unstructured. Such variety makes data difficult to colle...
Conference Paper
Using data warehouses to analyse multidimensional data is a significant task in company decision-making. The need for analyzing data stored in different data warehouses generates the requirement of merging them into one integrated data warehouse. The data warehouse merging process is composed of two steps: matching multidimensional components and t...
Preprint
We summarize here a paper published in 2021 at the DOLAP international workshop, associated with the EDBT and ICDT conferences. We propose goldMEDAL, a generic metadata model for data lakes based on four concepts and a three-level modeling: conceptual, logical and physical.
Preprint
The rise of big data has revolutionized data exploitation practices and led to the emergence of new concepts. Among them, data lakes have emerged as large heterogeneous data repositories that can be analyzed by various methods. An efficient data lake requires a metadata system that addresses the many problems arising when dealing with big data. In...
Preprint
Full-text available
In the current context of Big Data, a multitude of new NoSQL solutions for storing, managing, and extracting information and patterns from semi-structured data have been proposed and implemented. These solutions were developed to relieve the issue of rigid data structures present in relational databases, by introducing semi-structured and flexible...
Article
Full-text available
Extracting top-k keywords and documents using weighting schemes is a popular technique employed in text mining and machine learning for different analysis and retrieval tasks. The weights are usually computed in the data preprocessing step, as it is costly to update them and to keep track of all the modifications performed on the dataset. Furthermore, c...
Article
Full-text available
Over the past two decades, we have witnessed an exponential increase of data production in the world. So-called big data generally come from transactional systems, and even more so from the Internet of Things and social media. They are mainly characterized by volume, velocity, variety and veracity issues. Big data-related issues strongly challenge...
Article
Full-text available
In the current context of Big Data, a multitude of new NoSQL solutions for storing, managing, and extracting information and patterns from semi-structured data have been proposed and implemented. These solutions were developed to relieve the issue of rigid data structures present in relational databases, by introducing semi-structured and flexible...
Preprint
Traditional data in Digital Humanities projects bear various formats (structured, semi-structured, textual) and need substantial transformations (encoding and tagging, stemming, lemmatization, etc.) to be managed and analyzed. To fully master this process, we propose the use of data lakes as a solution to data siloing and big data variety problems....
Preprint
Full-text available
This paper presents the way the joint ADBIS, TPDL and EDA 2020 conferences were organized online and the results of the participant survey conducted thereafter. We present the lessons learned from the participants' feedback.
Preprint
Companies and individuals produce numerous tabular data. The objective of this position paper is to outline the challenges posed by the automatic integration of data in the form of tables so that they can be cross-analyzed. We provide a first automatic solution for the integration of such tabular data to allow On-Line Analytical Processing. To fulfil...
Book
This book constitutes thoroughly reviewed and selected papers presented at Workshops and Doctoral Consortium of the 24th East-European Conference on Advances in Databases and Information Systems, ADBIS 2020, the 24th International Conference on Theory and Practice of Digital Libraries, TPDL 2020, and the 16th Workshop on Business Intelligence and B...
Preprint
The extensive use of social media in the diffusion of information has also laid a fertile ground for the spread of rumors, which could significantly affect the credibility of social media. An ever-increasing number of users post news including, in addition to text, multimedia data such as images and videos. Yet, such multimedia content is easily ed...
Book
This book constitutes thoroughly refereed short papers of the 24th European Conference on Advances in Databases and Information Systems, ADBIS 2020, held in August 2020. ADBIS 2020 was to be held in Lyon, France; however, due to the COVID-19 pandemic, the conference was held online. The 18 presented short research papers were carefully reviewe...
Book
This book constitutes the proceedings of the 24th European Conference on Advances in Databases and Information Systems, ADBIS 2020, held in Lyon, France, in August 2020. The 13 full papers presented were carefully reviewed and selected from 82 submissions. The papers cover a wide range of topics from different areas of research in database and inf...
Conference Paper
Over the past decade, the data lake concept has emerged as an alternative to data warehouses for storing and analyzing big data. A data lake stores data without a predefined schema. In the absence of a schema, data querying and analysis then depend on a metadata system that...
Preprint
Full-text available
Over the past decade, the data lake concept has emerged as an alternative to data warehouses for storing and analyzing big data. A data lake allows storing data without any predefined schema. Therefore, data querying and analysis depend on a metadata system that must be efficient and comprehensive. However, metadata management in data lakes remains...
Conference Paper
Over the past decade, the data lake concept has emerged as an alternative to data warehouses for storing and analyzing big data. A data lake allows storing data without any predefined schema. Therefore, data querying and analysis depend on a metadata system that must be efficient and comprehensive. However, metadata management in data lakes remains...
Chapter
Over the past decade, the data lake concept has emerged as an alternative to data warehouses for storing and analyzing big data. A data lake allows storing data without any predefined schema. Therefore, data querying and analysis depend on a metadata system that must be efficient and comprehensive. However, metadata management in data lakes remains...
Cover Page
Full-text available
Why data lakes did not kill data warehouses and how they are even complementary.
Preprint
Data lakes have emerged as an alternative to data warehouses for the storage, exploration and analysis of big data. In a data lake, data are stored in a raw state and bear no explicit schema. Hence, an efficient metadata system is essential to prevent the data lake from turning into a so-called data swamp. Existing works about managing data lake metadata m...
Book
Papers from the one-day workshop organized by Action ADOC of the GdR MaDICS on the Variety (in the big data sense) of Humanities data. This workshop was associated with the INFORSID 2018 conference.
Conference Paper
Data lakes have emerged as an alternative to data warehouses for the storage, exploration and analysis of big data. In a data lake, data are stored in a raw state and bear no explicit schema. Hence, an efficient metadata system is essential to prevent the data lake from turning into a so-called data swamp. Existing works about managing data lake metadata m...
Chapter
Cloud computing helps reduce costs, increase business agility and deploy solutions with a high return on investment for many types of applications, including data warehouses and on-line analytical processing. However, storing and transferring sensitive data into the cloud raises legitimate security concerns. In this paper, the authors propose a new...
Book
This book constitutes the thoroughly refereed short papers, workshops and doctoral consortium papers of the 23rd European Conference on Advances in Databases and Information Systems, ADBIS 2019, held in Bled, Slovenia, in September 2019. The 19 short research papers and the 5 doctoral consortium papers were carefully reviewed and selected from 103...
Book
Full-text available
Republished from the International Journal of Data Warehousing and Mining, Vol. 11, No. 2, April-June 2015, 21-42
Chapter
In data management, both system designers and users routinely resort to performance evaluation. Performance evaluation by experimentation on a real system is generally referred to as benchmarking. The aim of this chapter is to present an overview of the major past and present state-of-the-art data-centric benchmarks. This review includes the TPC sta...
Chapter
The ADBIS conferences provide an international forum for the presentation of research on database theory, development of advanced DBMS technologies, and their applications. The 22nd edition of ADBIS, held on September 2–5, 2018, in Budapest, Hungary, includes six thematic workshops collecting contributions from various domains representing new tren...
Article
Full-text available
Top-k keyword and top-k document extraction are very popular text analysis techniques. Top-k keywords and documents are often computed on-the-fly, but they exploit weighted vocabularies that are costly to build. To compare competing weighting schemes and database implementations, benchmarking is customary. To the best of our knowledge, no benchmark...
Preprint
Clustering is an extensive research area in data science. The aim of clustering is to discover groups and to identify interesting patterns in datasets. Crisp (hard) clustering considers that each data point belongs to one and only one cluster. However, it is inadequate as some data points may belong to several clusters, as is the case in text categ...
Conference Paper
Clustering is an extensive research area in data science. The aim of clustering is to discover groups and to identify interesting patterns in datasets. Crisp (hard) clustering considers that each data point belongs to one and only one cluster. However, it is inadequate as some data points may belong to several clusters, as is the case in text categ...
Article
The consequences of an intrusion into an information system can prove problematic for the very existence of a company or an organization. The impacts amount to financial loss and damage to brand image and credibility. Detecting an intrusion is not an end in itself; reducing the detection-reaction delta has become...
Preprint
With the rise of big data, business intelligence had to find solutions for managing even greater data volumes and variety than in data warehouses, which proved ill-adapted. Data lakes answer these needs from a storage point of view, but require managing adequate metadata to guarantee efficient access to data. Starting from a multidimensional met...
Chapter
Clustering is an extensive research area in data science. The aim of clustering is to discover groups and to identify interesting patterns in datasets. Crisp (hard) clustering considers that each data point belongs to one and only one cluster. However, it is inadequate as some data points may belong to several clusters, as is the case in text categ...
Conference Paper
With the rise of big data, business intelligence had to find solutions for managing even greater data volumes and variety than in data warehouses, which proved ill-adapted. Data lakes answer these needs from a storage point of view, but require managing adequate metadata to guarantee efficient access to data. Starting from a multidimensional met...
Preprint
Full-text available
Cluster analysis is widely used in the areas of machine learning and data mining. Fuzzy clustering is a particular method that considers that a data point can belong to more than one cluster. Fuzzy clustering helps obtain flexible clusters, as needed in such applications as text categorization. The performance of a clustering algorithm critically d...
Conference Paper
Cluster analysis is widely used in the areas of machine learning and data mining. Fuzzy clustering is a particular method that considers that a data point can belong to more than one cluster. Fuzzy clustering helps obtain flexible clusters, as needed in such applications as text categorization. The performance of a clustering algorithm critically d...
Chapter
Full-text available
Cluster analysis is widely used in the areas of machine learning and data mining. Fuzzy clustering is a particular method that considers that a data point can belong to more than one cluster. Fuzzy clustering helps obtain flexible clusters, as needed in such applications as text categorization. The performance of a clustering algorithm critically d...
Conference Paper
Full-text available
The consequences of an intrusion into an information system can prove problematic for the very existence of a company or an organization. The impacts amount to financial loss and damage to brand image and credibility. Detecting an intrusion is not an end in itself; reducing the detection-reaction delta has become...
Conference Paper
Full-text available
With the advent of big data, business intelligence has had to find solutions for managing data volumes and variety even greater than in data warehouses, which have proved ill-suited. Data lakes meet these needs from a storage point of view, but require managing...
Article
Full-text available
Cloud computing helps reduce costs, increase business agility and deploy solutions with a high return on investment for many types of applications. However, data security is of premium importance to many users and often restrains their adoption of cloud technologies. Various approaches, i.e., data encryption, anonymization, replication and verifica...
Conference Paper
Full-text available
Information retrieval from textual data focuses on the construction of vocabularies that contain weighted term tuples. Such vocabularies can then be exploited by various text analysis algorithms to extract new knowledge, e.g., top-k keywords, top-k documents, etc. Top-k keywords are routinely used for various purposes, are often computed on-the-fly,...
Conference Paper
Full-text available