
Jean-François Couchot
- Professor
- Full Professor at University Bourgogne Franche-Comté
144 Publications
28,165 Reads
1,090 Citations
Introduction
Current research interests:
- discrete systems (pseudorandom number generators)
- data de-identification, data privacy
- watermarking
Current institution
Additional affiliations
September 2002 - present
Publications (144)
PhD defense presentation titled "AI-Enhanced Emergency Call Handling: Development of Multimodal Machine Learning Models". This research focuses on developing multimodal machine learning models to improve the prioritization of emergency calls. The work also addresses privacy concerns related to the deployment of the models.
The cellular network is now an almost ubiquitous, real-time sensor with coverage anywhere and anytime for any device. Mobile network data is a rich source for official statistics, such as human mobility. However, unlike GPS tracks, each mobile device in this data is described without precise knowledge of its spatial characteristics. Furth...
The Internet of Things (IoT) is an innovative technology that is revolutionizing the global economy and has acquired significant recognition across various sectors, notably within the healthcare field. IoT cameras play a crucial role in facilitating real-time monitoring of the human body through video streams. However, this kind of IoT equipment is vul...
Emergency call centers are often required to properly assess and prioritise emergency situations pre-intervention, in order to provide the required assistance to the callers efficiently. In this paper, we present an end-to-end pipeline for emergency calls analysis. Such a tool can be found useful as it is possible for the intervention team to misin...
Automatically associating ICD codes with electronic health data is a well-known NLP task in medical research. NLP has evolved significantly in recent years with the emergence of pre-trained language models based on the Transformer architecture, mainly in the English language. This paper adapts these models to automatically associate the ICD codes. Sev...
The private collection of multiple statistics from a population is a fundamental statistical problem. One possible approach to realize this is to rely on the local model of differential privacy (LDP). Numerous LDP protocols have been developed for the task of frequency estimation of single and multiple attributes. These studies mainly focused on im...
Unstructured textual data is at the heart of healthcare systems. For obvious privacy reasons, these documents are not accessible to researchers as long as they contain personally identifiable information. One way to share this data while respecting the legislative framework (notably GDPR or HIPAA) is, within the medical structures, to de-identify i...
The amount of data coming from different sources such as IoT-sensors, social networks, cellular networks, has increased exponentially during the last few years. Probabilistic Data Structures (PDS) are efficient alternatives to deterministic data structures suitable for large data processing and streaming applications. They are mainly used for appro...
This paper introduces the multi-freq-ldpy Python package for multiple frequency estimation under Local Differential Privacy (LDP) guarantees. LDP is a gold standard for achieving local privacy with several real-world implementations by big tech companies such as Google, Apple, and Microsoft. The primary application of LDP is frequency (or histogram...
Unstructured textual data are at the heart of health systems: liaison letters between doctors, operating reports, coding of procedures according to the ICD-10 standard, etc. The details included in these documents make it possible to get to know the patient better, to better manage him or her, to better study the pathologies, to accurately remunera...
This paper investigates the problem of collecting multidimensional data throughout time (i.e., longitudinal studies) for the fundamental task of frequency estimation under Local Differential Privacy (LDP) guarantees. Contrary to frequency estimation of a single attribute, the multidimensional aspect demands particular attention to the privacy budge...
This paper investigates the problem of forecasting multivariate aggregated human mobility while preserving the privacy of the individuals concerned. Differential privacy, a state-of-the-art formal notion, has been used as the privacy guarantee in two different and independent steps when training deep learning models. On one hand, we considered grad...
Nowadays, there is no tool that provides a global, permanent and “real time” view of road freight transport flows. However, this type of mapping is already available for air and sea traffic and could be useful to transport companies, e.g., setting up logistics hubs in strategic locations, and to public authorities, e.g., quickly knowing the impact...
This paper investigates the problem of collecting multidimensional data throughout time (i.e., longitudinal studies) for the fundamental task of frequency estimation under local differential privacy (LDP). Contrary to frequency estimation of a single attribute (the majority of the works), the multidimensional aspect imposes to pay particular attent...
This paper proposes a novel methodology based on machine learning (ML) techniques to predict both the victims' mortality and their need for transportation to health facilities using data gathered from the start of the emergency call until the Departmental Fire and Rescue Service of the Doubs (SDIS25) is notified. We first analyzed SDIS25 calls to f...
With local differential privacy (LDP), users can privatize their data and thus guarantee privacy properties before transmitting it to the server (a.k.a. the aggregator). One primary objective of LDP is frequency (or histogram) estimation, in which the aggregator estimates the number of users for each possible value. In practice, when a study with r...
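As a rough illustration of the client/aggregator split described in this abstract, here is a minimal sketch of frequency estimation with generalized randomized response (GRR), a standard LDP protocol. It is an assumption chosen for illustration, not necessarily the protocol studied in the paper, and the function names `grr_perturb` and `grr_estimate` are hypothetical.

```python
import math
import random
from collections import Counter


def grr_perturb(value, domain, epsilon, rng):
    """Client side: report the true value with probability p,
    otherwise report one of the other values uniformly at random."""
    k = len(domain)  # assumes at least two possible values
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p:
        return value
    return rng.choice([v for v in domain if v != value])


def grr_estimate(reports, domain, epsilon):
    """Aggregator side: unbiased estimate of the number of users per value."""
    k = len(domain)
    n = len(reports)
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    q = (1 - p) / (k - 1)  # probability of reporting one specific other value
    counts = Counter(reports)
    # Invert E[count_v] = n_v * p + (n - n_v) * q to recover n_v.
    return {v: (counts[v] - n * q) / (p - q) for v in domain}
```

Each user would call `grr_perturb` locally and send only the perturbed value; the aggregator then calls `grr_estimate` on the collected reports. A smaller `epsilon` gives stronger privacy and noisier estimates.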
Emergency medical services (EMS) provide crucial emergency assistance and ambulatory services. One key measurement of EMS’s quality of service is their ambulances’ response time (ART), which generally refers to the period between EMS notification and the moment an ambulance arrives on the scene. Due to many victims requiring care within adequate ti...
After a short introduction, three main points are addressed.
Health data: de-identify the documents?
Techniques for de-identifying textual documents? Robustness?
Lessons learned?
Longitudinal studies of human mobility could allow an understanding of human behavior on a vast scale. Mobile phone data call detail records (CDRs) have emerged as a prospective data source for such an important task. Nevertheless, there are significant risks when it comes to collecting this type of data, as human mobility has proven to be quite un...
Digital image watermarking has justified its suitability for copyright protection and copy control of digital images. In the past years, various watermarking schemes were proposed to enhance the fidelity and the robustness of watermarked images against different types of attacks such as additive noise, filtering, and geometric attacks. It is highly...
Technological development has its pros and cons. Nowadays, we can easily share, download, and upload digital content using the Internet. Also, malicious users can illegally change, duplicate, and distribute any kind of information, such as images and documents. Therefore, we should protect such contents and arrest the perpetrator. The goal of this...
The work presented here aims to predict the number of firefighter interventions per Communauté d'Agglomération while respecting users' privacy. An approach based on local differential privacy was developed to anonymize the location data, and a supervised learning approach was then put in pl...
Modeling and understanding people's mobility at a temporal and geographical space are very strict requirements for developing better strategies of urban public and private transportation systems as well as establishing improved business techniques. This work proposes a random-search based approach to instantiate statistical indicators through an im...
Statistical studies on the number and types of firefighter interventions by region are essential to improve service to the population. It is also a preliminary step if we want to predict these interventions in order to optimize the placement of human and material resources of fire departments, for example. However, this type of data is sensitive an...
Data publishing is a challenging task under privacy preservation constraints. To ensure privacy, many anonymization techniques have been proposed. They differ in terms of the mathematical properties they verify and in terms of the functional objectives expected. Disassociation is one of the techniques that aim at anonymizing set-valued datasets (e...
Spread Transform Dither Modulation (STDM), a special case of Quantization Index Modulation (QIM), has been widely used in digital watermarking. STDM has good performance in robustness against re-quantization and random noise attacks, but it is largely vulnerable to the Fixed Gain Attack (FGA). In addition to digital images and videos watermarking a...
In this article, we propose a semi-automated method to rebuild genome ancestors of chloroplasts by taking into account gene duplication. Two methods have been used in order to achieve this work: a naked eye investigation using homemade scripts, whose results are considered as a basis of knowledge, and a dynamic programming based approach similar to...
Disassociation is a bucketization-based anonymization technique that divides a set-valued dataset into several clusters to hide the link between individuals and their complete set of items. It increases the utility of the anonymized dataset, but on the other hand it raises many privacy concerns; one in particular arises when the items are tightly cou...
Pseudo-Random Number Generators (PRNG) are omnipresent in computer science: they are embedded in all approaches of numerical simulation (for exhaustiveness), optimization (to discover new solutions), testing (to detect bugs), cryptography (to generate keys), and deep learning (for initialization, to allow generalizations). PRNGs can be basical...
Predicting the number and the type of operations by civil protection services is essential, both to optimize on-call firefighters in size and competence, to pre-position material and human resources... To accomplish this task, it is required to possess skills in artificial intelligence, which are not usually found in a medium-sized fire department....
Data publishing is a challenging task from the privacy point of view. Different anonymization techniques are proposed in the literature to preserve privacy in accordance with some mathematical constraints. Disassociation is one of the anonymization techniques that relies on the k^m-anonymity privacy constraint to guarantee a certain level of priva...
Disassociation, introduced by Terrovitis et al., is a bucketization-based anonymization technique that divides a set-valued dataset into several clusters to hide the link between individuals and their complete set of items. It increases the utility of the anonymized dataset, but on the other hand it raises many privacy concerns, one in particular, i...
This presentation shows how the anonymization method called disassociation (by Terrovitis) was corrected to avoid an information leak, then presents prediction work on anonymized firefighter (SP) data.
Two papers give the details:
- AWAD, Nancy, COUCHOT, Jean-François, BOUNA, Bechara Al, et al. Ant-driven clusteri...
A pseudorandom number generator is an algorithm capable of building sequences of numbers similar to those that would be obtained by, for example, recording the outcomes of rolls of a fair die.
The history of random number generation is long, rich, and varied (cf. the article by Prof. P. L'ECUYER, History of Uniform Random Numb...
Presentation made at the OCS symposium (https://www.ocsbesancon.fr/).
This dissertation mainly focused on the network lifetime maximization problem while ensuring a distributed control and respecting the desired video quality at the end user. To do so, two main axes have been considered, namely, the data processing axis and data routing axis. The first axis tackles the problem of finding a trade-off between the encod...
Background
To reconstruct the evolution history of DNA sequences, novel models of increasing complexity regarding the number of free parameters taken into account in the sequence evolution, as well as faster and more accurate algorithms, and statistical and computational methods, are needed. More particularly, as the principal forces that have led...
Spread Transform Dither Modulation (STDM) is a blind watermarking scheme used for its high robustness against re-quantization and random noise attacks. It has been applied mainly on images, speech, and PDF documents. The key of this scheme is the projection vector aiming at spreading the embedded message over a set of cover elements.
Background:
A huge and continuous increase in the number of completely sequenced chloroplast genomes, available for evolutionary and functional studies in plants, has been observed during the past years. Consequently, it appears possible to build large-scale phylogenetic trees of plant species. However, building such a tree that is well-supported...
In previous works, some of the authors have proposed a canonical form of Gray Codes (GCs) in N-cubes (hypercubes of dimension N). This form allowed them to draw an algorithm that theoretically provides exactly all the GCs for a given dimension N. In another work, we first showed that any of these GCs can be used to build the transition function...
Predicting how a genetic change affects a given character is a major challenge in biology, and being able to tackle this problem relies on our ability to develop realistic models of gene networks. However, such models are rarely tractable mathematically. In this paper, we propose a mathematical analysis of the sigmoid variant of the Wagner gene-net...
Wireless Multimedia Sensor Networks (WMSN) are today considered a promising technology, notably because of the availability of miniaturized multimedia hardware (e.g., CMOS cameras). Nevertheless, they do raise new research challenges: multimedia content is much more voluminous and rich than scalar data. Hence, multimedia data ne...
Hardware security for an Internet of Things (IoT) or cyber-physical system drives the need for ubiquitous cryptography across different sensing infrastructures in these fields. In particular, generating strong cryptographic keys on such resource-constrained devices depends on a lightweight and cryptographically secure random number generator. In this r...
Random number generation is used in many applications such as simulation, numerical analysis, and cryptography. Field Programmable Gate Arrays (FPGAs) are reconfigurable hardware systems, which allow rapid prototyping. This research work is the first comprehensive survey on how random number generators are implemented on Field Programmable Gate Arrays...
Hardware pseudorandom number generators are continuously improved to satisfy both physical and ubiquitous computing security system challenges. The main contribution of this paper is to propose two post-processing modules in hardware, to improve the randomness of linear PRNGs while succeeding in passing the TestU01 statistical battery of tests. The...
Today, wireless cameras are widely deployed in various applications such as in commercial and military domains (e.g., patients monitored in smart hospitals, multimedia surveillance deployed in smart cities, intelligent multimedia monitoring of an ecological system, etc.). The design of such systems can be achieved by allowing the interaction between...
Steganography, the art of hiding information inside host media like pictures and movies, and steganalysis, its countermeasure attempting to detect the presence of hidden information within an innocent-looking document, are frequently reported as promising information security techniques for telemedicine. For the past few years, in the race between...
In this paper, a novel steganographic scheme based on chaotic iterations is proposed. This research work takes place within the information hiding framework, and focuses more specifically on robust steganography. Steganographic algorithms can participate in the development of a semantic web: media on the Internet can be enriched by information r...
This article presents a new class of Pseudorandom Number Generators. The generators are based on traversing an n-cube where a Balanced Hamiltonian Cycle has been removed. The construction of such generators is automatic for a small number of bits, but remains an open problem when this number becomes large. A running example is used throughout the pape...
Sub-categories of mathematical topology, like the mathematical theory of chaos, offer interesting applications devoted to information security. In this research work, we have introduced a new chaos-based pseudorandom number generator implemented in FPGA, which is mainly based on the deletion of a Hamilton cycle within the n-cube (or on the vectoria...
The number of complete chloroplastic genomes increases day after day, making it possible to rethink plant phylogeny in the biomolecular era. Given a set of close plants sharing on the order of one hundred core chloroplastic genes, this article focuses on how to extract the largest subset of sequences in order to obtain the most supported specie...
In previous works, the idea of walking in an N-cube from which a balanced Hamiltonian cycle has been removed has been proposed as the basis of a chaotic PRNG whose chaotic behavior has been proven. However, the construction and selection of the most suited balanced Hamiltonian cycles raise practical and theoretical issues. We propose i...
Conventional state-of-the-art image steganalysis approaches usually consist of a classifier trained with features provided by rich image models. As both features extraction and classification steps are perfectly embodied in the deep learning architecture called Convolutional Neural Network (CNN), different studies have tried to design a CNN-based s...
Technical progress during the last decades has led to a situation in which the accumulation of genome sequence data is increasingly fast and cheap. The huge amount of molecular data available nowadays can help address new and essential questions in evolution. However, reconstructing the evolution of DNA sequences requires models, algorithms...
Some key points in steganography and steganalysis
Designing a pseudorandom number generator (PRNG) is a difficult and complex task. Many recent works have considered chaotic functions as the basis of built PRNGs: the quality of the output would indeed be an obvious consequence of some chaos properties. However, there is no direct reasoning that goes from chaotic functions to uniform distribution o...
Modèles discrets pour la sécurité informatique : des méthodes itératives à l'analyse vectorielle (Discrete models for computer security: from iterative methods to vector analysis), Jean-François Couchot
In this presentation we present a criterion which allows choosing, for a given input image, the appropriate steganalyzer between the CNN-based one proposed by Xu et al. and SRM+EC (maxSRMd2+Fisher linear discriminants). The experiments were done considering the following spatial steganographic algorithms: S-UNIWARD, MiPOD, and HILL, with embedding payl...
In this paper, a blind digital watermarking scheme for Portable Document Format (PDF) documents is proposed. The proposed method is based on a variant of the Quantization Index Modulation (QIM) method called Spread Transform Dither Modulation (STDM). Each bit of the secret message is embedded into a group of characters, more specifically in their x-coordi...
Steganography schemes are designed with the objective of minimizing a defined distortion function. In most existing state of the art approaches, this distortion function is based on image feature preservation. Since smooth regions or clean edges define image core, even a small modification in these areas largely modifies image features and is thus...
Pseudorandom number generation (PRNG) is a key element in hardware security platforms like field-programmable gate array (FPGA) circuits. In this article, 18 PRNGs belonging to 4 families (xorshift, LFSR, TGFSR, and LCG) are physically implemented in an FPGA and compared in terms of area, throughput, and statistical tests. Two design flows are...
In traditional sensor networks, data processing is mostly simple and even negligible in terms of energy consumption. By contrast, in Wireless Multimedia Sensor Networks (WMSNs) the captured data is usually voluminous and requires local processing. The objective here is to deliver data at the desired visual quality. Obviously, better the visual qual...
The aim of this study is to investigate the relation that can be found between the phylogeny of a large set of complete chloroplast genomes, and the evolution of gene content inside these sequences. Core and pan genomes have been computed on de novo annotation of these 845 genomes, the former being used for producing well-supported phyloge...
The number of completely sequenced chloroplast genomes increases rapidly every day, making it possible to build large-scale phylogenetic trees of plant species. Considering a subset of close plant species defined according to their chloroplasts, the phylogenetic tree that can be inferred by their core genes is not necessarily well supported...
Many research works deal with chaotic neural networks for various fields of application. Unfortunately, up to now these networks are usually claimed to be chaotic without any mathematical proof. The purpose of this paper is to establish, based on a rigorous theoretical framework, an equivalence between chaotic iterations according to Devaney and a...
Steganography and steganalysis are two important branches of the information hiding field of research. Steganography methods consist in hiding information in such a way that the secret message is undetectable for the uninitiated. Steganalysis encompasses all the techniques that attempt to detect the presence of such hidden information. This latter...
For the past few years, in the race between image steganography and steganalysis, deep learning has emerged as a very promising alternative to steganalyzer approaches based on rich image models combined with ensemble classifiers. A key knowledge of image steganalyzer, which combines relevant image features and innovative classification procedures,...
The performance of some state-of-the-art steganalysers is investigated according to various parameters, encompassing the choice of the steganographier, its payload, and the type of images both during training and testing stage. All these parameters are changed to determine their effects on steganalysis performance. Experiments are performed using l...