Christian Derquenne
Électricité de France (EDF) | EDF · R&D
PhD
56 Publications
7,685 Reads
182 Citations
Publications (56)
This article aims to show how statistical science can address challenges arising in the field of sport. We illustrate this in para-sport, with a particular focus on the Paris 2024 Paralympic Games. The work presented is part of INSEP's Paraperf project and of the Mécénat de co...
This article addresses multicollinearity among predictors in generalized linear models. Multicollinearity can produce inconsistent regression coefficients and cause predictors to be omitted, which can in turn create interpretation problems that may lead to poor...
This communication is part of the Paraperf project, "Optimization of Paralympic performance: from identification to obtaining the medal", coordinated by INSEP. The objective of this project is to provide educational tools that meet the needs of federations, coaches, and para-athletes. The goal is to optimize the sporting journey of each French...
This article addresses multicollinearity among predictors in generalized linear models. Multicollinearity can lead to inconsistencies in the regression coefficients and to the omission of predictors, which can cause interpretation problems and, in turn, poor decisions. The proposed crit...
This article aims to show how statistical science can address challenges arising in the field of sport. We illustrate this in para-sport, with a particular focus on the Paris 2024 Paralympic Games. The work presented is part of INSEP's Paraperf project and of the Mécénat de c...
This article addresses multicollinearity among predictors in a multiple linear regression model. Multicollinearity can produce inconsistent regression coefficients and cause predictors to be omitted, which can in turn create interpretation problems that may lead to...
Analyzing the behavior of observations in time series is essential for forecasting, simulation, and filtering. Breaks or anomalies in one or more input series used to forecast an output time series can degrade forecast quality. This problem arises for time series...
The search for structure in data is an essential aid to understanding the phenomena under analysis. Co-clustering methods address this need when the data must be processed jointly along both dimensions: rows and columns (individuals and variables). In some applications,...
The search for structure in data is an essential aid to understanding the phenomena under analysis. Variable clustering methods address this need, but they can be hampered by too large a number of variables. We propose a new approach of the type "Di...
The search for structure in data is an essential aid to understanding the phenomena to be analyzed. Methods for clustering numerical variables address this problem, but they can be hampered by too many variables. We propose a new "Divide and Conquer" approach based on the MapReduce principle to overcome this pro...
The search for structure in data is an essential aid to understanding the phenomena to be analyzed. In 2016 and 2017 we proposed a set of methods for clustering numeric variables with linear or nonlinear links. For high-dimensional data (many variables and many individuals), we propose to adapt these methods by mea...
The search for structure in data is an essential aid to understanding the phenomena to be analyzed before any further processing. Unsupervised learning and visualization techniques are the main tools that facilitate this search. In 2016 we proposed a set of methods for clustering numeric variables. These are based on a mixed approach...
The search for structure in data is an essential aid to understanding the phenomena to be analyzed. Methods for clustering numeric variables address this problem, but most were developed only for linear relationships between variables. We propose a new approach to clustering variables with nonlinear relationships...
The search for structure in data is an essential aid to understanding the phenomena to be analyzed. Unsupervised learning, together with visualization techniques, is the main tool. We propose a set of methods for clustering numeric variables. These are based on a mixed approach: correlation between the initial variables and one-dimensionality of...
Is a response series systematically explained by the same inputs along its entire length? This question arises particularly for non-stationary, irregular series, such as those in finance (market prices, CAC40, ...), when trying to understand how the response is shaped. The proposed approach can not only answer this question, but also pr...
The objectives of time series analysis are many: forecasting future behavior, understanding how the response is built from predictors, synthesizing information from several time series, detecting breaks in behavior, or searching for similar or abnormal time intervals. This paper addresses the last of these objectives. For this, we use a method of segme...
We propose a method to build a complex model for irregular, multivariate time series. First, to account for nonlinearity, breaks, and volatility, we segment these series into linear trends to eliminate non-stationarity, standardizing the raw series using the regression equation and standard error of each segment. Then, we used an exploratory...
This paper concerns the preprocessing of time series using a segmentation approach, through which we built similarity indices that enable the discovery of breakpoints between two series, the modeling of nonlinear relationships, and the identification of the local (per-segment) evolution of two pheno...
The methodology proposed in this paper builds a meta-segmentation of a time series. We first proposed a method to segment a time series into several linear segments, based on an exploratory approach and a heteroscedastic Gaussian linear model estimated by REML. We then improved this method with an a priori step to bett...
Time series can be decomposed into several types of components: trend, seasonality, volatility, and noise. They may be more or less regular depending on the application domain. The behavioral changes that characterize these series are mainly of several types: peaks (energy prices in a tense situation, but over a very short period), jumps in level or trend (or separati...
Time series can be decomposed into several types of components: trend, seasonality, volatility, and noise. They may be more or less regular depending on the application domain. The behavioral changes that characterize these series are mainly of several types: peaks (energy prices in a tense situation, but over a very short period), jumps in level or trend (or separati...
The proposed method segments a time series. It offers an original process whose first step, data preparation, is crucial for building the most adequate structure to initialize the second step: fitting a heteroskedastic linear model that includes the different trends, levels, and variances. This method can be used in many domains and to set...
In his book "Regression Modeling: Methods, Theory, and Computation with SAS", Michael Panik covers a wide range of modeling methods, from simple linear regression to time series and spatial data, and including robust and semi-parametric regression. The author's goal is also to provide, for most methods, some applicat...
The proposed method segments a time series. It offers an original process with an essential data-preparation phase that produces the most adequate structure possible to initialize the modeling phase, which uses a heteroscedastic linear model including trends, constants, and dispersi...
In many applications, in marketing for instance, modeling a response variable aims not only to identify the main explanatory variables, but also to find the most contributive ones and their order of importance. We propose a method to rank ordinal explanatory variables when modeling an ordinal response variable. The main problem is th...
In large companies, Online Analytical Processing (OLAP) technologies are widely used by business analysts as a decision-support tool. The exploration of the data is performed using operators such as drill-down, roll-up or slice. While exploring the cube, end-users are rapidly confronted with analysing a huge number of drill-paths according to the d...
This chapter is in the scope of static and dynamic discovery-driven explorations of a data cube. It presents different methods to facilitate the whole process of data exploration. Each kind of analysis (static or dynamic) is developed for either a count measure or a quantitative measure. Both are based on the calculation, on the fly, of specific st...
In many applications, particularly in marketing, modeling a phenomenon requires knowing which variables influence it, but also which contribute the most and, in particular, their order of importance. In this paper, we propose a method for ranking the ordinal explanatory variables of an ordinal response variable...
OLAP applications are widely used by business analysts as a decision support tool. While exploring the cube, end-users are rapidly confronted with analyzing a huge number of drill paths along the different dimensions. Generally, analysts are interested in only a small part of them, which corresponds to either high statistical associations betwe...
This paper introduces a new method to build a graphical model of categorical variables (a Free model) in the frame of structural equation models. Firstly a clustering of variables is applied, then for each cluster a numeric latent variable is calculated. After that, links between latent variables are searched for and expert decision issued to posit...
In large companies, On-Line Analytical Processing (OLAP) technologies are widely used by business analysts as a decision support tool. Nevertheless, while exploring the cube, analysts are rapidly confronted with analyzing a huge number of visible cells to identify the most interesting ones. Coupling OLAP technologies and mining methods may help them...
Ordinal responses in survey data or in therapeutic research are mainly modeled by ordinal logistic regression (the proportional odds model). For instance, in the case of grouped data (where explanatory variables are categorical), mean levels are provided with level coefficients. This measure can be limited in terms of contribution, namely if t...
Data from opinion surveys or medical research (epidemiology, therapeutic trials) are often modeled by logistic regression when the response is ordinal (a satisfaction scale for an electrical product, the stage of progression of a disease). Moreover, when the candidate explanatory variables are qual...
Partial least squares (PLS) path modeling has found increased applications in customer satisfaction analysis thanks to its ability to handle complex models. A modified PLS path modeling algorithm together with a model building strategy are introduced and applied to customer satisfaction analysis at the French energy supplier Electricité de France....
Within the framework of path modeling, we have developed the PML (Partial Maximum Likelihood) approach, which generalizes the PLS (Partial Least Squares) approach to all types of variables (numeric, boolean, ordinal, nominal, and count). The PML approach, based on generalized linear models, is used notably in satisfaction surveys where scales of answers...
The goal of this thesis is to present ten years of research (1995-2005) on statistical methods for categorical data, following two approaches: discovering structure through exploratory data analysis and modeling phenomena through inferential statistics. The first introduces new concepts in clustering mixed variables (numeric and/or categorical...
Customer satisfaction and retention are key issues for organizations in today's competitive marketplace. Furthermore, the French electricity industry is entering a new transition period with the opening of the market. In response, Electricité de France has set up a process of surveys to quantify customer satisfaction with an aim of retention and de...
The LISREL, PLS, and RFPC approaches are path modeling methods used in marketing to build satisfaction indexes. The latter two have two advantages over the first: the ability to handle missing data and to converge without difficulty during model estimation. But they support only variables on nume...
In some applications (studies of customer databases, for example), particularly when the number of individuals and variables is very large, choosing the "explanatory" variables can be tricky because many of the p-values associated with the tests are ridiculously small owing to the asymptotic dimension of the problem. Moreover...
Structural linear models are widely used in marketing to build satisfaction indexes and to understand the levers of satisfaction. These models are based on a priori models predefined by domain experts and make it possible to establish links between different aspects of customer satisfaction. To build these models, there are two main...
Statistical matching makes it possible to enrich a database (the receiver file) with new variables coming from another dataset (the donor file). For instance, this approach is very useful for improving the efficiency of operational marketing campaigns by means of supplementary information on customers. A feasible solution is to predict new variables by modelling....
We present a methodology based on hidden Markov chains for the modeling and statistical analysis of electricity consumption curves. Following an analysis of variance that estimates the effect on log-consumption of controlled factors (month, day, hour, contract type, and maximum subscribed power), we model...
Clustering p variables is generally applied when they are numerous or when some of them are assumed to be strongly correlated. In addition, clustering has two other objectives: identifying discriminating themes (ranking the contribution level of variables) within each cluster and...
Eurostat conference on New Techniques and Technologies for Statistics, Exchange of Technology and Know-how
Better understanding customers' expectations and needs is one of EDF's major strategic priorities. This improved knowledge is obtained in particular through statistical analysis of different types of data (electricity consumption, customer loyalty, new customers, quality of supply, satisfaction surveys, ...). To this end, E...
EDF conducts many surveys of its residential customers to learn their opinion on the quality of electricity supply, on commercial services, or, as in the present paper, on heating systems. The questionnaire analyzed provides information on socio-dem...
One of Electricité de France's major strategic priorities is to develop electricity uses among its residential customers. This requires understanding the effects that interact among the various players in the French residential electricity market (customers, builders, competitors, ...). Unfortunately, as...
Medical research data are often modelled either by analysis of variance (continuous response variables) or by logistic regression (nominal or ordinal response variables). These methods allow one to eliminate the structural effects among variables which are candidates for an explanation and to measure these effects, other things being equal. Our dep...
EDF conducted a measurement program over three years, from June 1993 to June 1996. Its purpose was to measure power quality, customers' opinion and the connection between them. Three hundred measuring devices (Qualimetre) were installed at the premises of industrial customers between June 1993 and June 1995. The QUALIMAT experiment gathered such a...