Arthur Charpentier

University of Quebec in Montreal | UQAM · Department of Mathematics

PhD, Leuven

About

153
Publications
81,966
Reads
1,957
Citations
Additional affiliations
September 2017 - November 2017
University of Rennes
Position
  • Professor (Full)
September 2002 - September 2007
September 2011 - present
University of Quebec in Montreal
Position
  • Professor (Assistant)
Education
June 2002 - June 2006
KU Leuven
Field of study
  • Mathematics
June 1998 - June 1999
Université Paris Dauphine-PSL
Field of study
  • Mathematical Economics
September 1996 - June 1999
Mines Paris, PSL University
Field of study
  • ENSAE - statistics and actuarial science

Publications

Preprint
Full-text available
Collusion in market pricing is a concept associated with human actions to raise market prices through artificially limited supply. Recently, the idea of algorithmic collusion was put forward, where the human action in the pricing process is replaced by automated agents. Although experiments have shown that collusive market equilibria can be reached...
Preprint
Full-text available
Recently, optimal transport-based approaches have gained attention for deriving counterfactuals, e.g., to quantify algorithmic discrimination. However, in the general multivariate setting, these methods are often opaque and difficult to interpret. To address this, alternative methodologies have been proposed, using causal graphs combined with itera...
Preprint
Full-text available
In this paper, we link two existing approaches to derive counterfactuals: adaptations based on a causal graph, as suggested in Plečko and Meinshausen (2020), and optimal transport, as in De Lara et al. (2024). We extend "Knothe's rearrangement" Bonnotte (2013) and "triangular transport" Zech and Marzouk (2022a) to probabilistic graphical models,...
Preprint
Full-text available
In binary classification tasks, accurate representation of probabilistic predictions is essential for various real-world applications such as predicting payment defaults or assessing medical risks. The model must then be well-calibrated to ensure alignment between predicted probabilities and actual outcomes. However, when score heterogeneity deviat...
Chapter
Group fairness, as studied in Chap. 8, considered fairness from a global perspective, in the entire population, by attempting to answer the question “are individuals in the advantaged group and in the disadvantaged group treated differently?” Or more formally, are the predictions and the protected variable globally independent? Here, we focus on a...
Chapter
In this chapter, we present important concepts for dealing with predictive models. We start with a discussion about the interpretability and explainability of models and algorithms, presenting different tools that could help us to understand “why” the predicted outcome of the model is the one we got. Then, we will discuss accuracy, which is us...
Chapter
Classically, to estimate a model, we look for a model (in a pre-defined class) that minimizes a prediction error, or that maximizes the accuracy. If the model is required to satisfy constraints, a natural idea is to add a penalty term in the objective function. The idea of “in-processing” is to get a trade-off between accuracy and fairness. As prev...
Chapter
“Insurance is the contribution of the many to the misfortune of the few” is a simple way to describe what insurance is. But it doesn’t say what the “contribution” should be, to be fair. In this chapter, we return to the fundamentals of pricing and risk sharing, and at the end we mention other models used in insurance (to predict future payments to...
Chapter
Assessing whether a model is discriminatory, or not, is a complex problem. As in Chap. 3, where we discussed global and local interpretability of predictive models, we start with some global approaches (the local ones will be discussed in Chap. 9), also called “group fairness,” comparing quantities between groups, usually identified by sensitive at...
Chapter
An important challenge for actuaries is that they need to answer causal questions with observational data. After a brief discussion about correlation and causality, we describe the “causation ladder,” and the three rungs: association or correlation (“what if I see...”), intervention (“what if I do...”), and counterfactuals (“what if I had done...”)...
Chapter
In this chapter, we give an overview of predictive modeling as used by actuaries. Historically, we moved from relatively homogeneous portfolios to tariff classes, and then to modern insurance, with the concept of “premium personalization.” Modern modeling techniques are presented, starting with econometric approaches, before presenting machine-learni...
Chapter
Actuaries now collect all kinds of information about policyholders, which can not only be used to refine a premium calculation but also to carry out prevention operations. We return here to the choice of relevant variables in pricing, with emphasis on actuarial, operational, legal and ethical motivations. In particular, we discuss the idea of captu...
Chapter
“Pre-processing” is about distorting the training sample to ensure that the model we obtain is “fair,” with respect to some criteria (defined in the previous chapters). The two standard techniques are either to modify the original dataset (and to distort features to make them “fair,” or independent of the sensitive attribute), or to use weights (as...
Chapter
We return here to the usual protected, or sensitive, variables that can lead to discrimination in insurance. We mention direct discrimination, with race and ethnic origin, gender and sex, or age. We also discuss genetic-related discrimination, and as several official protected attributes are not related to biology but to social identity, we return...
Chapter
The idea of “post-processing” is relatively simple, as we change neither the training data, nor the model that has been estimated; we simply transform the predictions obtained, to make them “fair” (according to some specific criteria). As actuaries care about calibration, and the associated concept of a “well-balanced” model, quite naturally, we us...
Article
In the standard use case of Algorithmic Fairness, the goal is to eliminate the relationship between a sensitive variable and a corresponding score. Throughout recent years, the scientific community has developed a host of definitions and tools to solve this task, which work well in many practical applications. However, the applicability and effecti...
Chapter
Many problems ask a question that can be formulated as a causal question: what would have happened if...? For example, would the person have had surgery if he or she had been Black? To address this kind of question, calculating an average treatment effect (ATE) is often uninformative, because one would like to know how much impact a variable (such...
Article
Full-text available
With their intensive use of data to classify and price risk, insurers have often been confronted with data-related issues of fairness and discrimination. This paper provides a comparative review of discrimination issues raised by traditional statistics versus machine learning in the context of insurance. We first examine historical contestations of...
Chapter
Algorithmic Fairness is an established field in machine learning that aims to reduce biases in data. Recent advances have proposed various methods to ensure fairness in a univariate environment, where the goal is to de-bias a single task. However, extending fairness to a multi-task setting, where more than one objective is optimised using a shared...
Preprint
Full-text available
Algorithmic Fairness and the explainability of potentially unfair outcomes are crucial for establishing trust and accountability of Artificial Intelligence systems in domains such as healthcare and policing. Though significant advances have been made in each of the fields separately, achieving explainability in fairness applications remains challen...
Preprint
Full-text available
The insurance industry is heavily reliant on predictions of risks based on characteristics of potential customers. Although the use of said models is common, researchers have long pointed out that such practices perpetuate discrimination based on sensitive features such as gender or race. Given that such discrimination can often be attributed to hi...
Preprint
Full-text available
Algorithmic Fairness is an established field in machine learning that aims to reduce biases in data. Recent advances have proposed various methods to ensure fairness in a univariate environment, where the goal is to de-bias a single task. However, extending fairness to a multi-task setting, where more than one objective is optimised using a shared...
Article
A fixed-effects model estimates the regressor effects on the mean of the response, which is inadequate to account for heteroscedasticity. In this paper, we adapt the asymmetric least squares (expectile) regression to the fixed-effects panel model and propose a new model: expectile regression with fixed effects (ERFE). The ERFE model applies the wit...
Preprint
Full-text available
Many problems ask a question that can be formulated as a causal question: "what would have happened if...?" For example, "would the person have had surgery if he or she had been Black?" To address this kind of question, calculating an average treatment effect (ATE) is often uninformative, because one would like to know how much impact a variable (...
Preprint
Full-text available
Since the beginning of their history, insurers have been known to use data to classify and price risks. As such, they were confronted early on with the problem of fairness and discrimination associated with data. This issue is becoming increasingly important with access to more granular and behavioural data, and is evolving to reflect current techn...
Article
Full-text available
Top incomes are often related to Pareto distribution. To date, economists have mostly used Pareto Type I distribution to model the upper tail of income and wealth distribution. It is a parametric distribution, with interesting properties, that can be easily linked to economic theory. In this paper, we first show that modeling top incomes with Paret...
Article
Boosting techniques and neural networks are particularly effective machine learning methods for insurance pricing. Often in practice, the sum of fitted values can depart from the observed totals to a large extent. The possible lack of balance when models are trained by minimizing deviance outside the familiar GLM with canonical link setting has bee...
Preprint
Full-text available
The fixed-effects model estimates the regressor effects on the mean of the response, which is inadequate to summarize the variable relationships in the presence of heteroscedasticity. In this paper, we adapt the asymmetric least squares (expectile) regression to the fixed-effects model and propose a new model: expectile regression with fixed-effect...
Article
Full-text available
Generalized estimating equations (GEE) are widely used to analyze longitudinal data; however, they are not appropriate for heteroscedastic data, because they only estimate regressor effects on the mean response – and therefore do not account for data heterogeneity. Here, we combine the GEE with the asymmetric least squares (expectile) regression to...
Preprint
Full-text available
The economic consequences of drought episodes are increasingly important, although they are often difficult to apprehend in part because of the complexity of the underlying mechanisms. In this article, we will study one of the consequences of drought, namely the risk of subsidence (or more specifically clay shrinkage induced subsidence), for which...
Preprint
Full-text available
The peer-to-peer (P2P) economy has been growing with the advent of the Internet, with well-known brands such as Uber or Airbnb being examples thereof. In the insurance sector the approach is still in its infancy, but some companies have started to explore P2P-based collaborative insurance products (e.g., Lemonade in the U.S. or Inspeer in France). Th...
Article
Natural disasters offer a specific case study of the mix of public and private insurance. Indeed, the experience accumulated over the past decades has made it possible to transform poorly-known hazards like flood losses, long considered uninsurable, into risks that can be assessed with some precision. They exemplify, however, the affordability issu...
Article
Full-text available
Reinforcement learning algorithms describe how an agent can learn an optimal action policy in a sequential decision process, through repeated experience. In a given environment, the agent's policy provides it with running and terminal rewards. As in online learning, the agent learns sequentially. As in multi-armed bandit problems, when an agent pick...
Preprint
Full-text available
Boosting techniques and neural networks are particularly effective machine learning methods for insurance pricing. Often in practice, there are nevertheless endless debates about the choice of the right loss function to be used to train the machine learning model, as well as about the appropriate metric to assess the performances of competing model...
Preprint
Full-text available
We revisit Machina's local utility as a tool to analyze attitudes to multivariate risks. We show that for non-expected utility maximizers choosing between multivariate prospects, aversion to multivariate mean preserving increases in risk is equivalent to the concavity of the local utility functions, thereby generalizing Machina's result in Machina...
Article
Full-text available
A principal component analysis based on the generalized Gini correlation index is proposed (Gini PCA). The Gini PCA generalizes the standard PCA based on the variance. It is shown, in the Gaussian case, that the standard PCA is equivalent to the Gini PCA. It is also proven that the dimensionality reduction based on the generalized Gini correlation...
Chapter
The Pareto model is very popular in risk management, since simple analytical formulas can be derived for financial downside risk measures (value-at-risk, expected shortfall) or reinsurance premiums and related quantities (large claim index, return period). Nevertheless, in practice, distributions are (strictly) Pareto only in the tails, above (poss...
Article
Websites that let users reconstruct their family tree online are flourishing on the Internet. This article analyzes the collection and data-entry work carried out by these users and how it could be used in historical demography to complement our knowledge of past generations. To this end, the result...
Article
Full-text available
An extended SIR model, including several features of the recent COVID-19 outbreak, is considered: the infected and recovered individuals can either be detected or undetected and we also integrate an intensive care unit (ICU) capacity. We identify the optimal policy for controlling the epidemic dynamics using both lockdown and detection intervention...
Article
Full-text available
The aim of this article is to assess the impact of Big Data technologies for insurance ratemaking, with a special focus on motor products. The first part shows how statistics and insurance mechanisms adopted the same aggregate viewpoint. It made visible regularities that were invisible at the individual level, further supporting the classificatory a...
Preprint
Full-text available
Family history is usually seen as a significant factor that insurance companies look at when someone applies for a life insurance policy. Where it is used, family history of cardiovascular diseases, death by cancer, or family history of high blood pressure and diabetes could result in higher premiums or no coverage at all. In this article, we use massive (hist...
Preprint
Full-text available
We consider here an extended SIR model, including several features of the recent COVID-19 outbreak: in particular the infected and recovered individuals can either be detected (+) or undetected (-) and we also integrate an intensive care unit capacity. Our model enables a tractable quantitative analysis of the optimal policy for the control of the...
Preprint
Reinforcement learning algorithms describe how an agent can learn an optimal action policy in a sequential decision process, through repeated experience. In a given environment, the agent's policy provides it with running and terminal rewards. As in online learning, the agent learns sequentially. As in multi-armed bandit problems, when an agent pick...
Article
What future for predictive probabilities in insurance? Since insurance policies are classical examples of contingency contracts, insurers have to regularly quantify uncertainty and calculate probabilities so as to offer premiums that are “fair” with respect to the obligations of both parties. Is it not high time to ask questions about insurance pra...
Preprint
Full-text available
The Pareto model is very popular in risk management, since simple analytical formulas can be derived for financial downside risk measures (Value-at-Risk, Expected Shortfall) or reinsurance premiums and related quantities (Large Claim Index, Return Period). Nevertheless, in practice, distributions are (strictly) Pareto only in the tails, above (poss...
Preprint
Full-text available
A principal component analysis based on the generalized Gini correlation index is proposed (Gini PCA). The Gini PCA generalizes the standard PCA based on the variance. It is shown, in the Gaussian case, that the standard PCA is equivalent to the Gini PCA. It is also proven that the dimensionality reduction based on the generalized Gini correlation...
Article
The digital age allows data collection to be done on a large scale and at low cost. This is the case of genealogy trees, which flourish on numerous digital platforms thanks to the collaboration of a mass of individuals wishing to trace their origins and share them with other users. The family trees constituted in this way contain information on the...
Preprint
Full-text available
This article presents a set of tools for the modeling of a spatial allocation problem in a large geographic market and gives examples of applications. In our settings, the market is described by a network that maps the cost of travel between each pair of adjacent locations. Two types of agents are located at the nodes of this network. The buyers ch...
Preprint
Full-text available
Recently, Alderson et al. (2009) mentioned that (strict) scale-free networks were rare in real life. This might be related to the statement of Stumpf, Wiuf & May (2005) that sub-networks of scale-free networks are not scale-free. In the latter, those sub-networks are asymptotically scale-free, but one should not forget about second-order deviation...
Article
Full-text available
Professor, Université du Québec à Montréal. Network theory, or graph theory, was born in 1735, following the work of Leonhard Euler, who was trying to find a walk, starting from a given point, that would return to that point after crossing each of the seven bridges of the city of Königsber...
Article
Full-text available
We propose an Aitken estimator for Gini regression. The suggested A-Gini estimator is proven to be a U-statistic. Monte Carlo simulations are provided to deal with heteroskedasticity and to make some comparisons between the generalized least squares and the Gini regression. A Gini-White test is proposed and shows that a better power is obtained co...
Preprint
Full-text available
The well-known generalized estimating equations (GEE) are widely used to estimate the effect of the covariates on the mean of the response variable. We apply the GEE method using the asymmetric least-squares (expectile) regression to analyze longitudinal data. Expectile regression naturally extends the classical least squares method and has prope...
Preprint
Full-text available
The digital age allows data collection to be done on a large scale and at low cost. This is the case of genealogy trees, which flourish on numerous digital platforms thanks to the collaboration of a mass of individuals wishing to trace their origins and share them with other users. The family trees constituted in this way contain information on the...
Article
Full-text available
Why did the Greeks not invent probability theory? The question may come as a surprise, but given the Greeks' knowledge of geometry between the time of Pythagoras (550 BCE) and that of Euclid (300 BCE), it is surprising that no theory of chance was ever proposed. One should perhaps remember...
Article
Your name may feel unique to you – but chances are that someone, somewhere is called the same thing. Arthur Charpentier and Baptiste Coulmont estimate the proportion of shared identities in large social groups.
Article
Full-text available
In the 19th century, if several astronomers measured the speed of the same celestial object, they (often) obtained several different measurements. To decide which one to use in their calculations, the idea of using "the method of averages" quickly took hold, as recalled by Stahl [2006], and especially Sheynin [1973], this average having a pr...
Article
Full-text available
Econometrics and machine learning seem to share a common goal: building a predictive model for a variable of interest using explanatory variables (or features). Yet these two fields developed in parallel, thereby creating two different cultures, to paraphrase Breiman (2001a). The first aimed to constr...
Article
Full-text available
Econometrics and machine learning seem to have one common goal: to construct a predictive model, for a variable of interest, using explanatory variables (or features). However, these two fields developed in parallel, thus creating two different cultures, to paraphrase Breiman (2001). The first was to build probabilistic models to describe economic...
Article
Full-text available
This article brings forward an estimation of the proportion of homonyms in large-scale groups based on the distribution of first names and last names in a subset of these groups. The estimation is based on the generalization of the "birthday paradox problem". The main result is that, in societies such as France or the United States, identity colli...
Preprint
Recent protests have fuelled deliberations about the extent to which social media ignites popular uprisings. In this paper we use time-series data of Twitter, Facebook, and onsite protests to assess the Granger-causality between social media streams and onsite developments at the Indignados, Occupy, and Brazilian Vinegar protests. After applying a...
Preprint
Full-text available
In the 19th century, if several astronomers measured the speed of the same celestial object, they (often) obtained several different measurements. To decide which one to use in their calculations, the idea of using "the method of averages" quickly took hold, as recalled by Stahl [2006], and especially Sheynin [1973], this average having a pr...
Article
Full-text available
Risques no. 109. The foundations of tariff segmentation. Insurance techniques rely on the law of large numbers, where risks that are homogeneous in nature and homogeneous in value are aggregated. It is therefore mathematically essential that risks be classified into subcategories. Insurers will invoke a principle of act...
Article
Full-text available
Quantile and expectile regression models pertain to the estimation of unknown quantiles/expectiles of the cumulative distribution function of a dependent variable as a function of a set of covariates and a vector of regression coefficients. Both approaches make no assumption on the shape of the distribution of the response variable, allowing for inves...
Article
Full-text available
Prudence, prevention, precaution… and forecasts. It is often accepted that precaution differs from prevention by the absence of risk identification. Prevention could be associated with protection against identified risks, whereas the notion of precaution raises the question of possible actions in the face of risks not yet identi...
Article
Full-text available
Rabbit reproduction dynamics and the golden ratio. Fibonacci is known to students around the world for having posed the following problem: "Starting from one pair, how many pairs of rabbits will we obtain after a given number of months, knowing that each pair produces a new pair every month, which becomes productive only aft...
Article
Full-text available
Standard kernel density estimation methods are very often used in practice to estimate density functions. They work well in numerous cases. However, they are known not to work so well with skewed, multimodal and heavy-tailed distributions. Such features are usual with income distributions, defined over the positive support. In this paper, we show that...
Article
Full-text available
In this paper, we investigate the impact of the accident reporting strategy of drivers, within a Bonus-Malus system. We exhibit the induced modification of the corresponding class level transition matrix and derive the optimal reporting strategy for rational drivers. The hunger for bonuses induces optimal thresholds under which drivers do not clai...
Article
Full-text available
Traditionally, actuaries have used run-off triangles to estimate reserves ("macro" models, on aggregated data). But it is possible to model payments related to individual claims. If those models provide similar estimations, we investigate uncertainty related to reserves, with "macro" and "micro" models. We study theoretical properties of econometric...
Preprint
Traditionally, actuaries have used run-off triangles to estimate reserves ("macro" models, on aggregated data). But it is possible to model payments related to individual claims. If those models provide similar estimations, we investigate uncertainty related to reserves, with "macro" and "micro" models. We study theoretical properties of econometric...
Article
Full-text available
The role of an actuary in an insurance company is quite often to estimate the probability that an event occurs, or its possible financial consequences, as a function of so-called “explanatory” variables. Indeed, certain variables are seen to be “statistically correlated” with the occurrence of an accident in the yea...
Article
Full-text available
Last spring, the economist Paul Romer launched a fascinating discussion around the notion of “mathiness”. His widely noted essay revived broader debates on the place of mathematical formalism in economics, modeling strategies (realism of assumptions, etc.) and the links between theory and empirical work. These deb...
Article
In this paper, we investigate a technique inspired by Ripley’s circumference method to correct bias of density estimation of edges (or frontiers) of regions. The idea of the method was theoretical and difficult to implement. We provide a simple technique – based on properties of Gaussian kernels – to efficiently compute weights to correct border bi...
Article
Full-text available
This article presents an overview of the usual techniques for modeling multiple financial series. More specifically, we seek to obtain a multivariate extension of GARCH models. First, we will see how to model the dynamics of the (conditional) correlation matrix, then we will see how to generalize this appr...
Article
Full-text available
Recent protests have fuelled deliberations about the extent to which social media ignites popular uprisings. In this article, we use time-series data of Twitter, Facebook, and onsite protests to assess the Granger causality between social media streams and onsite developments at the Indignados, Occupy, and Brazilian Vinegar protests. After applying...
Article
Full-text available
Standard kernel density estimation methods are very often used in practice to estimate density functions. They work well in numerous cases. However, they are known not to work so well with skewed, multimodal and heavy-tailed distributions. Such features are usual with income distributions, defined over the positive support. In this paper, we show that...
Article
This paper provides a meta-analysis of 1651 point estimates of Feldstein and Horioka saving retention coefficient from 49 peer-reviewed papers published over three decades. We get two main results. First, correcting for publication bias, we find a consistent underlying coefficient lying between 0.56 and 0.67 for studies using the original paper....
Article
Full-text available
Siméon Denis Poisson worked on the calculus of probabilities for twenty years, from 1820 to 1840. Like many of his contemporaries, he carried out his first work on gambling problems, with his Mémoire sur l'avantage du banquier au jeu de trente et quarante, read at the Académie des sciences on 13 March 1820. But his first import...
Article
Full-text available
Insurance fundamentally rests on the idea that risk pooling among policyholders is possible. This pooling, which can be seen as an actuarial rereading of the law of large numbers, only makes sense within a population of “homogeneous” risks [Charpentier, 2011]. This (actuarial) condition requires insurers...
Article
Recent protests have fuelled deliberations about the extent to which social media ignites popular uprisings. In this paper we use time-series data of Twitter, Facebook, and onsite protests to assess the Granger-causality between social media streams and onsite developmen