Arthur Charpentier
Université du Québec à Montréal | UQAM · Department of Mathematics

PhD, Leuven

About

133
Publications
71,746
Reads
1,586
Citations
Additional affiliations
September 2017 - November 2017
Université de Rennes 1
Position
  • Professor (Full)
September 2011 - present
Université du Québec à Montréal
Position
  • Professor (Assistant)
September 2007 - September 2011
Université de Rennes 1
Position
  • Professor (Associate)
Education
June 2002 - June 2006
KU Leuven
Field of study
  • Mathematics
June 1998 - June 1999
Paris Dauphine University
Field of study
  • Mathematical Economics
September 1996 - June 1999
MINES ParisTech
Field of study
  • ENSAE - statistics and actuarial science

Publications

Chapter
Many problems ask a question that can be formulated as a causal question: what would have happened if...? For example, would the person have had surgery if he or she had been Black? To address this kind of question, calculating an average treatment effect (ATE) is often uninformative, because one would like to know how much impact a variable (such...
Article
Full-text available
With their intensive use of data to classify and price risk, insurers have often been confronted with data-related issues of fairness and discrimination. This paper provides a comparative review of discrimination issues raised by traditional statistics versus machine learning in the context of insurance. We first examine historical contestations of...
Chapter
Algorithmic Fairness is an established field in machine learning that aims to reduce biases in data. Recent advances have proposed various methods to ensure fairness in a univariate environment, where the goal is to de-bias a single task. However, extending fairness to a multi-task setting, where more than one objective is optimised using a shared...
Preprint
Full-text available
Algorithmic Fairness and the explainability of potentially unfair outcomes are crucial for establishing trust and accountability of Artificial Intelligence systems in domains such as healthcare and policing. Though significant advances have been made in each of the fields separately, achieving explainability in fairness applications remains challen...
Preprint
Full-text available
The insurance industry is heavily reliant on predictions of risks based on characteristics of potential customers. Although the use of said models is common, researchers have long pointed out that such practices perpetuate discrimination based on sensitive features such as gender or race. Given that such discrimination can often be attributed to hi...
Preprint
Full-text available
Algorithmic Fairness is an established field in machine learning that aims to reduce biases in data. Recent advances have proposed various methods to ensure fairness in a univariate environment, where the goal is to de-bias a single task. However, extending fairness to a multi-task setting, where more than one objective is optimised using a shared...
Article
A fixed-effects model estimates the regressor effects on the mean of the response, which is inadequate to account for heteroscedasticity. In this paper, we adapt the asymmetric least squares (expectile) regression to the fixed-effects panel model and propose a new model: expectile regression with fixed effects (ERFE). The ERFE model applies the wit...
Preprint
Full-text available
Many problems ask a question that can be formulated as a causal question: "what would have happened if...?" For example, "would the person have had surgery if he or she had been Black?" To address this kind of question, calculating an average treatment effect (ATE) is often uninformative, because one would like to know how much impact a variable (...
Preprint
Full-text available
Since the beginning of their history, insurers have been known to use data to classify and price risks. As such, they were confronted early on with the problem of fairness and discrimination associated with data. This issue is becoming increasingly important with access to more granular and behavioural data, and is evolving to reflect current techn...
Article
Full-text available
Top incomes are often related to the Pareto distribution. To date, economists have mostly used the Pareto Type I distribution to model the upper tail of income and wealth distributions. It is a parametric distribution, with interesting properties, that can be easily linked to economic theory. In this paper, we first show that modeling top incomes with Paret...
Article
Boosting techniques and neural networks are particularly effective machine learning methods for insurance pricing. Often in practice, the sum of fitted values can depart from the observed totals to a large extent. The possible lack of balance when models are trained by minimizing deviance outside the familiar GLM with canonical link setting has bee...
Preprint
Full-text available
The fixed-effects model estimates the regressor effects on the mean of the response, which is inadequate to summarize the variable relationships in the presence of heteroscedasticity. In this paper, we adapt the asymmetric least squares (expectile) regression to the fixed-effects model and propose a new model: expectile regression with fixed-effect...
Article
Full-text available
Generalized estimating equations (GEE) are widely used to analyze longitudinal data; however, they are not appropriate for heteroscedastic data, because they only estimate regressor effects on the mean response – and therefore do not account for data heterogeneity. Here, we combine the GEE with the asymmetric least squares (expectile) regression to...
Preprint
Full-text available
The economic consequences of drought episodes are increasingly important, although they are often difficult to apprehend in part because of the complexity of the underlying mechanisms. In this article, we will study one of the consequences of drought, namely the risk of subsidence (or more specifically clay shrinkage induced subsidence), for which...
Preprint
Full-text available
The peer-to-peer (P2P) economy has been growing with the advent of the Internet, with well-known brands such as Uber or Airbnb being examples thereof. In the insurance sector the approach is still in its infancy, but some companies have started to explore P2P-based collaborative insurance products (e.g. Lemonade in the U.S. or Inspeer in France). Th...
Article
Natural disasters offer a specific case study of the mix of public and private insurance. Indeed, the experience accumulated over the past decades has made it possible to transform poorly-known hazards like flood losses, long considered uninsurable, into risks that can be assessed with some precision. They exemplify, however, the affordability issu...
Article
Full-text available
Reinforcement learning algorithms describe how an agent can learn an optimal action policy in a sequential decision process, through repeated experience. In a given environment, the agent's policy provides it with some running and terminal rewards. As in online learning, the agent learns sequentially. As in multi-armed bandit problems, when an agent pick...
Preprint
Full-text available
Boosting techniques and neural networks are particularly effective machine learning methods for insurance pricing. Often in practice, there are nevertheless endless debates about the choice of the right loss function to be used to train the machine learning model, as well as about the appropriate metric to assess the performances of competing model...
Preprint
Full-text available
We revisit Machina's local utility as a tool to analyze attitudes to multivariate risks. We show that for non-expected utility maximizers choosing between multivariate prospects, aversion to multivariate mean preserving increases in risk is equivalent to the concavity of the local utility functions, thereby generalizing Machina's result in Machina...
Article
Full-text available
A principal component analysis based on the generalized Gini correlation index is proposed (Gini PCA). The Gini PCA generalizes the standard PCA based on the variance. It is shown, in the Gaussian case, that the standard PCA is equivalent to the Gini PCA. It is also proven that the dimensionality reduction based on the generalized Gini correlation...
Chapter
The Pareto model is very popular in risk management, since simple analytical formulas can be derived for financial downside risk measures (value-at-risk, expected shortfall) or reinsurance premiums and related quantities (large claim index, return period). Nevertheless, in practice, distributions are (strictly) Pareto only in the tails, above (poss...
Article
Websites that let users reconstruct their family tree online are flourishing on the Internet. This article analyses the collection and data-entry work carried out by these users and how it could be used in historical demography to complete our knowledge of past generations. To this end, the results...
Article
Full-text available
An extended SIR model, including several features of the recent COVID-19 outbreak, is considered: the infected and recovered individuals can either be detected or undetected and we also integrate an intensive care unit (ICU) capacity. We identify the optimal policy for controlling the epidemic dynamics using both lockdown and detection intervention...
Article
Full-text available
The aim of this article is to assess the impact of Big Data technologies for insurance ratemaking, with a special focus on motor products. The first part shows how statistics and insurance mechanisms adopted the same aggregate viewpoint. It made visible regularities that were invisible at the individual level, further supporting the classificatory a...
Preprint
Full-text available
Family history is usually seen as a significant factor that insurance companies look at when someone applies for a life insurance policy. Where it is used, family history of cardiovascular diseases, death by cancer, or family history of high blood pressure and diabetes could result in higher premiums or no coverage at all. In this article, we use massive (hist...
Preprint
Full-text available
We consider here an extended SIR model, including several features of the recent COVID-19 outbreak: in particular the infected and recovered individuals can either be detected (+) or undetected (-) and we also integrate an intensive care unit capacity. Our model enables a tractable quantitative analysis of the optimal policy for the control of the...
Preprint
Reinforcement learning algorithms describe how an agent can learn an optimal action policy in a sequential decision process, through repeated experience. In a given environment, the agent's policy provides it with some running and terminal rewards. As in online learning, the agent learns sequentially. As in multi-armed bandit problems, when an agent pick...
Article
What future for predictive probabilities in insurance? Since insurance policies are classical examples of contingency contracts, insurers have to regularly quantify uncertainty and calculate probabilities so as to offer premiums that are “fair” with respect to the obligations of both parties. Is it not high time to ask questions about insurance pra...
Preprint
Full-text available
The Pareto model is very popular in risk management, since simple analytical formulas can be derived for financial downside risk measures (Value-at-Risk, Expected Shortfall) or reinsurance premiums and related quantities (Large Claim Index, Return Period). Nevertheless, in practice, distributions are (strictly) Pareto only in the tails, above (poss...
Preprint
Full-text available
A principal component analysis based on the generalized Gini correlation index is proposed (Gini PCA). The Gini PCA generalizes the standard PCA based on the variance. It is shown, in the Gaussian case, that the standard PCA is equivalent to the Gini PCA. It is also proven that the dimensionality reduction based on the generalized Gini correlation...
Article
The digital age allows data collection to be done on a large scale and at low cost. This is the case of genealogy trees, which flourish on numerous digital platforms thanks to the collaboration of a mass of individuals wishing to trace their origins and share them with other users. The family trees constituted in this way contain information on the...
Preprint
Full-text available
This article presents a set of tools for the modeling of a spatial allocation problem in a large geographic market and gives examples of applications. In our settings, the market is described by a network that maps the cost of travel between each pair of adjacent locations. Two types of agents are located at the nodes of this network. The buyers ch...
Preprint
Full-text available
Recently, Alderson et al. (2009) mentioned that (strict) scale-free networks are rare in real life. This might be related to the statement of Stumpf, Wiuf & May (2005) that sub-networks of scale-free networks are not scale-free. In the latter, those sub-networks are asymptotically scale-free, but one should not forget about second-order deviation...
Article
Full-text available
Professor, Université du Québec à Montréal. Network theory, or graph theory, was born in 1735, following the work of Leonhard Euler, who was trying to find a walk, starting from a given point, that returns to that point while crossing each of the seven bridges of the city of Königsberg once and only once...
Article
Full-text available
We propose an Aitken estimator for Gini regression. The suggested A-Gini estimator is proven to be a U-statistic. Monte Carlo simulations are provided to deal with heteroskedasticity and to make some comparisons between the generalized least squares and the Gini regression. A Gini-White test is proposed and shows that a better power is obtained co...
Preprint
Full-text available
The well-known generalized estimating equations (GEE) method is widely used to estimate the effect of the covariates on the mean of the response variable. We apply the GEE method using asymmetric least squares (expectile) regression to analyze longitudinal data. Expectile regression naturally extends the classical least squares method and has prope...
Preprint
Full-text available
The digital age allows data collection to be done on a large scale and at low cost. This is the case of genealogy trees, which flourish on numerous digital platforms thanks to the collaboration of a mass of individuals wishing to trace their origins and share them with other users. The family trees constituted in this way contain information on the...
Article
Full-text available
Why did the Greeks not invent probability theory? The question may come as a surprise, but given the Greeks' knowledge of geometry between the time of Pythagoras (550 BCE) and that of Euclid (300 BCE), one may be surprised that no theory of chance was ever proposed. Perhaps one should remember...
Article
Your name may feel unique to you – but chances are that someone, somewhere is called the same thing. Arthur Charpentier and Baptiste Coulmont estimate the proportion of shared identities in large social groups.
Article
Full-text available
In the 19th century, when several astronomers measured the speed of the same celestial object, they (often) obtained several different measurements. To decide which one to use in their calculations, the idea of using "the method of averages" quickly took hold, as recalled by Stahl [2006] and especially Sheynin [1973], with this average having a...
Article
Full-text available
Econometrics and machine learning seem to share a common purpose: building a predictive model for a variable of interest using explanatory variables (or features). Yet these two fields developed in parallel, thus creating two different cultures, to paraphrase Breiman (2001a). The first aimed to build...
Article
Full-text available
Econometrics and machine learning seem to have one common goal: to construct a predictive model, for a variable of interest, using explanatory variables (or features). However, these two fields developed in parallel, thus creating two different cultures, to paraphrase Breiman (2001). The first was to build probabilistic models to describe economic...
Article
Full-text available
This article brings forward an estimation of the proportion of homonyms in large-scale groups based on the distribution of first names and last names in a subset of these groups. The estimation is based on the generalization of the "birthday paradox problem". The main result is that, in societies such as France or the United States, identity colli...
Preprint
Recent protests have fuelled deliberations about the extent to which social media ignites popular uprisings. In this paper we use time-series data of Twitter, Facebook, and onsite protests to assess the Granger-causality between social media streams and onsite developments at the Indignados, Occupy, and Brazilian Vinegar protests. After applying a...
Preprint
Full-text available
In the 19th century, when several astronomers measured the speed of the same celestial object, they (often) obtained several different measurements. To decide which one to use in their calculations, the idea of using "the method of averages" quickly took hold, as recalled by Stahl [2006] and especially Sheynin [1973], with this average having a...
Article
Full-text available
Risques no. 109. The foundations of rate segmentation. Insurance techniques rely on the law of large numbers, under which risks that are homogeneous in nature and homogeneous in value are aggregated. It is therefore mathematically essential that risks be classified into sub-categories. Insurers will invoke a principle of fairness...
Article
Full-text available
Quantile and expectile regression models pertain to the estimation of unknown quantiles/expectiles of the cumulative distribution function of a dependent variable as a function of a set of covariates and a vector of regression coefficients. Both approaches make no assumption on the shape of the distribution of the response variable, allowing for inves...
Article
Full-text available
Prudence, prevention, precaution... and forecasts. It is often accepted that precaution differs from prevention in that the risks have not been identified. Prevention could be associated with protection against identified risks, whereas the notion of precaution raises the question of which actions are possible in the face of risks that have not yet been identified...
Article
Full-text available
Rabbit reproduction dynamics and the golden ratio. Fibonacci is known to students around the world for having posed the following problem: "Starting from one pair, how many pairs of rabbits will we obtain after a given number of months, given that each pair produces a new pair every month, which becomes productive only after...
Article
Standard kernel density estimation methods are very often used in practice to estimate density functions. They work well in numerous cases. However, they are known not to work so well with skewed, multimodal and heavy-tailed distributions. Such features are common in income distributions, defined over the positive support. In this paper, we show that...
Article
Full-text available
In this paper, we investigate the impact of the accident reporting strategy of drivers, within a Bonus-Malus system. We exhibit the induced modification of the corresponding class level transition matrix and derive the optimal reporting strategy for rational drivers. The hunger for bonuses induces optimal thresholds below which drivers do not clai...
Article
Full-text available
Traditionally, actuaries have used run-off triangles to estimate reserves ("macro" models, on aggregated data). But it is possible to model payments related to individual claims. If those models provide similar estimations, we investigate uncertainty related to reserves, with "macro" and "micro" models. We study theoretical properties of econometric...
Article
Full-text available
The role of an actuary in an insurance company is quite often to estimate the probability that an event occurs, or its possible financial consequences, as a function of so-called "explanatory" variables. Indeed, some variables turn out to be "statistically correlated" with the occurrence of an accident during the year...
Article
Full-text available
Last spring, the economist Paul Romer launched a fascinating discussion around the notion of "mathiness". His widely noted essay revived broader debates on the place of mathematical formalism in economics, modelling strategies (realism of assumptions, etc.) and the links between theory and empirical work. These debates...
Article
In this paper, we investigate a technique inspired by Ripley’s circumference method to correct the bias of density estimation at the edges (or frontiers) of regions. The idea of the method was theoretical and difficult to implement. We provide a simple technique – based on properties of Gaussian kernels – to efficiently compute weights to correct border bi...
Article
Full-text available
This article surveys the usual techniques for modelling multiple financial time series. More specifically, we seek a multivariate extension of GARCH models. We first see how to model the dynamics of the (conditional) correlation matrix, and then how to generalize this approach...
Article
Full-text available
Recent protests have fuelled deliberations about the extent to which social media ignites popular uprisings. In this article, we use time-series data of Twitter, Facebook, and onsite protests to assess the Granger causality between social media streams and onsite developments at the Indignados, Occupy, and Brazilian Vinegar protests. After applying...
Article
Full-text available
Standard kernel density estimation methods are very often used in practice to estimate density functions. They work well in numerous cases. However, they are known not to work so well with skewed, multimodal and heavy-tailed distributions. Such features are common in income distributions, defined over the positive support. In this paper, we show that...
Article
This paper provides a meta-analysis of 1651 point estimates of the Feldstein and Horioka saving retention coefficient from 49 peer-reviewed papers published over three decades. We get two main results. First, correcting for publication bias, we find a consistent underlying coefficient lying between 0.56 and 0.67 for studies using the original paper....
Article
Full-text available
Siméon Denis Poisson worked on the calculus of probabilities for twenty years, from 1820 to 1840. Like many of his contemporaries, he did his first work on gambling problems, with his Mémoire sur l'avantage du banquier au jeu de trente et quarante, read at the Académie des sciences on 13 March 1820. But his first important work...
Article
Full-text available
Insurance fundamentally rests on the idea that risks can be pooled among policyholders. This pooling, which can be seen as an actuarial rereading of the law of large numbers, only makes sense within a population of "homogeneous" risks [Charpentier, 2011]. This (actuarial) condition requires insurers...
Article
Recent protests have fuelled deliberations about the extent to which social media ignites popular uprisings. In this paper we use time-series data of Twitter, Facebook, and onsite protests to assess the Granger-causality between social media streams and onsite developments at the Indignados, Occupy, and Brazilian Vinegar protests. After applying a...
Article
Full-text available
Copula modelling has become ubiquitous in modern statistics. Here, the problem of nonparametrically estimating a copula density is addressed. Arguably the most popular nonparametric density estimator, the kernel estimator is not suitable for the unit-square-supported copula densities, mainly because it is heavily affected by boundary bias issues. I...
Article
A multivariate extension of the bivariate class of Archimax copulas was recently proposed by Mesiar and Jágr (2013), who asked under which conditions it holds. This paper answers their question and provides a stochastic representation of multivariate Archimax copulas. A few basic properties of these copulas are explored, including their minimum and...
Article
Full-text available
Proxy voting, although increasingly widespread, remains poorly understood. Both the conception of voting as a personal act and the dominant orientations of studies of this act of citizenship lead to this procedure being neglected. Here, academics provide elements intended to construct it as an object of sociological study. Who resorts...
Article
In this article, we look back on almost six years of experience with a scholarly blog. We return to the initial motivations for starting an academic blog, and I try to explain what keeps the blog active, almost six years on, from an editor's perspective. We also discuss interactions with microblogging (Twitter) activities. More than a...
Article
Full-text available
Law of large numbers, approximations and asymptotic statistics. Like insurers, statisticians like large volumes of data. For insurers, large portfolios are considered less uncertain and, for the same reasons, statisticians obtain less volatile estimators from large databases...
Article
Full-text available
According to the dictionary, in this case Romeuf [1956], an actuary is a "mathematician specialized in the calculus of probabilities, whose services are used either by financial services, for amortization calculations, or by institutions whose activity involves a calculable risk or is based on the coverage of a ris...
Article
Full-text available
Siméon Denis Poisson worked on the calculus of probabilities for twenty years, from 1820 to 1840. Like many of his contemporaries, he did his first work on gambling problems, with his Mémoire sur l'avantage du banquier au jeu de trente et quarante, read at the Académie des sciences on 13 March 1820. But his first important work...
Article
Full-text available
In this paper, we investigate questions arising in Parsons & Geist (2012). Pseudo causal models connecting magnitudes and waiting times are considered, through generalized regression. We use conditional models (magnitude given previous waiting time, and conversely) as an extension of the joint distribution model described in Nikoloulopoulos & Karlis...
Chapter
Full-text available
Agents facing a risky situation need to compare positions or actions. But today, financial institutions (with Basel II) and insurance companies (with Solvency II) must above all set aside reserves to cover the risks taken, that is, they need a quantification of the risk taken...
Chapter
Full-text available
In just a few years, copulas have become an important tool for modelling multivariate risks (among other things). Copulas make it possible to "couple" marginal distributions in order to obtain a multivariate distribution, hence the Latin name copula chosen by Abe Sklar in 1959: "having worked out the basic properties of these functions, I wrote about them...
Article
In this paper, we investigate (and extend) Ripley's circumference method to correct the bias of density estimation at the edges (or frontiers) of regions. The idea of the method was theoretical and difficult to implement. We provide a simple technique -- based on properties of Gaussian kernels -- to efficiently compute weights to correct border bias on fro...
Article
We revisit Machina's local utility as a tool to analyze attitudes to multivariate risks. Using martingale embedding techniques, we show that for non-expected utility maximizers choosing between multivariate prospects, aversion to multivariate mean preserving increases in risk is equivalent to the concavity of the local utility functions, thereby ge...
Article
We revisit Machina's local utility as a tool to analyze attitudes to multivariate risks. Using martingale embedding techniques, we show that for non-expected utility maximizers choosing between multivariate prospects, aversion to multivariate mean preserving increases in risk is equivalent to the concavity of the local utility functions, thereby ge...
Article
The present paper develops a new theoretical framework for analyzing the decision to provide or buy insurance against the risk of natural catastrophes. In contrast with conventional models of insurance, the insurer has a non-zero probability of insolvency that depends on the distribution of the risks, the premium rate, and the amount of capital in...