
Arthur Charpentier, Université du Québec à Montréal (UQAM), Department of Mathematics
PhD, Leuven
About
133 Publications
71,746 Reads
1,586 Citations
Additional affiliations
September 2017 - November 2017
September 2011 - present
September 2007 - September 2011
Education
June 2002 - June 2006
June 1998 - June 1999
September 1996 - June 1999
Publications (133)
Many problems ask a question that can be formulated as a causal question: what would have happened if...? For example, would the person have had surgery if he or she had been Black? To address this kind of question, calculating an average treatment effect (ATE) is often uninformative, because one would like to know how much impact a variable (such...
With their intensive use of data to classify and price risk, insurers have often been confronted with data-related issues of fairness and discrimination. This paper provides a comparative review of discrimination issues raised by traditional statistics versus machine learning in the context of insurance. We first examine historical contestations of...
Algorithmic Fairness is an established field in machine learning that aims to reduce biases in data. Recent advances have proposed various methods to ensure fairness in a univariate environment, where the goal is to de-bias a single task. However, extending fairness to a multi-task setting, where more than one objective is optimised using a shared...
Algorithmic Fairness and the explainability of potentially unfair outcomes are crucial for establishing trust and accountability of Artificial Intelligence systems in domains such as healthcare and policing. Though significant advances have been made in each of the fields separately, achieving explainability in fairness applications remains challen...
The insurance industry is heavily reliant on predictions of risks based on characteristics of potential customers. Although the use of said models is common, researchers have long pointed out that such practices perpetuate discrimination based on sensitive features such as gender or race. Given that such discrimination can often be attributed to hi...
A fixed-effects model estimates the regressor effects on the mean of the response, which is inadequate to account for heteroscedasticity. In this paper, we adapt the asymmetric least squares (expectile) regression to the fixed-effects panel model and propose a new model: expectile regression with fixed effects (ERFE). The ERFE model applies the wit...
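As a point of reference for the expectile abstract above: the τ-expectile of a sample is the value e minimizing Σ|τ − 1(y_i ≤ e)|(y_i − e)², and its first-order condition makes e a weighted mean, with weight τ for observations above e and 1 − τ for those below. The minimal sketch below (the function name `expectile` is hypothetical) iterates that weighted mean. It covers only the location case, not the fixed-effects regression (ERFE) proposed in the paper.

```python
def expectile(values, tau, tol=1e-12, max_iter=100):
    """Sample tau-expectile by iterating the asymmetric least squares weighted mean."""
    if not 0 < tau < 1:
        raise ValueError("tau must lie strictly between 0 and 1")
    e = sum(values) / len(values)  # the 0.5-expectile is the mean, a natural start
    for _ in range(max_iter):
        # Observations above the current estimate get weight tau, the others 1 - tau.
        weights = [tau if y > e else 1 - tau for y in values]
        new_e = sum(w * y for w, y in zip(weights, values)) / sum(weights)
        if abs(new_e - e) < tol:
            return new_e
        e = new_e
    return e

print(expectile([1, 2, 3, 4], 0.5))  # 2.5, the sample mean
print(expectile([0, 10], 0.9))       # 9.0
```

With τ = 0.9 and the sample {0, 10}, the answer 9 balances 0.9 × (10 − 9) against 0.1 × (9 − 0), which is exactly the first-order condition.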
Many problems ask a question that can be formulated as a causal question: "what would have happened if...?" For example, "would the person have had surgery if he or she had been Black?" To address this kind of question, calculating an average treatment effect (ATE) is often uninformative, because one would like to know how much impact a variable (...
Since the beginning of their history, insurers have been known to use data to classify and price risks. As such, they were confronted early on with the problem of fairness and discrimination associated with data. This issue is becoming increasingly important with access to more granular and behavioural data, and is evolving to reflect current techn...
Top incomes are often related to the Pareto distribution. To date, economists have mostly used the Pareto Type I distribution to model the upper tail of income and wealth distributions. It is a parametric distribution, with interesting properties, that can be easily linked to economic theory. In this paper, we first show that modeling top incomes with Paret...
Boosting techniques and neural networks are particularly effective machine learning methods for insurance pricing. Often in practice, the sum of fitted values can depart from the observed totals to a large extent. The possible lack of balance when models are trained by minimizing deviance outside the familiar GLM with canonical link setting has bee...
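The balance issue described above can be checked directly by comparing the sum of fitted values with the observed total. One simple remedy, assumed here and not necessarily the one studied in the paper, multiplies all predictions by a common factor; under a log link this amounts to shifting the intercept. The function name `rebalance` and the data are hypothetical.

```python
def rebalance(predictions, observed):
    """Multiply every prediction by the same factor so that the totals match."""
    total_pred = sum(predictions)
    if total_pred <= 0:
        raise ValueError("predictions must have a positive total")
    factor = sum(observed) / total_pred
    return [factor * p for p in predictions]

# Hypothetical expected claim counts from a boosted model, and observed counts.
predicted = [0.10, 0.20, 0.30, 0.40]
observed = [0, 1, 0, 1]
balanced = rebalance(predicted, observed)
print(sum(predicted), sum(balanced), sum(observed))
```

Because the factor is common to all policies, the relative ranking of risks is unchanged; only the overall level is corrected.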
The fixed-effects model estimates the regressor effects on the mean of the response, which is inadequate to summarize the variable relationships in the presence of heteroscedasticity. In this paper, we adapt the asymmetric least squares (expectile) regression to the fixed-effects model and propose a new model: expectile regression with fixed-effect...
Generalized estimating equations (GEE) are widely used to analyze longitudinal data; however, they are not appropriate for heteroscedastic data, because they only estimate regressor effects on the mean response – and therefore do not account for data heterogeneity. Here, we combine the GEE with the asymmetric least squares (expectile) regression to...
The economic consequences of drought episodes are increasingly important, although they are often difficult to assess, partly because of the complexity of the underlying mechanisms. In this article, we study one of the consequences of drought, namely the risk of subsidence (more specifically, subsidence induced by clay shrinkage), for which...
The peer-to-peer (P2P) economy has been growing with the advent of the Internet, with well-known brands such as Uber or Airbnb being examples thereof. In the insurance sector the approach is still in its infancy, but some companies have started to explore P2P-based collaborative insurance products (e.g., Lemonade in the U.S. or Inspeer in France). Th...
Natural disasters offer a specific case study of the mix of public and private insurance. Indeed, the experience accumulated over the past decades has made it possible to transform poorly known hazards like flood losses, long considered uninsurable, into risks that can be assessed with some precision. They exemplify, however, the affordability issu...
Reinforcement learning algorithms describe how an agent can learn an optimal action policy in a sequential decision process, through repeated experience. In a given environment, the agent's policy yields running and terminal rewards. As in online learning, the agent learns sequentially. As in multi-armed bandit problems, when an agent pick...
Boosting techniques and neural networks are particularly effective machine learning methods for insurance pricing. Often in practice, there are nevertheless endless debates about the choice of the right loss function to be used to train the machine learning model, as well as about the appropriate metric to assess the performances of competing model...
We revisit Machina's local utility as a tool to analyze attitudes to multivariate risks. We show that for non-expected utility maximizers choosing between multivariate prospects, aversion to multivariate mean preserving increases in risk is equivalent to the concavity of the local utility functions, thereby generalizing Machina's result in Machina...
A principal component analysis based on the generalized Gini correlation index is proposed (Gini PCA). The Gini PCA generalizes the standard PCA based on the variance. It is shown, in the Gaussian case, that the standard PCA is equivalent to the Gini PCA. It is also proven that the dimensionality reduction based on the generalized Gini correlation...
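The Gini correlation index behind Gini PCA builds on the Gini covariance cov(X, F(Y)), where F is the distribution function of Y. Below is a minimal sketch of the basic (non-generalized) Gini correlation, Γ(X, Y) = cov(X, F(Y)) / cov(X, F(X)), with empirical ranks standing in for F; the helper names are hypothetical and ties are broken by position. Unlike Pearson's correlation, Γ is not symmetric in X and Y.

```python
def ranks(values):
    """Ranks 1..n of the values (ties broken by position)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def covariance(a, b):
    """Population covariance of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n

def gini_correlation(x, y):
    """Gamma(X, Y) = cov(X, F(Y)) / cov(X, F(X)); rank scaling cancels in the ratio."""
    return covariance(x, ranks(y)) / covariance(x, ranks(x))

x = [3.0, 1.0, 4.0, 1.5, 9.0, 2.6]
y = [2.0, 0.5, 3.0, 2.5, 4.0, 1.0]  # hypothetical second variable
print(gini_correlation(x, y), gini_correlation(y, x))  # the two orders need not agree
```

Since only the ranks of the second argument enter, Γ(X, Y) equals 1 for any increasing transformation Y of X and −1 for any decreasing one.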
The Pareto model is very popular in risk management, since simple analytical formulas can be derived for financial downside risk measures (value-at-risk, expected shortfall) or reinsurance premiums and related quantities (large claim index, return period). Nevertheless, in practice, distributions are (strictly) Pareto only in the tails, above (poss...
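For a strict Pareto Type I law with threshold u and tail index α, the survival function is (u/x)^α for x ≥ u, which gives the closed forms VaR_p = u(1 − p)^(−1/α) and, for α > 1, ES_p = α/(α − 1) · VaR_p. The sketch below simply codes these two formulas (function names are hypothetical); it does not handle the case discussed in the abstract, where the distribution is Pareto only above some threshold.

```python
def pareto_var(p, scale, alpha):
    """Value-at-Risk at level p for a Pareto Type I law with survival (scale / x) ** alpha."""
    return scale * (1 - p) ** (-1 / alpha)

def pareto_es(p, scale, alpha):
    """Expected Shortfall at level p: the mean of a Pareto law restarted at VaR_p."""
    if alpha <= 1:
        raise ValueError("Expected Shortfall is infinite when alpha <= 1")
    return alpha / (alpha - 1) * pareto_var(p, scale, alpha)

print(pareto_var(0.99, 1.0, 2.0), pareto_es(0.99, 1.0, 2.0))
```

With u = 1 and α = 2, the 99% VaR is 10 and the Expected Shortfall is 20, up to floating-point rounding.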
Websites that let users reconstruct their family tree online are flourishing on the Internet. This article analyzes the collection and data-entry work done by these users and how it could be used in historical demography to complement our knowledge of past generations. To this end, the resul...
An extended SIR model, including several features of the recent COVID-19 outbreak, is considered: the infected and recovered individuals can either be detected or undetected and we also integrate an intensive care unit (ICU) capacity. We identify the optimal policy for controlling the epidemic dynamics using both lockdown and detection intervention...
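To fix ideas on the dynamics that the extended model builds upon, here is a plain SIR model integrated with an Euler scheme, where a hypothetical constant `lockdown` parameter scales down the contact rate β. It ignores detection and ICU capacity, which are precisely the features added in the paper, and all parameter values are illustrative.

```python
def simulate_sir(beta, gamma, s0, i0, days, dt=0.1, lockdown=0.0):
    """Euler scheme for the basic SIR model, in fractions of the population.
    lockdown (hypothetical) is a constant reduction of contacts, between 0 and 1."""
    s, i, r = s0, i0, 1.0 - s0 - i0
    peak = i
    for _ in range(int(round(days / dt))):
        new_infections = (1 - lockdown) * beta * s * i * dt
        new_recoveries = gamma * i * dt
        s, i, r = s - new_infections, i + new_infections - new_recoveries, r + new_recoveries
        peak = max(peak, i)
    return s, i, r, peak

no_control = simulate_sir(0.3, 0.1, 0.99, 0.01, days=160)
with_control = simulate_sir(0.3, 0.1, 0.99, 0.01, days=160, lockdown=0.5)
print(no_control[3], with_control[3])  # peak fraction of infected individuals
```

Halving contacts lowers the basic reproduction number β/γ from 3 to 1.5, which flattens the infection peak.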
The aim of this article is to assess the impact of Big Data technologies for insurance ratemaking, with a special focus on motor products. The first part shows how statistics and insurance mechanisms adopted the same aggregate viewpoint. It made visible regularities that were invisible at the individual level, further supporting the classificatory a...
Family history is usually seen as a significant factor that insurance companies consider when someone applies for a life insurance policy. Where it is used, a family history of cardiovascular disease, death by cancer, or high blood pressure and diabetes could result in higher premiums or no coverage at all. In this article, we use massive (hist...
We consider here an extended SIR model, including several features of the recent COVID-19 outbreak: in particular the infected and recovered individuals can either be detected (+) or undetected (-) and we also integrate an intensive care unit capacity. Our model enables a tractable quantitative analysis of the optimal policy for the control of the...
What future for predictive probabilities in insurance?
Since insurance policies are classical examples of contingency contracts, insurers have to regularly quantify uncertainty and calculate probabilities so as to offer premiums that are “fair” with respect to the obligations of both parties. Is it not high time to ask questions about insurance pra...
The Pareto model is very popular in risk management, since simple analytical formulas can be derived for financial downside risk measures (Value-at-Risk, Expected Shortfall) or reinsurance premiums and related quantities (Large Claim Index, Return Period). Nevertheless, in practice, distributions are (strictly) Pareto only in the tails, above (poss...
The digital age allows data collection to be done on a large scale and at low cost. This is the case of genealogy trees, which flourish on numerous digital platforms thanks to the collaboration of a mass of individuals wishing to trace their origins and share them with other users. The family trees constituted in this way contain information on the...
This article presents a set of tools for the modeling of a spatial allocation problem in a large geographic market and gives examples of applications. In our settings, the market is described by a network that maps the cost of travel between each pair of adjacent locations. Two types of agents are located at the nodes of this network. The buyers ch...
Recently, Alderson et al. (2009) mentioned that (strict) Scale-Free networks were rare in real life. This might be related to the statement of Stumpf, Wiuf & May (2005) that sub-networks of scale-free networks are not scale-free. In the latter, those sub-networks are asymptotically scale-free, but one should not forget about second-order deviation...
Professor, Université du Québec à Montréal. Network theory, or graph theory, was born in 1735 from the work of Leonhard Euler, who was trying to find a walk, starting from a given point, that returns to that point while crossing exactly once each of the seven bridges of the city of Königsber...
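Euler's answer can be phrased as a degree count: a connected multigraph admits a closed walk crossing every edge exactly once if and only if every vertex has even degree. The sketch below encodes the seven bridges between four land masses, labelled A to D here purely for illustration (A being the island reached by five bridges), and lists the vertices of odd degree.

```python
from collections import Counter

# Seven bridges between four land masses; the labels A to D are illustrative.
bridges = [("A", "B"), ("A", "B"), ("A", "C"), ("A", "C"), ("A", "D"), ("B", "D"), ("C", "D")]

def odd_degree_vertices(edges):
    """Vertices touched by an odd number of edges (each edge counts once per endpoint)."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sorted(node for node, d in degree.items() if d % 2 == 1)

odd = odd_degree_vertices(bridges)
print(odd)            # ['A', 'B', 'C', 'D']
print(len(odd) == 0)  # False: no closed walk crosses each bridge exactly once
```

All four land masses have odd degree (5, 3, 3, 3), so the walk Euler was looking for cannot exist; it would not exist even without returning to the start, which requires at most two odd vertices.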
We propose an Aitken estimator for Gini regression. The suggested A-Gini estimator is proven to be a U-statistic. Monte Carlo simulations are provided to deal with heteroskedasticity and to make some comparisons between generalized least squares and Gini regression. A Gini-White test is proposed and shows that a better power is obtained co...
The well-known generalized estimating equations (GEE) method is widely used to estimate the effect of covariates on the mean of the response variable. We apply the GEE method using asymmetric least-squares (expectile) regression to analyze longitudinal data. Expectile regression naturally extends the classical least squares method and has prope...
Why didn't the Greeks invent probability theory? The question may seem surprising, but given how much the Greeks knew about geometry between the time of Pythagoras (550 BCE) and that of Euclid (300 BCE), one may be surprised that no theory of chance was proposed. Perhaps one should remember...
Your name may feel unique to you – but chances are that someone, somewhere is called the same thing. Arthur Charpentier and Baptiste Coulmont estimate the proportion of shared identities in large social groups.
In the 19th century, when several astronomers measured the speed of the same celestial object, they (often) obtained several different measurements. To decide which one to use in their calculations, the idea of using "the method of averages" quickly prevailed (as recalled by Stahl [2006], and above all by Sheynin [1973]), this average having a pr...
Econometrics and machine learning seem to share a common goal: building a predictive model for a variable of interest using explanatory variables (or features). Yet these two fields developed in parallel, creating two different cultures, to paraphrase Breiman (2001a). The first aimed to constr...
Econometrics and machine learning seem to have one common goal: to construct a predictive model, for a variable of interest, using explanatory variables (or features). However, these two fields developed in parallel, thus creating two different cultures, to paraphrase Breiman (2001). The first was to build probabilistic models to describe economic...
This article provides an estimate of the proportion of homonyms in large groups, based on the distribution of first names and last names in a subset of these groups. The estimation relies on a generalization of the "birthday paradox problem". The main result is that, in societies such as France or the United States, identity colli...
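The classical birthday paradox that the article generalizes can be computed exactly when all values are equally likely. For unequal name frequencies p_i, a standard approximation (assumed here, not necessarily the estimator used in the article) treats the number of colliding pairs as Poisson with mean C(n, 2) Σ p_i², so that P(no shared name) ≈ exp(−C(n, 2) Σ p_i²). Function names and frequencies are hypothetical.

```python
import math

def prob_all_distinct_uniform(n, n_values):
    """Exact probability that n draws among n_values equally likely values are all distinct."""
    prob = 1.0
    for k in range(n):
        prob *= (n_values - k) / n_values
    return prob

def prob_all_distinct_poisson(n, probs):
    """Poisson approximation: the expected number of colliding pairs is C(n, 2) * sum(p ** 2)."""
    expected_pairs = math.comb(n, 2) * sum(p * p for p in probs)
    return math.exp(-expected_pairs)

print(round(1 - prob_all_distinct_uniform(23, 365), 3))  # 0.507

# Hypothetical name frequencies: 10 common names and 800 rare ones, summing to 1.
names = [0.02] * 10 + [0.001] * 800
print(1 - prob_all_distinct_poisson(30, names))  # chance that 30 people share at least one name
```

Non-uniform frequencies increase Σ p_i² and hence the collision probability, which is why common names make homonyms far more frequent than a uniform model suggests.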
Recent protests have fuelled deliberations about the extent to which social media ignites popular uprisings. In this paper we use time-series data of Twitter, Facebook, and onsite protests to assess the Granger-causality between social media streams and onsite developments at the Indignados, Occupy, and Brazilian Vinegar protests. After applying a...
Risques n° 109. The foundations of tariff segmentation. Insurance techniques rely on the law of large numbers, under which risks that are homogeneous in nature and homogeneous in value are pooled. It is therefore mathematically essential that risks be classified into subcategories. Insurers will invoke a principle of justice act...
Quantile and expectile regression models pertain to the estimation of unknown quantiles/expectiles of the cumulative distribution function of a dependent variable as a function of a set of covariates and a vector of regression coefficients. Both approaches make no assumption about the shape of the distribution of the response variable, allowing for inves...
Prudence, prevention, precaution... and forecasts. It is often accepted that precaution differs from prevention in that the risks have not been identified. Prevention could be associated with protection against identified risks, whereas the notion of precaution raises the question of what actions are possible against risks not yet identi...
The reproduction dynamics of rabbits and the golden ratio. Fibonacci is known to students all over the world for posing the following problem: "Starting from one pair, how many pairs of rabbits will we have after a given number of months, knowing that each pair produces a new pair every month, which only becomes productive aft...
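Under the classical assumption that a newborn pair becomes productive from its second month, the number of pairs follows F_n = F_(n−1) + F_(n−2), and the ratio of consecutive terms tends to the golden ratio φ = (1 + √5)/2. A short check (the function name is hypothetical):

```python
def rabbit_pairs(months):
    """Pairs alive at each month, starting from one newborn pair: every pair that
    already existed two months earlier produces one new pair."""
    counts = [1, 1]
    while len(counts) < months:
        counts.append(counts[-1] + counts[-2])
    return counts[:months]

pairs = rabbit_pairs(12)
print(pairs)  # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144]

golden = (1 + 5 ** 0.5) / 2
print(pairs[-1] / pairs[-2], golden)  # 144/89 is already within 1e-4 of the golden ratio
```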
Standard kernel density estimation methods are very often used in practice to estimate density functions. They work well in numerous cases. However, they are known not to work so well with skewed, multimodal and heavy-tailed distributions. Such features are usual with income distributions, defined over the positive support. In this paper, we show that...
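For positive, skewed data such as incomes, one common workaround is to estimate the density of log X with a Gaussian kernel and map it back with the change-of-variables formula f_X(x) = f_Y(log x)/x, where Y = log X. It is shown here as an assumption, since the approach proposed in the paper is cut off above. The sketch uses Silverman's rule-of-thumb bandwidth and hypothetical income figures.

```python
import math
from statistics import NormalDist, stdev

def gaussian_kde(sample, bandwidth=None):
    """Gaussian kernel density estimate; Silverman's rule of thumb when no bandwidth is given."""
    n = len(sample)
    h = bandwidth if bandwidth is not None else 1.06 * stdev(sample) * n ** (-1 / 5)
    kernel = NormalDist()  # standard normal kernel
    return lambda x: sum(kernel.pdf((x - xi) / h) for xi in sample) / (n * h)

def log_kde(sample):
    """Density estimate on (0, inf) obtained through the log transform."""
    f_log = gaussian_kde([math.log(v) for v in sample])
    return lambda x: f_log(math.log(x)) / x if x > 0 else 0.0

incomes = [18_000, 22_500, 25_000, 31_000, 34_000, 40_000, 52_000, 75_000, 120_000, 310_000]
density = log_kde(incomes)
print(density(30_000), density(-5.0))
```

By construction the estimate puts no mass on negative values, unlike a Gaussian kernel applied directly to the raw incomes.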
In this paper, we investigate the impact of drivers' accident-reporting strategy within a Bonus-Malus system. We exhibit the induced modification of the corresponding class-level transition matrix and derive the optimal reporting strategy for rational drivers. The hunger for bonuses induces optimal thresholds under which drivers do not clai...
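A deliberately simplified version of the reporting decision: the driver compares the loss with the present value of the extra premiums that reporting it would trigger, and absorbs smaller losses. The premium paths, the discount rate and the function names are hypothetical, and the sketch ignores the full class transition matrix and future claims that the paper accounts for.

```python
def reporting_threshold(premiums_if_reported, premiums_if_not, discount_rate):
    """Present value of the premium surcharge caused by reporting one claim.
    Assumes the first affected premium is paid one year from now."""
    return sum(
        (with_claim - without) / (1 + discount_rate) ** (year + 1)
        for year, (with_claim, without) in enumerate(zip(premiums_if_reported, premiums_if_not))
    )

def should_report(loss, threshold):
    """A myopic rational driver reports only losses above the threshold."""
    return loss > threshold

# Hypothetical premiums over the next three years, with and without the claim reported.
threshold = reporting_threshold([130, 115, 100], [95, 90, 90], discount_rate=0.03)
print(threshold, should_report(50, threshold), should_report(500, threshold))
```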
Traditionally, actuaries have used run-off triangles to estimate reserves ("macro" models, on aggregated data). But it is possible to model payments related to individual claims. If both kinds of models provide similar estimates, we investigate the uncertainty related to reserves with "macro" and "micro" models. We study theoretical properties of econometric...
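The "macro" approach mentioned above is classically the chain-ladder method on a cumulative run-off triangle: development factors are ratios of column totals, and each row is completed by multiplying along it. The sketch below, with hypothetical figures and function names, only illustrates this point estimate; it does not reproduce the uncertainty comparison between macro and micro models studied in the paper.

```python
def chain_ladder(triangle):
    """Complete a cumulative run-off triangle with volume-weighted development factors.
    triangle[i][j] holds cumulative payments of accident year i at development year j."""
    n = len(triangle)
    factors = []
    for j in range(n - 1):
        # Accident years observed at both development years j and j + 1.
        rows = [row for row in triangle if len(row) > j + 1]
        factors.append(sum(r[j + 1] for r in rows) / sum(r[j] for r in rows))
    completed = [list(row) for row in triangle]
    for row in completed:
        while len(row) < n:
            row.append(row[-1] * factors[len(row) - 1])
    # Reserve = projected ultimate minus the latest observed cumulative amount.
    reserves = [full[-1] - obs[-1] for full, obs in zip(completed, triangle)]
    return factors, completed, reserves

triangle = [
    [100.0, 150.0, 165.0],  # hypothetical accident year 1
    [110.0, 168.0],         # accident year 2
    [120.0],                # accident year 3
]
factors, completed, reserves = chain_ladder(triangle)
print(factors, reserves, sum(reserves))
```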
The role of an actuary in an insurance company is quite often to estimate the probability that an event will occur, or its possible financial consequences, both of which also depend on so-called "explanatory" variables. Indeed, some variables are "statistically correlated" with the occurrence of an accident during the yea...
Last spring, the economist Paul Romer started a fascinating discussion around the notion of "mathiness". His widely noticed essay revived broader debates on the place of mathematical formalism in economics, modeling strategies (realism of assumptions, etc.) and the links between theory and empirical work. These deb...
In this paper, we investigate a technique inspired by Ripley's circumference method to correct the bias of density estimation on the edges (or frontiers) of regions. The idea of the method was theoretical and difficult to implement. We provide a simple technique, based on properties of Gaussian kernels, to efficiently compute weights to correct border bi...
This article presents an overview of the usual techniques for modeling multiple financial time series. More specifically, we seek a multivariate extension of GARCH models. First, we will see how to model the dynamics of the (conditional) correlation matrix, then how to generalize this appr...
Recent protests have fuelled deliberations about the extent to which social media ignites popular uprisings. In this article, we use time-series data of Twitter, Facebook, and onsite protests to assess the Granger causality between social media streams and onsite developments at the Indignados, Occupy, and Brazilian Vinegar protests. After applying...
This paper provides a meta-analysis of 1651 point estimates of the Feldstein and Horioka saving-retention coefficient from 49 peer-reviewed papers published over three decades. We obtain two main results. First, correcting for publication bias, we find a consistent underlying coefficient lying between 0.56 and 0.67 for studies using the original paper....
Siméon Denis Poisson worked on the calculus of probabilities for twenty years, from 1820 to 1840. Like many of his contemporaries, he did his first work on gambling problems, with his Mémoire sur l'avantage du banquier au jeu de trente et quarante (on the banker's advantage in the game of trente-et-quarante), read at the Académie des sciences on 13 March 1820. But his first import...
Insurance fundamentally rests on the idea that risks can be pooled among policyholders. This pooling, which can be seen as an actuarial reading of the law of large numbers, only makes sense within a population of "homogeneous" risks [Charpentier, 2011]. This (actuarial) condition requires insurers...
Copula modelling has become ubiquitous in modern statistics. Here, the problem of nonparametrically estimating a copula density is addressed. Arguably the most popular nonparametric density estimator, the kernel estimator is not suitable for the unit-square-supported copula densities, mainly because it is heavily affected by boundary bias issues. I...
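One well-known way around this boundary bias, sketched here as an illustration rather than as the method of the truncated abstract, is the probit transformation: map rank-based pseudo-observations to the plane with Φ^(−1), run an ordinary product Gaussian kernel estimator there, and transform back by dividing by φ(Φ^(−1)(u))φ(Φ^(−1)(v)). The fixed bandwidth h = 0.3, the helper names and the data are assumptions.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def pseudo_observations(xs):
    """Ranks rescaled to (0, 1): u_i = rank_i / (n + 1), ties broken by position."""
    n = len(xs)
    order = sorted(range(n), key=lambda i: xs[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (n + 1)
    return u

def probit_copula_density(xs, ys, h=0.3):
    """Kernel copula density via the probit transform (fixed, hypothetical bandwidth h)."""
    s = [STD_NORMAL.inv_cdf(u) for u in pseudo_observations(xs)]
    t = [STD_NORMAL.inv_cdf(v) for v in pseudo_observations(ys)]
    n = len(s)

    def density(u, v):
        a, b = STD_NORMAL.inv_cdf(u), STD_NORMAL.inv_cdf(v)
        # Gaussian product kernel estimate of the density of (Phi^-1(U), Phi^-1(V)).
        joint = sum(STD_NORMAL.pdf((a - si) / h) * STD_NORMAL.pdf((b - ti) / h)
                    for si, ti in zip(s, t)) / (n * h * h)
        # Change of variables back to the unit square.
        return joint / (STD_NORMAL.pdf(a) * STD_NORMAL.pdf(b))

    return density

xs = [0.2, 1.5, 0.7, 2.2, 1.1, 0.4, 1.9, 0.9]  # hypothetical sample
c = probit_copula_density(xs, xs)             # perfectly dependent toy case
print(c(0.5, 0.5), c(0.9, 0.1))
```

Because the estimator works on the whole plane after the transform, no kernel mass spills outside the unit square.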
A multivariate extension of the bivariate class of Archimax copulas was recently proposed by Mesiar and Jágr (2013), who asked under which conditions it holds. This paper answers their question and provides a stochastic representation of multivariate Archimax copulas. A few basic properties of these copulas are explored, including their minimum and...
Proxy voting, an increasingly widespread practice, remains poorly understood. Both the conception of voting as a personal act and the dominant orientations of research on this civic act lead to this procedure being neglected. Here, academics provide elements for constructing it as an object of sociological study. Who resorts...
In this article, we look back on almost six years of experience with an academic blog. We revisit the initial motivations for starting an academic blog, and I will try to explain what keeps the blog active, almost six years later, from its editor's point of view. We will also discuss interactions with microblogging (Twitter) activities. More than a...
Law of large numbers, approximations and asymptotic statistics. Just like insurers, statisticians like large volumes of data. For insurers, large portfolios are considered less uncertain and, for the same reasons, statisticians with large databases have less volatil...
According to the dictionary (in this case Romeuf [1956]), an actuary is a "mathematician specialized in the calculus of probabilities, whose services are used either by financial services, for amortization calculations, or by institutions whose activity involves a calculable risk or is based on the coverage of a ris...
In this paper, we investigate questions arising in Parsons & Geist (2012). Pseudo-causal models connecting magnitudes and waiting times are considered through generalized regression. We use conditional models (magnitude given the previous waiting time, and conversely) as an extension of the joint distribution model described in Nikoloulopoulos & Karlis...
Agents facing a risky situation need to compare positions or actions. But today, financial institutions (with Basel II) and insurance companies (with Solvency II) must above all build up reserves to cope with the risks they take, which means they need to quantify the risk taken. It...
Within a few years, copulas have become an important tool for modeling multivariate risks (among other things). Copulas make it possible to "couple" marginal distributions to obtain a multivariate distribution, hence the Latin name copula chosen by Abe Sklar in 1959: "having worked out the basic properties of these functions, I wrote about them...
In this paper, we investigate (and extend) Ripley's circumference method to correct the bias of density estimation on the edges (or frontiers) of regions. The idea of the method was theoretical and difficult to implement. We provide a simple technique, based on properties of Gaussian kernels, to efficiently compute weights to correct border bias on fro...
We revisit Machina's local utility as a tool to analyze attitudes to multivariate risks. Using martingale embedding techniques, we show that for non-expected utility maximizers choosing between multivariate prospects, aversion to multivariate mean preserving increases in risk is equivalent to the concavity of the local utility functions, thereby ge...
The present paper develops a new theoretical framework for analyzing the decision to provide or buy insurance against the risk of natural catastrophes. In contrast with conventional models of insurance, the insurer has a non-zero probability of insolvency that depends on the distribution of the risks, the premium rate, and the amount of capital in...