
Christine M Thomas-Agnan
Toulouse School of Economics | TSE · Research Group for Decision Mathematics
PhD Mathematics
About
101 Publications
13,191 Reads
3,007 Citations
Introduction
Christine M Thomas-Agnan currently works at Toulouse School of Economics. Christine does research in Statistics and Econometrics. She works on statistical techniques for spatial data as well as for compositional data. The applications are in marketing as well as in the domain of health.
Additional affiliations
September 1989 - present
September 1994 - present
Education
September 1984 - July 1987
Publications (101)
Spatial autoregressive models have been adapted to model data with both a geographic and a compositional nature. Interpretation of parameters in such a model is intricate. Indeed, when the model involves a spatial lag of the dependent variable, this interpretation must focus on the so-called impacts rather than on parameters and when moreover the d...
We consider models for network-indexed multivariate data involving a dependence between variables as well as across graph nodes. In the framework of these models, we focus on outlier detection and introduce the concept of edgewise outliers. For this purpose, we first derive the distribution of some sums of squares, in particular squared Mahalanobi...
We extend the impact decomposition proposed by LeSage and Thomas‐Agnan (2015) in the spatial interaction model to a more general framework, where the sets of origins and destinations can be different, and where the relevant attributes characterizing the origins do not coincide with those of the destinations. These extensions result in three flow da...
We consider models for network-indexed multivariate data, also known as graph signals, involving a dependence between variables as well as across graph nodes. The dependence across nodes is typically established through the entries of the Laplacian matrix by imposing a distribution that relates the graph signal from one node to the next. Based on s...
Invariant coordinate (or component) selection (ICS) is a multivariate statistical method introduced by Tyler et al. (J R Stat Soc Ser B (Stat Methodol) 71(3):549–592, 2009) and based on the simultaneous diagonalization of two scatter matrices. A model-based approach of ICS, called invariant coordinate analysis, has already been adapted for composit...
Econometric land use models study determinants of land use shares of different classes: “agriculture”, “forest”, “urban” and “other” for example. Land use shares have a compositional nature as well as an important spatial dimension. We compare two compositional regression models with a spatial autoregressive nature in the framework of land use. We...
In the framework of Compositional Data Analysis, vectors carrying relative information, also called compositional vectors, can appear in regression models either as dependent or as explanatory variables. In some situations, they can be on both sides of the regression equation. Measuring the marginal impacts of covariates in these types of models is...
The vote shares by party on a given subdivision of a territory form a vector called composition (mathematically, a vector belonging to a simplex). It is interesting to model these shares and study the impact of the characteristics of the territorial units on the outcome of the elections. In the political economy literature, few regression models ar...
In an election, the vote shares by party for a given subdivision of a territory form a compositional vector (positive components adding up to 1). Conventional multiple linear regression models are not adapted to explain this composition due to the constraint on the sum of the components and the potential spatial autocorrelation across territorial u...
To model multivariate, possibly heavy-tailed data, we compare the multivariate normal model (N) with two versions of the multivariate Student model: the independent multivariate Student (IT) and the uncorrelated multivariate Student (UT). After recalling some facts about these distributions and models, known but scattered in the literature, we prov...
We are interested in modeling the impact of media investments on automobile manufacturers' market shares. Regression models have been developed for the case where the dependent variable is a vector of shares. Some of them, from the marketing literature, are easy to interpret but quite simple (Model A). Alternative models, from the compositional dat...
This method decomposes the between-year change in various indicators related to the outcome distribution (mean, median, quantiles…) into the effect due to between-year change in the conditional distribution of the outcome given sociodemographic characteristics, or “structure effect”, and the effect due to the differences in sociodemographic charact...
Assessing the nonlinearity of the calorie-income relationship is a crucial issue when evaluating policies aimed at fighting against malnutrition. A natural choice would be to adopt a fully nonparametric specification of the relationship in order to let the data reveal its nonlinearity. But we would be faced with the problem of the curse of dimensi...
Vietnam launched the national Expanded Program on Immunization in 1981. Since then, this program has contributed significantly to the improvement of child health and to the reduction of the child mortality rate. Despite the fact that the coverage of the national EPI keeps expanding, the number of children who complied with the recommended immunizati...
This paper contributes to the analysis of the impact of socioeconomic factors, like food expenditure level and urbanization, on diet patterns in Vietnam, from 2004 to 2014. Contrary to the existing literature, we focus on the diet balance in terms of macronutrient consumption (protein, fat and carbohydrate) and we take into account the fact that t...
Vietnam is undergoing a nutritional transition like many middle-income countries. This paper proposes to highlight the socio-demographic drivers of this transition over the period 2004-2014. We implement a method of decomposition of between-year differences in economic outcomes recently proposed in the literature. This method allows decomposing the...
Vietnam has recorded impressive achievements in growth performance after the Economic reforms (1984). There is an evolution of the nutrition transition in Vietnam from 2004 to 2014: a shift towards consuming more protein and fat. How do socio-demographic characteristics of Vietnamese households drive this nutrition transition? We apply the decomposit...
When the aim is to model market shares, the marketing literature proposes some regression models which can be qualified as attraction models. They are generally derived from an aggregated version of the multinomial logit model. But aggregated multinomial logit models (MNL) and the so-called generalized multiplicative competitive interaction models...
This paper revisits the issue of estimating the relationship between calorie intake and income, and presents and compares estimates of this relationship for China and Vietnam. Semiparametric generalized additive models are estimated and their performances are compared to the performance of the classical double log model using the revealed performanc...
Policies aimed at reducing starvation and redressing nutritional deficiencies remain among the most widely accepted policies in the world. These policies can take many different forms, from subsidized prices of basic foodstuffs to cash transfers, and their effectiveness depends on the existence of a sensitivity of food demand to income variation and its ma...
In this paper, we propose models for predicting land use (urban, agriculture, forests, natural grasslands and soil) at the points of the Teruti-Lucas survey from easily accessible covariates. Our approach involves two steps: first we model land use at the Teruti-Lucas point level and second, we propose a method to aggregate land use on regul...
We address the problem of prediction in a classical spatial simultaneous autoregressive model. The optimality of prediction formulas in non-spatial regression models is not immediately transposable to the framework of all spatial models. In the geostatistical literature, much attention has been devoted to this topic, with the development of the B...
Recent research on calorie intake and income relationship abounds with parametric models but usually gives inconclusive results. Our paper aims at contributing to this literature by using recent advances in the estimation of generalized additive models with penalized spline regression smoothing (GAM). These semi-parametric models enable mixing para...
Media investments in the automobile market represent a very large amount of money allocated to different channels: TV, radio, press, outdoor, digital and cinema. Car manufacturers want to analyze the impact of media investments on their sales in order to optimize the amount spent and the allocation across channels. This cannot be done without taki...
We consider the problem of land use prediction at different spatial scales using point level data such as the Teruti-Lucas (T-L hereafter) survey and some explanatory variables. We analyze the components of the prediction error using a synthetic data set constructed from the Teruti-Lucas points in the Midi-Pyrénées region and a five categories land...
We explore the estimation of origin-destination (OD), city-pair, air passenger flows. Our dataset contains 279 cities, worldwide, over 2010-2012. Allowing for two gravity model specifications (log-normal and Poisson), we compare non-spatial and spatial models. We are the first to apply spatial econometric flow models and eigenfunction spatial filte...
The analysis of socioeconomic data often implies the combination of data bases originating from different administrative sources so that data have been collected on several separate partitions of the zone of interest into administrative units. It is therefore necessary to reallocate the data from the source spatial units to the target spatial units....
We model the unidentified aerial phenomena observed in France during the last 60 years as a spatial point pattern. We use some public information such as population density, rate of moisture or presence of airports to model the intensity of the unidentified aerial phenomena. Spatial exploratory data analysis is a first approach to appreciate the li...
The combination of several socio-economic data bases originating from different administrative sources collected on several different partitions of a geographic zone of interest into administrative units induces the so-called areal interpolation problem. This problem is that of allocating the data from a set of source spatial units to a set of targ...
This book is the result of a collaboration among some of the most renowned specialists: Eva Cantoni (Université de Genève), Christophe Croux (Katholieke Universiteit Leuven), Catherine Dehon (Université Libre de Bruxelles), Gentiane Haesbroeck (Université de Liège) and Anne Ruiz-Gazen (Université de Toulouse Capitole), brought together on the occasion of the 15th J...
Spatial interaction or gravity models have been used in regional science to model flows that take many forms, for example, population migration, commodity flows, traffic flows, and knowledge flows, all of which reflect movements between origin and destination regions. This chapter focuses on spatial autoregressive extensions to the conventional lea...
Collective volume arising from the study days (journées d'étude) of the SFdS.
The Mahalanobis distance between pairs of multivariate observations is used as a measure of similarity between the observations. The theoretical distribution is derived, and the result is used for judging the degree of isolation of an observation. In case of spatially dependent data where spatial coordinates are available, different exploratory...
Spatial interaction or gravity models have been used to model flows that take many forms, for example population migration, commodity flows, traffic flows, all of which reflect movements between origin and destination regions. We focus on how to interpret estimates from spatial autoregressive extensions to the conventional regression-based gravity...
We address the question of measuring and testing industrial spatial concentration based on micro-geographic data with distance-based methods. We discuss the basic requirements for such measures and we propose four additional requirements. We also discuss the null assumptions classically used for testing aggregation of a particular sector and propos...
This book is devoted to a research area that has seen many developments, particularly over the last fifteen years. One of the innovations of latent variable models is to take into account unobservable variables, which cause phenomena that can themselves be observed directly. This formalization makes it possible to unify...
This PSDR analysis document examines the decision to close or open school classes on the basis of a rarely used distance criterion: the travel time between a pupil's home and school. The originality of the study lies in the use of a mathematical model that takes into account the stochastic variability of the geographic position of pupils...
In this paper we deal with the problem of the creation or closure of school classes from the perspective of a distance indicator seldom found in the literature: the time needed to travel by road from the family home to the training establishment. Unlike classical deterministic schemes, the originality of this work lies in the use of simulations whi...
The problem of finding an optimal location frequently occurs in geomarketing, economics and other fields: positioning a new branch of a bank, a supermarket, a fire station, a plant, designing a traffic network, etc. The optimal location of the source facility is the argument-minimum of an optimization problem parametrized by some characteristics of...
In this paper we introduce a nonparametric approach for the estimation of the covariance function of a stationary stochastic process $X_t$ indexed by $t \in \mathbb{R}^+$. The data consist of a finite number of observations of the process at irregularly spaced time points and the aim is to estimate the covariance at any lag point without paramet...
We present GeoXp, an R package implementing interactive graphics for exploratory spatial data analysis. We use a data set concerning public schools of the French Midi-Pyrénées region to illustrate the use of these exploratory techniques based on the coupling between a statistical graph and a map. Besides elementary plots like boxplots, histo...
The aim of the paper is to present a statistical methodology allowing a meaningful comparison of the production performance of firms without resorting to the usual concept of production frontier. We introduce an efficiency measure based on a nonstandard conditional distribution and propose a two-stage estimation procedure with a smoothing step foll...
This paper proposes a nonparametric test of the non-convexity of a smooth regression function based on least squares or hybrid splines. By a simple formulation of the convexity hypothesis in the class of all polynomial cubic splines, we build a test which has an asymptotic size equal to the nominal level. It is shown that the test is consistent and...
People can be located according to their residence, their place of work, their doctor’s office, their pharmacy, and so forth. It is sometimes of interest to look for patterns in people’s locations in relation to their behaviours. In this article, we are particularly interested in the cost of patients’ prescriptions per doctor consultation. In part...
In frontier analysis, most of the nonparametric approaches (free disposal hull (FDH), data envelopment analysis (DEA)) are based on envelopment ideas, and their statistical theory is now mostly available. However, by construction, they are very sensitive to outliers. Recently, a robust nonparametric estimator has been suggested by Cazals, Florens, an...
Markov random fields (MRFs) express spatial dependence through conditional distributions, although their stochastic behavior is defined by their joint distribution. These joint distributions are typically difficult to obtain in closed form, the problem being a normalizing constant that is a function of unknown parameters. The Gaussian MRF (or condi...
1 Theory.- 2 RKHS and Stochastic Processes.- 3 Nonparametric Curve Estimation.- 4 Measures and Random Measures.- 5 Miscellaneous Applications.- 6 Computational Aspects.- 7 A Collection of Examples.- to Sobolev spaces.- A.1 Schwartz-distributions or generalized functions.- A.1.1 Spaces and their topology.- A.1.2 Weak-derivative or derivative in the...
Since its foundation by Borel and Lebesgue around the year 1900 the modern theory of measure, generalizing the basic notions of length, area and volume, has become one of the major fields in Pure and Applied Mathematics. In all human activities one collects measurements subject to variability and leading to the classical concepts of Probability and...
In Chapter 1, we have studied the relationships between reproducing kernels and positive definite functions. In this chapter, the central result due to Loève is that the class of covariance functions of second-order stochastic processes coincides with the class of positive definite functions. This link has been used to translate some problems relate...
Reproducing kernels are often found in nonparametric curve estimation in connection with the use of spline functions, which were popularized by Wahba in the statistics literature in the 1970s. A brief introduction to the theory of splines is presented in Section 2. Sections 4 and 5 are devoted to the use of splines in nonparametric estimation of d...
New reproducing kernels with interesting applications continually appear in the literature. In Section 4 of the present chapter we list major examples for which the kernel and the associated norm and space are explicitly described. They can be used to illustrate aspects of the theory or to practically implement some of the tools presented in the bo...
A Reproducing Kernel Hilbert Space (RKHS) is first of all a Hilbert space, that is, the most natural extension of the mathematical model for the actual space where everyday life takes place (the Euclidean space ℝ³). When studying elements of some abstract set S it is convenient to consider them as elements of some other set S′ on which is already d...
The theory of reproducing kernel Hilbert spaces interacts with so many subjects in Probability and Mathematical Statistics that it is impossible to deal with all of them in this book. Besides topics that we were willing to develop and to which a chapter is devoted we have selected a few themes gathered in the present chapter.
In many applications, the choice of Hilbert space and norm is governed by context related modeling reasons and one has to face the problem of computing the corresponding reproducing kernel. Symmetrically, it is of interest to characterize the Hilbert space $H_K$ associated with a given kernel K by the Moore-Aronszajn theorem and in particular to give...
Previous empirical studies on the effect of age on productivity and wages find contradicting results. Some studies find that if workers grow older there is an increasing gap between productivity and wages, i.e. wages increase with age while productivity does not or does not increase at the same pace. However, other studies find no evidence of such...
Unemployment rates vary widely at the sub-regional level. We seek to explain why such variation occurs, using data for 174 districts in the Midi-Pyrénées region of France for 1990–1991. A set of explanatory variables is derived from theory and the voluminous literature. The best model includes a correction for spatially autocorrelated errors. Unemp...
We consider a nonparametric random design regression model in which the response variable is possibly right censored. The aim of this paper is to estimate the conditional distribution function and the conditional alpha-quantile of the response variable. We restrict attention to the case where the response variable as well as the explanatory variabl...
GEOX is a computer package of Splus and Matlab routines implementing interactive graphics methods for exploring spatial data. We analyse a large database from the regional public health insurance agency concerning physicians' activity in the Midi-Pyrénées region. We evaluate in particular heterogeneity and outliers in the density of physicians,...
The aim of this paper is to prove the equivalence between the regression quantile under monotonicity constraint and the Min-Max formula introduced by Casady and Cryer (1976, Ann. Math. Statist. 4 (3), 532-541). The proof of this result uses an original probability density which does not appear in classical books.
Shape restrictions on a functional parameter, such as positivity, monotonicity, and convexity, arise in a variety of statistical models. In econometric models, for instance, a cost function must be homogeneous of degree one, nondecreasing, and concave in the price of inputs in order to be consistent with the economic theory of production. We presen...
We propose an appropriate model for the evolution of a share price in a financial market where the asset price at a given instant can influence the long-term dynamics of the price. This long-memory effect cannot be captured by the usual Black-Scholes model. We specify the noise as a fractional ARIMA-type process and...
Constrained smoothing splines are discussed under order restrictions on the shape of the function $m$. We consider shape constraints of the type $m^{(r)} \geq 0$, i.e. positivity, monotonicity, convexity, .... (Here for an integer $r \geq 0$, $m^{(r)}$ denotes the $r$th derivative of $m$.) The paper contains three results: (1) constrained smoothing splines achieve optimal r...
Unemployment rates vary widely at the sub-regional level. We seek to explain why such variation occurs, using data for 174 districts in the Midi-Pyrenees region of France for 1990-91. A set of explanatory variables is derived from theory and the voluminous literature.
For an open subset $\Omega$ of $\mathbb{R}$, an integer $m$, and a positive real parameter $\tau$, the Sobolev spaces $H^m(\Omega)$ equipped with the norms $\|u\|^2 = \int_\Omega u(t)^2\,dt + (1/\tau^{2m}) \int_\Omega (u^{(m)}(t))^2\,dt$ constitute a family of reproducing kernel Hilbert spaces. When $\Omega$ is an open interval of the real line, we describe the computation of their reproducing kernels. We derive explicit formulas for these kerne...
This paper proposes to use a semiparametric regression method, named Sliced Inverse Regression (SIR hereafter), to analyse ambulatory blood pressure monitoring data.
Sliced inverse regression is a dimension reduction technique for exploring non-linear relationships between an output variable and a vector of input variables. Motivated by a data set of income distributions and economic indicators of French cities, we address the problem of modelling a family of empirical distribution functions in terms of some cov...
Also published in: INRA-ESR Toulouse Série D; 93-06D
In the problem of nonparametric regression for a fixed design model, we may want to use additional information about the shape of the regression function, when available, to improve the estimation. The regression function may, for example, be convex or monotone or more generally belong to a cone in some functional space. We devise a method for impr...
Some relationships have been established between unbiased linear predictors of processes, in signal and noise models, minimizing the predictive mean square error and some smoothing spline functions. We construct a new family of multidimensional splines adapted to the prediction of locally homogeneous random fields, whose "$m$-spectral measure" (to...
An extension of the Theory of Polynomial Smoothing Splines is developed for the statistical problem of estimating a curve based on incomplete and noisy observations. The smoothness of a curve is measured by a quadratic function of its Fourier transform in the space of tempered distributions. These splines, called a-splines, bring a new connection...
This paper proposes a nonparametric test of the non-convexity of a smooth regression function based on least squares or hybrid splines. By a simple formulation of the convexity hypothesis in the class of all polynomial cubic splines, we build a test which has an asymptotic size equal to the nominal level. It is shown that the test is consistent and...
At useR! 2006, we introduced GeoXp, an R package implementing interactive graphics for exploratory spatial data analysis. Besides elementary plots like boxplots, histograms or simple scatterplots, GeoXp also couples maps with Moran scatterplots, variogram clouds, Lorenz curves, etc. In order to make the most of the multidimensionality of the data,...
We propose a unified theory for measuring the spatial concentration of economic activity from micro-geographic data. Krugman's theory of economic geography emphasizes that production tends to concentrate in a few countries, regions and cities that are densely populated and have high incomes. The studies...
We propose a unified framework for defining measures of industrial concentration based on micro-geographic data. These encompass the Duranton-Overman and the Marcon-Puech indices. We discuss the basic requirements for such measures introduced by Duranton and Overman (2005) and we propose five additional requirements. We describe several types of concentr...