Juan Jose Egozcue

Polytechnic University of Catalonia (Universitat Politècnica de Catalunya) · Department of Civil and Environmental Engineering (DECA)

Profesor Emérito (Emeritus Professor)

About

273
Publications
131,413
Reads
8,259
Citations
Introduction
He studied Physics at the University of Barcelona (Spain) and obtained his PhD in 1982. In 1978 he became a lecturer at the Civil Engineering School (U. Politécnica de Cataluña, UPC, Barcelona, Spain), teaching subjects in Applied Mathematics and Statistics. He became Full Professor at the UPC in 1989. His present main research activities are Bayesian methods for natural hazard assessment and the analysis of compositional data, with special emphasis on the geometry of the sample space.
Additional affiliations
September 2015 - August 2016
Polytechnic University of Catalonia (Universitat Politècnica de Catalunya)
Position
  • Catedrático (Full Professor)
October 1977 - January 2016
Universitat Politècnica de Catalunya
Position
  • Professor (Full)
Description
  • Compositional data analysis. Research focussed on the sample space (the simplex) and its algebraic-geometric structure. Attention is also paid to applications in geosciences, bio-omics and economics.

Publications (273)
Article
Potentially Toxic Elements (PTEs) are contaminants with high toxicity and complex geochemical behaviour and, therefore, high PTEs contents in soil may affect ecosystems and/or human health. However, before addressing the measurement of soil pollution, it is necessary to understand what is meant by pollution-free soil. Often, this background, or pol...
Article
Mineral reserve evaluation is based on compositional data, reporting in units such as percentages, mg/kg or ppm. The nature of such data implies that the values cannot vary independently and geostatistical modeling of their raw form could be misleading. In this study we propose a compositional approach —based on orthonormal isometric logratio (olr)...
Presentation
Full-text available
The methodology is explained in Egozcue, J. J., V. Pawlowsky-Glahn, and A. Buccianti, Distances to compositional equilibrium. J Geochem Explor 227, 106793, (2021).
Conference Paper
Full-text available
Equilibrium of components is an important issue. Well known examples are the Hardy-Weinberg law in population dynamics and chemical equilibrium regulated by stoichiometry in mineralogy. These equilibria appear as a restriction in the sample space of compositional data. Hardy-Weinberg equilibrium corresponds to a linear equilibrium, while olivine cr...
Article
Geochemical samples can be restricted to be on a locus in the sample space. When this occurs there is a chemical equilibrium of some kind. Typical examples are chemical equilibrium regulated by the mass action law or stoichiometry given by the structure of crystals. These equilibria can correspond to linear restrictions in the simplex, for which th...
Article
It often occurs in practice that it is sensible to give different weights to the variables involved in a multivariate data analysis—and the same holds for compositional data as multivariate observations carrying relative information. It can be convenient to apply weights to better accommodate differences in the quality of the measurements, the occu...
Article
Full-text available
Chronic kidney disease (CKD), a collective term for many causes of progressive renal failure, is increasing worldwide due to ageing, obesity and diabetes. However, these factors cannot explain the many environmental clusters of renal disease that are known to occur globally. This study uses data from the UK Renal Registry (UKRR) including CKD of un...
Chapter
A large number of families of distributions are available to model multivariate real vectors. On the contrary, for the simplex sample space, we have only a limited number of families arising through the generalization of the Dirichlet family or the logratio normal family. This chapter tries to summarize those models and some generalizations with a...
Chapter
Full-text available
The occurrence of environmental clusters of Chronic Kidney Disease of uncertain aetiology (CKDu), where there is no known cause for the onset of kidney dysfunction, is a concern globally. Waterborne exposure pathways in the environment may result in indirect or direct ingestion of trace elements with potential health risks. This research examines t...
Chapter
This study of square matrices with positive entries is motivated by a previous contribution on exchange rates matrices. The sample space of these matrices is endowed with a group operation, the componentwise product or Hadamard product. Also an inner product, identified with the ordinary inner product of the componentwise logarithm of the matrices,...
Chapter
Triangular arbitrage in the foreign exchange market of a group of countries exists whenever it is possible to make profit by buying and selling their currencies using the spot exchange rates. Working in the framework of the Aitchison geometry, and using characterizations of the absence of triangular arbitrage, we present two applications to the cur...
Poster
Full-text available
Among the remarkable merits of Florence Nightingale (1820-1910), the introduction of her chart presenting casualty data from the Crimean War (1854-1856) is noteworthy. Her intention was to show politicians and the general public that most deaths were avoidable by taking measures of hygiene and care, as opposed to deaths caused by war wounds or other causes. Re...
Presentation
Full-text available
Among the remarkable merits of Florence Nightingale (1820-1910), the introduction of her chart presenting casualty data from the Crimean War (1854-1856) is noteworthy. Her intention was to show politicians and the general public that most deaths were avoidable by taking measures of hygiene and care, as opposed to deaths caused by war wounds or other causes. R...
Article
We propose a new method to estimate the horizontal-to-vertical (H/V) spectral ratio using microtremor measurements. The technique is based on modeling the H/V transfer function by means of an AutoRegressive Moving Average (ARMA) filter. As compared with the conventional, Fourier-based spectra processing routines, this method is efficient in extract...
Article
Full-text available
download: https://academic.oup.com/nargab/article/2/4/lqaa094/5996081?login=true Measurements in sequencing studies are mostly based on counts. There is a lack of theoretical developments for the analysis and modelling of this type of data. Some thoughts in this direction are presented, which might serve as a seed. The main issues addressed are th...
Article
Problems with compositional data, like spurious correlation and negative bias, are well known in the Geosciences. Not so well known is the fact that the same problems appear when dealing with regionalized compositions. Here, these problems are illustrated, and a solution, based on the principle of working in coordinates using orthonormal logratio r...
Article
Compositional data carry relative information. Hence, their statistical analysis has to be performed on coordinates with respect to a log-ratio basis. Frequently, the modeler is required to back-transform the estimates obtained with the modeling to have them in the original units such as euros, kg or mg/liter. Approaches for recovering original uni...
Article
In functional data analysis some region(s) of the domain of the functions can be of more interest than others due to the quality of measurement, relative scale of the domain, or simply due to some external reason (e.g., interest of stakeholders). Weighting the domain is of interest particularly with probability density functions (PDFs), as derived...
Article
Full-text available
This research uses an urban soil geochemistry database of elemental concentration to examine the potential relationship between Standardised Incidence Rates (SIRs) of Chronic Kidney Disease (CKD) of uncertain aetiology (CKDu), and cumulative low level geogenic and diffuse anthropogenic contamination of soils with PTEs. A compositional data analysis...
Article
Full-text available
Background: Fecal microbiota transplantation (FMT) has been recently approved by the FDA for the treatment of refractory recurrent clostridial colitis (rCDI). Success of FMT in treatment of rCDI led to a number of studies investigating the effectiveness of its application in other gastrointestinal diseases. However, in the majority of studies the...
Article
Full-text available
We provide some characterizations of the absence of triangular arbitrage in the spot exchange rates of a group of countries. When the matrix of exchange rates of the group does not fulfill the conditions given in those characterizations, we provide a measure of distance to the space of matrices of exchange rates that are triangular arbitrage‐free....
Preprint
Full-text available
Probability density functions (PDFs) can be understood as continuous compositions by the theory of Bayes spaces. The origin of a Bayes space is determined by a given reference measure. This can be easily changed through the well-known chain rule which has an impact on the geometry of the Bayes space. This work provides a mathematical framework for...
Presentation
Full-text available
Compositions, closed or not, can always be represented in the simplex, and no compositional part can vary independently of the others. Correlation coefficients (Pearson, Spearman, …) of raw parts are subcompositionally incoherent and spurious. Log-ratio measures, such as correlations of clr components, pivot coordinates or symmetric balances, are a...
Article
The log-ratio approach to compositional data (CoDa) analysis has now entered a mature phase. The principles and statistical tools introduced by J. Aitchison in the eighties have proven successful in solving a number of applied problems. The algebraic–geometric structure of the sample space, tailored to those principles, was developed at the beginni...
Preprint
Background: Fecal microbiota transplantation (FMT) is now approved for the treatment of refractory recurrent clostridial colitis, but a number of studies are ongoing in inflammatory bowel diseases, i.e., Crohn's disease, nonspecific ulcerative colitis, and in other autoimmune conditions. In most cases, the effects of FMT are evaluated on patients w...
Conference Paper
Full-text available
- A compositional and a noncompositional approach are followed to interpolate regionalized compositional data, coming from chemical analysis of 1422 soil samples from Sungun copper-molybdenum deposit in Iran. - With the compositional approach, a sequential binary partition (SBP) is used to produce ilr-coordinates known as balances. These nonre-str...
Poster
Full-text available
A compositional and a noncompositional approach are followed to interpolate regionalized compositional data, coming from chemical analysis of 1422 soil samples from Sungun copper-molybdenum deposit in Iran. With the compositional approach, a sequential binary partition (SBP) is used to produce ilr-coordinates known as balances. These nonrestricted...
Poster
Full-text available
Existence of triangular arbitrage (TA) in the spot exchange rates market means the possibility of achieving strictly positive benefit by buying and selling currencies of different countries. In this work we provide some characterizations of the absence of TA in the spot exchange rates of a group of countries. When the matrix of exchange rates (MER)...
Poster
Full-text available
One aim of the statistical analysis of Microbiome data is the detection of relationships between the microbiota and some disease or condition. One tool designed to achieve this goal is an algorithm, known as SELBAL. It is based on the assumption that the sample space of compositional data is the simplex endowed with the Aitchison geometry. SELBAL s...
Article
Full-text available
The statistical techniques based on compositional data are applied to investigate the evolution of the traffic share of the container throughput in a multi-port system. Compositional vectors are those which contain relative information of parts of some whole. The application of conventional statistical techniques to compositional data may lead to e...
Article
Full-text available
This paper studies the relative evaluation of young people and the possible benefits associated with three methods of avoiding sexually transmitted infections/AIDS and/or unwanted pregnancies (condoms, contraceptive pills, morning-after pills). A survey evaluating these three methods, with respect to ten different items, was given to 145 undergradu...
Presentation
Characterisation of exchange rate matrices free of triangular arbitrage. Obtaining triangular-arbitrage-free matrices by projection from general matrices of exchange rates.
Conference Paper
The yearly frequency of classes of ocean wave storms at an offshore site is assumed to be a multinomial sample. The multinomial probabilities are assumed to be a time-evolving composition, possibly due to climatic trends. Compositional techniques allow compositions to be represented in isometric log-ratio coordinates. These coordinates are modelled as...
Conference Paper
Full-text available
Coal proximate analysis is the basis of coal reserve evaluations and is a form of compositional data. Direct geostatistical modeling of compositional data results in inconsistency and non-optimality. In this study, we compare compositional and non-compositional approaches to assess problems caused by neglecting the nature of the data. We combine is...
Article
High-throughput sequencing technologies have revolutionized microbiome research by allowing the relative quantification of microbiome composition and function in different environments. In this work we focus on the identification of microbial signatures, groups of microbial taxa that are predictive of a phenotype of interest. We do this by acknowle...
Chapter
Full-text available
Compositions describe parts of a whole and carry relative information. Compositional data appear in all fields of science, and their analysis requires paying attention to the appropriate sample space. The log-ratio approach proposes the simplex, endowed with the Aitchison geometry, as an appropriate representation of the sample space. The main char...
Article
The study of the relationships between two compositions is of paramount importance in geochemical data analysis. This paper develops a compositional version of canonical correlation analysis, called CoDA-CCO, for this purpose. We consider two approaches, using the centred log-ratio transformation and the calculation of all possible pairwise log-rat...
Article
The discrete case of Bayes' formula is considered the paradigm of information acquisition. Prior and posterior probability functions, as well as likelihood functions, called evidence functions, are compositions following the Aitchison geometry of the simplex, and have thus vector character. Bayes' formula becomes a vector addition. The Aitchison no...
Presentation
Full-text available
Analysis of compositional data, and particularly of microbiome data, requires a review of the role of the sample space in these analyses. The review can be based on the book Pawlowsky-Glahn, V.; Egozcue, J. J.; Tolosana-Delgado, R. (2015): Modeling and Analysis of Compositional Data, Wiley, Chichester (UK).
Conference Paper
Full-text available
A multinomial sampling scheme is considered. The proportions of counts in each category are expressed in coordinates through a log-ratio approach. The isometric log-ratio transformation (ilr) has been used. The asymptotic normality of these coordinates is assessed. The normal distribution mean and covariance parameters are expressed as functions...
Article
Coal proximate analysis is a form of typical compositional data, commonly represented with constant sum. Although the direct geostatistical modeling of compositional data provides apparently reasonable outputs, the results are always exposed to inconsistency and non-optimality. In this paper, we compare the compositional and noncompositional approa...
Article
Full-text available
With compositional data, ordinary covariation indexes, designed for real random variables, fail to describe dependence. There is a need for compositional alternatives to covariance and correlation. Based on the Euclidean structure of the simplex, called Aitchison geometry, compositional association is identified to a linear restriction of the sample s...
Article
Full-text available
The area east of Varanasi is one of numerous places along the watershed of the Ganges River with groundwater concentrations of arsenic surpassing the maximum value of 10 parts per billion (ppb) recommended by the World Health Organization in drinking water. Here we apply geostatistics and compositional data analysis for the mapping of arsenic and i...
Preprint
Full-text available
High-throughput sequencing technologies have revolutionized microbiome research by allowing the relative quantification of microbiome composition and function in different environments. One of the main goals in microbiome analysis is the identification of microbial species that are differentially abundant among groups of samples, or whose abundance...
Article
Compositional data analysis requires selecting an orthonormal basis with which to work on coordinates. In most cases this selection is based on a data driven criterion. Principal component analysis provides bases that are, in general, functions of all the original parts, each with a different weight hindering their interpretation. For interpretativ...
Presentation
Full-text available
The sample space and its structure (operations and metrics if any) is the first step in any statistical modelling. In Bayesian statistics, parameters of the observational model are assumed to be random. Their sample space is the parameter space and it has to be taken as a sample space of random parameters. As such, its structure is a key point, for...
Article
Full-text available
Datasets collected by high-throughput sequencing (HTS) of 16S rRNA gene amplimers, metagenomes or metatranscriptomes are commonplace and being used to study human disease states, ecological differences between sites, and the built environment. There is increasing awareness that microbiome datasets generated by HTS are compositional because they hav...
Conference Paper
Full-text available
Early definitions of compositional data were based on the constant sum of the components. In the eighties, John Aitchison complemented this definition with some properties and principles. However, a formal definition of compositional data and their different typologies is still pending. Frequently, although not free of controversial opinions, comp...
Poster
Full-text available
The perceived benefits of the male condom increase the likelihood of its use, while perceived harm reduces it, with a greater influence of the former on the latter [6]. In addition to the male and female condoms, there are different methods to avoid sexually transmitted infections/AIDS and unwanted pregnancies, such as the contraceptive pill and th...
Poster
Full-text available
The perceived benefits of the male condom increase the likelihood of its use, while perceived harm reduces it, with a greater influence of the former on the latter [6]. In addition to the male and female condoms, there are different methods to avoid sexually transmitted infections/AIDS and unwanted pregnancies, such as the contraceptive pill and th...
Preprint
Full-text available
The study of the relationships between two compositions by means of canonical correlation analysis is addressed. A compositional version of canonical correlation analysis is developed and called CODA-CCO. We consider two approaches, using the centred log-ratio transformation and the calculation of all possible pairwise log-ratios within sets. The...
Article
Full-text available
Environmental risk management consists of making decisions on human activities or construction designs that are affected by the environment and/or have consequences or impacts on it. In these cases, decisions are made such that risk is minimized. In this regard, the forthcoming paper develops a close form that relates risk with cost, hazard, and vu...
Conference Paper
Using an approach based on the Aitchison geometry of the simplex, a Shifted-Dirichlet covariate model is obtained. Allowing the parameters to change linearly with a set of covariates, their effects on the relative contributions of different components in a composition are assessed. An application of this model to sedimentary petrography is given.
Conference Paper
A function assigning a composition to space-time points is called a compositional or simplicial field. These fields can be analyzed using the compositional analysis tools. In order to study compositions depending on space and/or time, reformulation and interpretation of traditional partial differential operators is required. These operators such as...
Conference Paper
Full-text available
The Aitchison geometry of the simplex, the sample space of compositional data, allows statistical modelling and analysis of compositions without the problems derived from spurious correlation. Here, it is used to show that it offers an alternative to the de Finetti ternary diagram for representing variability of species composition avoiding the pro...
Conference Paper
The purpose of this contribution is to evaluate whether there is enough statistical basis to establish a relationship between the popularity of certain terms in the Google browser and the evolution of several worldwide economic indices the subsequent week. A linear model trying to predict the evolution of 19 financial indices from all over the worl...
Conference Paper
Full-text available
Modeling dependence between two or more variables is a common issue in statistical applications. The Pearson correlation coefficient is often used to measure dependence, although it only captures linear dependence. Kendall's τ or Spearman's ρ, among others [4, 5] are popular alternatives for those wanting to model dependencies other than linear. Th...
Presentation
Full-text available
From constant sum constraint to association of compositional variables. Compositional data were defined in the sixties by the constant sum of their positive components; nowadays, this definition has been updated to take into account the scale of compositions, and the fact that compositional information consists of the ratios between compositiona...
Conference Paper
A two-way discrete classification is characterized by a table of probabilities. This table of probabilities can be assumed to be the parameters of a multinomial sampling. This two-way probability table can be interpreted as a composition in the simplex, and therefore the Aitchison geometry is a suitable structure for its analysis. In this geometry,...
Article
Full-text available
In a multinomial sampling, contingency tables can be parametrized by probabilities of each cell. These probabilities constitute the joint probability function of two or more discrete random variables. These probability tables have been previously studied from a compositional point of view. The compositional analysis of probability tables ensures co...
Article
Full-text available
Standard analysis of compositional data under the assumption that the Aitchison geometry holds assumes a uniform distribution as reference measure of the space. Weighting of parts can be done changing the reference measure. The changes that appear in the algebraic-geometric structure of the simplex are analysed, as a step towards understanding...
Poster
Full-text available
Microbiome data assign abundances to OTUs or taxa and, for each experimental condition, the total number of counts recorded is not relevant. Absolute and relative frequencies carry the same information about the composition. Results do not change when frequencies are multiplied by a positive constant, and analyses of subsets of the original OTU’s a...
Article
A method for the measurement of the size diversity based on the classical Shannon–Wiener expression was proposed as a proxy of the shape of the size distribution. The sum of probabilities of a discrete variable (such as species relative abundances) in the original Shannon–Wiener expression was substituted by an integral of the probability den...
Article
Full-text available
Isometric log ratios of proportions of major ions, derived from intuitive sequential binary partitions, are used to characterise hydrochemical variability within and between coal seam gas (CSG) and surrounding aquifers in a number of sedimentary basins in the USA and Australia. These isometric log ratios are the coordinates corresponding to an orth...
Article
Measurements to determine coal quality as fuel include proximate analysis, ultimate analysis and calorific value. The latter is an attribute taking non-negative real values, so a simple transformation is sufficient for its spatial modeling applying geostatistics. The analyses, however, involve proportions that follow the properties of compositional...
Article
Purpose: The ability to properly analyze and interpret large microbiome data sets has lagged behind our ability to acquire such data sets from environmental or clinical samples. Sequencing instruments impose a structure on these data: the natural sample space of a 16S rRNA gene sequencing data set is a simplex, which is a part of real space that i...
Article
Full-text available
Compositions describe parts of a whole which carry relative information. Compositional data appear in all fields of science and their analysis requires paying attention to the appropriate sample space. The log-ratio approach proposes the simplex, endowed with the Aitchison geometry, as an appropriate sample space. The main characteristics of the Ai...
Article
Like the statistical analysis of compositional data in general, spatial analysis of compositional data requires specific tools. A historical overview of their development is presented in three steps: (a) the recognition of the problem, known as spurious spatial covariance, (b) first attempts to use the logratio approach, and (c) the application of...