# Giovanni Pistone

Collegio Carlo Alberto, Piazza Arbarello 8, Turin IT · de Castro Statistics

U. Rennes 1975

## About

- Publications: 122
- Reads: 7,149
- Citations: 1,799

Additional affiliations

January 2013 - November 2015

January 2013 - March 2016

January 2013 - March 2016

**Collegio Carlo Alberto, Moncalieri IT**

Position

- affiliate professor

## Publications

We study the possible closed-form representations of the K-distance arising in the Kantorovich transport problems on finite metric spaces. Weighted graphs, $\ell_1$...

In this chapter, we study Information Geometry from a particular non-parametric or functional point of view. The basic model is a set of probabilities, usually specified by regularity conditions: for example, mutually absolutely continuous probability measures, or probability densities with a given degree of smoothness. We construct a manifold struct...

We discuss the statistical bundle of the manifold of two-variate strictly positive probability functions with given marginals. The fiber associated to each coupling turns out to be the vector space of interactions in the ANOVA decomposition with respect to the given weight. In this setting, we derive the form of the gradient flow equation for the Kan...

This chapter is a revised version of a tutorial lecture that I presented at the École de Physique des Houches on July 26–31 2020. Topics include: Non-parametric Information Geometry, the Statistical bundle, exponential Orlicz spaces, and Gaussian Orlicz-Sobolev spaces.

We derive bounds for the Orlicz norm of the deviation of a random variable defined on \(\mathbb {R}^n\) from its Gaussian mean value. The random variables are assumed to be smooth, and the bound itself depends on the Orlicz norm of the gradient. We briefly discuss possible applications to non-parametric Information Geometry.

This article focuses on inference on the errors in manufactured parts that are controlled using measurement devices. The characterization of the part surface topographies is central to several applications. A broad set of properties (tribological, optical, biological, mechanical, etc.) depends on the micro‐ and macrogeometry of the parts. Moreover, par...

A tutorial about Non-parametric Information Geometry, Statistical bundles, Orlicz spaces, and Gaussian Orlicz-Sobolev spaces.

We provide an Information-Geometric formulation of Classical Mechanics on the Riemannian manifold of probability distributions, which is an affine manifold endowed with a dually-flat connection. In a non-parametric formalism, we consider the full set of positive probability functions on a finite sample space, and we provide a specific expression fo...

This set of notes is intended for a short course aiming to provide an (almost) self-contained and (almost) elementary introduction to the topic of Information Geometry (IG) of the probability simplex. Such a course can be considered an introduction to the original monograph by Amari and Nagaoka [1], and to the recent monographs by Amari [2] and by...

This chapter discusses in detail the option to actually use the variogram as a parameterization. The notion of a variogram as it is used in geostatistics is also discussed. The chapter offers some preliminary thoughts about the possibility of a non‐parametric approach to Universal Kriging that aims to use the Bayes methodology. It provides a brief o...

We derive bounds for the Orlicz norm of the deviation of a random variable defined on $\mathbb{R}^n$ from its Gaussian mean value. The random variables are assumed to be smooth and the bound itself depends on the Orlicz norm of the gradient. Applications to non-parametric Information Geometry are discussed.

In Optimal Transport (OT) on a finite metric space, one defines a distance on the probability simplex that extends the distance on the ground space. The distance is the value of a Linear Programming (LP) problem on a set of real-valued 2-way tables with assigned margins. We apply to this case the methodology of moves which is usually applied in Alg...

This set of notes is intended for a short course aiming to provide an (almost) self-contained and (almost) elementary introduction to the topic of Information Geometry (IG) of the probability simplex. Such a course can be considered an introduction to the original monograph by Amari and Nagaoka (2000), and to the recent monographs by Amari (2016) a...

The computation of the Kantorovich distance (1-Wasserstein distance) on a finite state space may be a computationally hard problem in the case of a general distance. In this paper, we derive a simple closed form in the case of the geodesic distance on a weighted tree. Moreover, when the ground distance is defined by a graph, we show that the Kantor...
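The tree case lends itself to a small illustration. The sketch below uses a standard closed form for the 1-Wasserstein distance on a weighted tree: the sum over edges of the edge length times the absolute difference of the masses that the two distributions assign to the subtree below that edge. Whether this matches the exact expression derived in the paper is an assumption, since the abstract is truncated here; the function name `tree_kantorovich` and the `parent`/`weight` encoding are hypothetical choices made for this example.

```python
def tree_kantorovich(parent, weight, p, q):
    """Kantorovich (1-Wasserstein) distance between p and q on a weighted tree.

    Assumed encoding (hypothetical, for illustration only):
    node 0 is the root, parent[v] < v for every v >= 1, and weight[v] is the
    length of the edge joining v to parent[v] (weight[0] is unused).
    """
    n = len(parent)
    # excess[v] accumulates (p-mass minus q-mass) of the subtree rooted at v.
    excess = [p[v] - q[v] for v in range(n)]
    total = 0.0
    # Visit children before parents; this is valid because parent[v] < v.
    for v in range(n - 1, 0, -1):
        # The net mass crossing the edge (v, parent[v]) is |excess[v]|.
        total += weight[v] * abs(excess[v])
        excess[parent[v]] += excess[v]
    return total


# Path 0 - 1 - 2 with edge lengths 1 and 2.
parent = [None, 0, 1]
weight = [0.0, 1.0, 2.0]
print(tree_kantorovich(parent, weight, [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]))
```

In this example a unit mass moves from node 0 to node 2, so the result should equal the path length 1 + 2 = 3.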

Given a multivariate complex centered Gaussian vector $Z = (Z_1, \dots, Z_p)$ with non-singular covariance matrix $\Sigma$, we derive sufficient conditions on the nullity of the complex moments and we give a closed-form expression for the non-null complex moments. We present conditions for the factorisation of the complex moments. Computational consequence...

We study the class of non-parametric deformed statistical models where the deformed exponential has linear growth at infinity and is sub-exponential at zero. This class generalizes the class introduced by N.J. Newton. We discuss the convexity and regularity of the normalization operator, the form of the deformed statistical divergences and their co...

Industrial parts are routinely affected by dimensional and geometric errors originating in the course of manufacturing processes. These errors, whose pattern is typically related to a specific machining or forming process, are controlled in terms of dimensional and geometrical tolerances (such as straightness, roundness, flatness, profile) that...

The Wasserstein distance on multivariate non-degenerate Gaussian densities is a Riemannian distance. After reviewing the properties of the distance and the metric geodesic, we present an explicit form of the Riemannian metrics on positive-definite matrices and compute its tensor form with respect to the trace inner product. The tensor is a matrix w...

When sampling independent observations drawn from the uniform distribution on the unit interval, as the sample size gets large the asymptotic behaviour of both the empirical distribution function and empirical quantile function is well known. In this article we study analogous asymptotic results for the function that is obtained by composing the em...

We discuss the Pistone-Sempi exponential manifold on the finite-dimensional Gaussian space. We consider the role of the entropy, the continuity of translations, Poincaré-type inequalities, the generalized differentiability of probability densities of the Gaussian space.

The statistical bundle is the set of couples $(Q, W)$ of a probability density $Q$ and a random variable $W$ such that $E_Q[W] = 0$. On a finite state space, we assume $Q$ to be a probability density with respect to the uniform probability and give an affine atlas of charts such that the resulting manifold is a model for Information Geometry. Velocity and acc...

The Wasserstein distance on multivariate non-degenerate Gaussian densities is a Riemannian distance. After reviewing the properties of the distance and the metric geodesic, we derive an explicit form of the Riemannian metrics on positive-definite matrices and compute its tensor form with respect to the trace scalar product. The tensor is a matrix,...

The statistical bundle is the set of couples $(Q, W)$ of a probability density $Q$ and a random variable $W$ such that $E_Q[W] = 0$. On a finite state space, we assume $Q$ to be a probability density with respect to the uniform probability and give an affine atlas of charts such that the resulting manifold is a model for Information Geometry. Velocity and...

In the present paper we consider modal propositional logic and look for the constraints that are imposed on the propositions of the special type $\Box a$ by the structure of the relevant finite Kripke frame. We translate the usual language of modal propositional logic into notions of commutative algebra, namely polynomial rings, ideals, and...

We propose a dimensionality reduction method for infinite-dimensional measure-valued evolution equations such as the Fokker–Planck partial differential equation or the Kushner–Stratonovich resp. Duncan–Mortensen–Zakai stochastic partial differential equations of nonlinear filtering, with potential applications to signal processing, quantitative fin...

Vigelis and Cavalcante extended Naudts' deformed exponential families to a generic reference density. Here, the special case of Newton's deformed logarithm is used to construct a Hilbert statistical bundle for an infinite dimensional class of probability densities.

We study the continuity of space translations on non-parametric exponential families based on the exponential Orlicz space with Gaussian reference density.

Let $Z^t=(Z_1, \dots, Z_p)$ be a $p$-variate Gaussian complex random variable. Let $\alpha=(n_1,m_1,\dots,n_p,m_p)$ be a vector in $\mathbb N^{2p}$ and let $\nu(\alpha)$ be the corresponding moment: \nu(\alpha) =\frac{1}{\pi^p \det (\Sigma_z)} \int_{\mathbb C^p} z_1^{n_1}\ \overline z_1^{m_1}\ z_2^{n_2}\ \overline z_2^{m_2} \cdots z_p^{n_p} \ \ove...

Non-stochastic simulation models, such as finite element or computational fluid dynamics, often support real experiments in industrial research. It has become a common practice to provide a meta-model as computer experiments can be highly complex and time-consuming, and the design space is often broad. The meta-model is an approximation of the comp...

We apply the $L^2$ based Fisher-Rao vector-field projection by Brigo, Hanzon and LeGland (1999) to finite dimensional approximations of the Fokker-Planck equation on exponential families. We show that if the sufficient statistics are chosen among the diffusion eigenfunctions, the finite dimensional projection or the equivalent assumed density approx...

We propose a dimensionality reduction method for infinite-dimensional measure-valued evolution equations such as the Fokker-Planck partial differential equation or the Kushner-Stratonovich resp. Duncan-Mortensen-Zakai stochastic partial differential equations of nonlinear filtering, with potential applications to signal processing, quantitative f...

(Semi)Variograms are usually discussed in the framework of stationary or intrinsically stationary processes. We retell here this piece of theory in the setting of generic Gaussian vectors and of Gaussian vectors with constant variance. We show how to reparametrize the distribution as a function of the variogram and how to characterise all the Gauss...

We discuss the use of variograms for covariance modeling under the Kriging model to assess tolerances on manufactured parts. The variogram is very informative about the spatial dependence and it is favored by researchers in the choice of a correlation function. It may give evidence of anisotropy and of nugget effect too. In this paper, various vari...

In this paper, we study Amari's natural gradient flows of real functions defined on the densities belonging to an exponential family on a finite sample space. Our main example is the minimization of the expected value of a real function defined on the sample space. In such a case, the natural gradient flow converges to densities with reduced suppor...
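A minimal sketch of such a flow, under the simplifying assumption that the exponential family is the full open simplex on a finite sample space. In that case the Fisher-Rao natural gradient flow for minimizing $E_p[f]$ reduces to the replicator-type equation $\dot p(x) = -p(x)\,(f(x) - E_p[f])$, which is integrated below with explicit Euler steps. The function names, the step size, and the renormalization safeguard are choices made for this illustration and are not taken from the paper.

```python
def natural_gradient_step(p, f, dt):
    """One explicit Euler step of dp/dt = -p * (f - E_p[f]) on the simplex."""
    mean = sum(pi * fi for pi, fi in zip(p, f))
    stepped = [pi - dt * pi * (fi - mean) for pi, fi in zip(p, f)]
    # Renormalize to stop floating-point drift away from total mass 1.
    total = sum(stepped)
    return [x / total for x in stepped]


def natural_gradient_flow(p0, f, dt=0.01, steps=2000):
    """Iterate the Euler step; dt must be small enough to keep p positive."""
    p = list(p0)
    for _ in range(steps):
        p = natural_gradient_step(p, f, dt)
    return p


f = [3.0, 1.0, 2.0]               # objective on a 3-point sample space
p0 = [1.0 / 3, 1.0 / 3, 1.0 / 3]  # start at the uniform density
print(natural_gradient_flow(p0, f))
```

In this toy run the mass concentrates on the point where `f` is smallest, which illustrates the convergence to densities with reduced support mentioned in the abstract.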

Variograms are usually discussed in the framework of stationary or intrinsically stationary processes. We retell here this piece of theory in the setting of generic Gaussian vectors

Information Geometry generalizes to infinite dimension by modeling the tangent space of the relevant manifold of probability densities with exponential Orlicz spaces. We analyse the Boltzmann operator in the geometric setting from the point of view of its Maxwell's weak form as a composition of elementary operations in the exponential manifold, nam...

The polarization measure is the probability that among 3 individuals chosen at random from a finite population exactly 2 come from the same class. This index is maximum at the midpoints of the edges of the probability simplex. We compute the gradient flow of this index that is the differential equation whose solutions are the curves of steepest asc...
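The index itself is easy to evaluate. The sketch below assumes that the three individuals are drawn independently according to the class proportions $p = (p_1, \dots, p_k)$ (sampling with replacement, an assumption not stated in the abstract), so that the probability of exactly two coming from the same class is $3 \sum_i p_i^2 (1 - p_i)$. The function name `polarization` is hypothetical.

```python
def polarization(p):
    """Probability that exactly 2 of 3 independent draws share a class.

    p is a list of class proportions summing to 1 (assumed, not checked).
    For a fixed class i, the chance that exactly two of the three draws
    fall in i is 3 * p_i**2 * (1 - p_i); the events are disjoint across i.
    """
    return 3.0 * sum(x * x * (1.0 - x) for x in p)


print(polarization([0.5, 0.5, 0.0]))              # midpoint of an edge
print(polarization([1.0 / 3, 1.0 / 3, 1.0 / 3]))  # barycenter of the simplex
print(polarization([1.0, 0.0, 0.0]))              # vertex
```

Under this assumption the edge midpoint gives a larger value than the barycenter, in line with the statement that the index is maximal at the midpoints of the edges.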

We study the optimization of a continuous function by its stochastic relaxation, i.e., the optimization of the expected value of the function itself with respect to a density in a statistical model. We focus on gradient descent techniques applied to models from the exponential family and in particular on the multivariate Gaussian distribution. From...

We study the natural gradient flow of the expected value $E_p[f]$ of an objective function f for p in an exponential family. We parameterize the exponential family with the expectation parameters and we show that the dynamical system associated to the natural gradient flow can be extended outside the marginal polytope.

We discuss the optimization of the stochastic relaxation of a real-valued function, i.e., we introduce a new search space given by a statistical model and we optimize the expected value of the original function with respect to a distribution in the model. From the point of view of Information Geometry, statistical models are Riemannian manifolds of...

We discuss the use of the Newton method in the computation of $\max(p \mapsto E_p[f])$, where p belongs to a statistical exponential family on a finite state space. In a number of papers, the authors have applied first order search methods based on information geometry. Second order methods have been widely used in opt...

Information Geometry has been used to inspire efficient algorithms for stochastic optimization, both in the combinatorial and the continuous case. We give an overview of the authors' research program and some specific contributions to the underlying theory.

We review a nonparametric version of Amari's Information Geometry in which the set of positive probability densities on a given sample space is endowed with an atlas of charts to form a differentiable manifold modeled on Orlicz Banach spaces. This nonparametric setting is used to discuss the setting of typical problems in Machine Learning and Stati...

In the production process of silicon wafers, which are crystalline slices used as substrates for electronic micro‐circuits, the thickness of the SiO2 deposition on their top is a main characteristic to be controlled during the process. The experimental design that is commonly used to monitor the thickness against the target value consists of a regular arr...

The differential-geometric structure of the set of positive densities on a given measure space has raised the interest of many mathematicians after the discovery by C.R. Rao of the geometric meaning of the Fisher information. Most of the research is focused on parametric statistical models. In a series of papers by the author and coworkers a particular v...

The geometric framework based on Stochastic Relaxation allows one to describe from a common perspective different model-based optimization algorithms that make use of statistical models to guide the search for the optimum. In this paper Stochastic Relaxation is used to provide theoretical results on Estimation of Distribution Algorithms (EDAs). By the...

The joint use of counting functions, Hilbert basis, and Markov basis allows one to define a procedure to generate all the fractional factorial designs that satisfy a given set of constraints in terms of orthogonality [Fontana, Pistone and Rogantin (JSPI, 2000), Pistone and Rogantin (JSPI, 2008)]. The general case of mixed level designs without restrict...

We consider the classical problem of computing the expected value of a real function of the d-variate random variable X using cubature formulae. We use in synergy tools from Commutative Algebra for cubature rules, from elementary orthogonal polynomial theory and from Probability.

A deformed logarithm function called $q$-logarithm has received considerable attention by physicists after its introduction by C. Tsallis. J. Naudts has proposed a generalization called $\phi$-logarithm and he has derived the basic properties of $\phi$-exponential families. In this paper we study the related notion of marginal polytope in the case o...

In Computer Experiments (CE), a careful selection of the design points is essential for predicting the system response at untried points, based on the values observed at tried points. In physical experiments, the protocol is based on Design of Experiments, a methodology whose basic principles are questioned in CE. When the responses of a CE are mod...

This is a review of current research in Markov chains as toric statistical models. Its content is a mixture of background information, results from the relevant recent literature, new results, and work in progress.

Stochastic relaxation aims at finding the minimum of a fitness function by identifying a proper sequence of distributions, in a given model, that minimize the expected value of the fitness function. Different algorithms fit this framework, and they differ according to the policy they implement to identify the next distribution in the model. In this...

In this paper we present a geometrical framework for the analysis of Estimation of Distribution Algorithms (EDAs) based on the exponential family. From a theoretical point of view, an EDA can be modeled as a sequence of densities in a statistical model that converges towards distributions with reduced support. Under this framework, at each iteratio...

Limits of densities belonging to an exponential family appear in many applications, e.g. Gibbs models in Statistical Physics, relaxed combinatorial optimization, coding theory, critical likelihood computations, Bayes priors with singular support, random generation of factorial designs. We discuss the problem from the methodological point of view...

Tourism is a complex and highly competitive sector. In this scenario, incoming tourism flows represent one of the key indicators for public institutions, willing to adopt an informed decision-making process for resource allocation. The accurate and timely knowledge of both the inter-regional and the foreign component at a sufficiently detailed geog...

We present a brief review of classical experimental design in the spirit of algebraic statistics. The notions of identifiability, aliasing, and estimability of linear parametric functions, as well as confounding, are expressed in relation to a set of polynomials identified by the design, called the design ideal. An effort has been made to indicate the classical line...

For a Markov chain, both the detailed balance condition and the cycle Kolmogorov condition are algebraic binomials. This remark suggests studying reversible Markov chains with the tools of Algebraic Statistics, such as toric statistical models. One of the results of this study is an algebraic parameterization of reversible Markov transitions and thei...
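Both conditions can be checked exactly on a small example. The sketch below builds a random walk on a weighted triangle (a standard example of a reversible chain, chosen here for illustration and not taken from the paper) and tests the binomial identities $\pi_i P_{ij} = \pi_j P_{ji}$ and $P_{ij} P_{jk} P_{ki} = P_{ik} P_{kj} P_{ji}$ with exact rational arithmetic. All function names are hypothetical, and the cycle check covers only cycles of length three, which is enough for a three-state chain.

```python
from fractions import Fraction
from itertools import permutations


def random_walk(weights):
    """Random walk on a graph with symmetric integer edge weights.

    Returns the transition matrix P and the probability vector pi that is
    proportional to the weighted degrees.
    """
    n = len(weights)
    degree = [sum(row) for row in weights]
    P = [[Fraction(weights[i][j], degree[i]) for j in range(n)] for i in range(n)]
    pi = [Fraction(d, sum(degree)) for d in degree]
    return P, pi


def detailed_balance(P, pi):
    """Check the binomial identity pi_i * P_ij == pi_j * P_ji for all i, j."""
    n = len(P)
    return all(pi[i] * P[i][j] == pi[j] * P[j][i]
               for i in range(n) for j in range(n))


def kolmogorov_3cycles(P):
    """Check the Kolmogorov cycle condition on every cycle of length three."""
    n = len(P)
    return all(P[i][j] * P[j][k] * P[k][i] == P[i][k] * P[k][j] * P[j][i]
               for i, j, k in permutations(range(n), 3))


# Reversible example: random walk on a weighted triangle.
P, pi = random_walk([[0, 1, 2], [1, 0, 3], [2, 3, 0]])
print(detailed_balance(P, pi), kolmogorov_3cycles(P))

# Non-reversible example: deterministic rotation 0 -> 1 -> 2 -> 0.
rotation = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
uniform = [Fraction(1, 3)] * 3
print(detailed_balance(rotation, uniform), kolmogorov_3cycles(rotation))
```

Exact fractions are used instead of floats so that the binomial equalities can be tested with `==` rather than a tolerance.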

Computational commutative algebra has been applied to the design of experiments by defining a design as a 0-dimensional variety in an affine space. Responses over the design are modeled by polynomials, while fractional designs are represented by indicator polynomials. A special choice of the coding of factor levels leads, via the application of dis...

We discuss the use of Kaniadakis’ κ-exponential in the construction of a statistical manifold modelled on Lebesgue spaces of real random variables. Some algebraic features of the deformed exponential models are considered. A chart is defined for each strictly positive density; every other strictly positive density in a suitable neighborhood of th...

The joint use of counting functions, Hilbert basis and Markov basis allows one to define a procedure to generate all the fractional factorial designs that satisfy a given set of constraints in terms of orthogonality (Fontana, Pistone and Rogantin (JSPI, 2000), Pistone and Rogantin (JSPI, 2008)). The general case of mixed level designs, without restricti...

We discuss the use of Kaniadakis' $\kappa$-exponential in the construction of a statistical manifold modelled on Lebesgue spaces of real random variables. Some algebraic features of the deformed exponential models are considered. A chart is defined for each strictly positive density; every other strictly positive density in a suitable neighborhoo...

In the present paper we discuss problems concerning evolutions of densities related to Itô diffusions in the framework of the statistical exponential manifold. We develop a rigorous approach to the problem, and we particularize it to the orthogonal projection of the evolution of the density of a diffusion process onto a finite dimensional exponenti...

The selection of design points is mandatory when the goal is to study how the observed response varies upon changing the set of input variables. In physical experimentation, the researcher is asked to investigate a number of issues to gain valuable inferences. Design of experiments (D.o.E.) is a helpful tool for achieving this goal. Unfortunately,...

We analyze the problem of pseudo-Boolean function optimization by introducing the notion of stochastic relaxation, i.e., we find minima of f by minimizing its expected value over a set of distributions. By doing this, the parameters of the statistical model become the new variables of the optimization problem. We introduce possible parametrization...

The mathematical theory of statistical models has a very rich structure that relies on chapters from probability, functional analysis, convex analysis, differential geometry and group algebra. Recently, methods of stochastic analysis and polynomial commutative algebra emerged. Each of these theories contributes a clarification of a relevant statist...

In a general fractional factorial design, the n levels of a factor are coded by the nth roots of unity. This device allows a full generalization to mixed-level designs of the theory of the polynomial indicator function which has already been introduced for two-level designs in a joint paper with Fontana. The properties of orthogonal arrays and...

We discuss the applications of algebraic statistics to fractional factorial design with special emphasis on the choice of level coding. In particular, we deal with the theory of Bayley's level codings in that framework.

Every fraction is a union of points, which are trivial regular fractions. To characterize non trivial decomposition, we derive a condition for the inclusion of a regular fraction as follows. Let $F = \sum_\alpha b_\alpha X^\alpha$ be the indicator polynomial of a generic fraction, see Fontana et al, JSPI 2000, 149-172. Regular fractions are charact...

In this paper we consider a Bayesian analysis of contingency tables allowing for the possibility that cells may have probability zero. In this sense we depart from standard log-linear modeling that implicitly assumes a positivity constraint. Our approach leads us to consider mixture models for contingency tables, where the components of the mixture...

We consider the non-parametric statistical model ε(p) of all positive densities q that are connected to a given positive density p by an open exponential arc, i.e. a one-parameter exponential model p(t), t ∈ I, where I is an open interval. On this model there exists a manifold structure modeled on Orlicz spaces, originally introduced in 1995 by Pis...

In this paper, we relate the problem of generating all 2-level orthogonal arrays of given dimension and force, i.e. elements in OA$(n,m)$, where $n$ is the number of factors and $m$ the force, to the solution of an Integer Programming problem involving rational convex cones. We do not restrict the number of points in the array, i.e. we admit any nu...

For a discrete distribution in $\mathbb{R}^d$ on a finite support D, probabilities and moments are algebraically related. If there are n=|D| support points then there are n probabilities p(x), x∈D, and n basic moments. By suitable interpolation of the probabilities using a Gröbner basis method, high order moments can be expressed linearly in terms of n basic moment...

Contribution to "School (and Workshop) on Computational Algebra for Algebraic Geometry and Statistics", Torino, September 2004. Summary: A generalised (multivariate) divided difference formula is given for an arbitrary finite set of points with no subsets of three points that lie on a line. This follows from an extension of Newton's polynomial...

We use computational commutative algebra to discuss and compute confounding relations for general, e.g. non-regular, fractions of a factorial design. Our method is based on the algebraic description of the design as the set of solutions of a system of polynomial equations. Gröbner bases of polynomial ideals are used as computational tools. Symbolic...

The non-parametric case of statistical manifolds poses peculiar technical problems as the basic paradigms, namely the use of the Fisher information as a metric tensor by Rao in 1945, see [12], and the geometry of exponential models introduced by Efron in 1975, see [3], are difficult to realize rigorously in one single geometric structure as advocat...

Gröbner bases, elimination theory and factorization may be used to perform calculations in elementary discrete probability and more complex areas such as Bayesian networks (influence diagrams). The paper covers the application of computational algebraic geometry to probability theory. The application to the Boolean algebra of events is straightforwa...

The now well-established Gröbner basis method in experimental design (see the authors’ monograph “Algebraic Statistics”) had the understanding of aliasing as a key motivation. The basic method asks: given an experimental design, what is estimable, or more generally what is the alias structure? The paper addresses the following related question: giv...

Written by pioneers in this exciting new field, Algebraic Statistics introduces the application of polynomial algebra to experimental design, discrete probability, and statistics. It begins with an introduction to Gröbner bases and a thorough description of their applications to experimental design. A special chapter covers the binary case with new...

The non-parametric version of Information Geometry has been developed in recent years. The first basic result was the construction of the manifold structure on $M(\mu)$, the maximal statistical model associated to an arbitrary measure $\mu$ (see Ref. 48). Using this construction we first show in this paper that the pretangent and the tangent bundles on M...

This work extends the research program of the authors into the design and analysis of complex experiments. It shows how the special algebraic structures studied in the polynomial ring algebra and Gröbner basis environment can be exploited for situations in which there is blocking, nesting, crossing and so on, or where groups of factors are "favoure...

The problem of finding a fraction of a two-level factorial design with specific properties is usually solved within special classes, such as regular or Plackett–Burman designs. We show that each fraction of a two-level factorial design is characterized by the ANOVA representation of its polynomial indicator function. In particular, such a represent...

We show that exponential probability models with lattice support are algebraic varieties within which computational and conceptual tools from commutative algebra can be applied. Log-linear models for contingency tables and conditional independence probability models on trees are discussed within this algebraic framework.

Computations with cumulants are becoming easier through the use of computer algebra but there remains a difficulty with the finiteness of the computations because all distributions except the normal have an infinite number of non-zero cumulants. One is led therefore to replacing finiteness of computations by "finitely generated" in the sense of re...

Let $(X, \mathcal{X}, \mu)$ be a measure space, and let $\mathcal{M}(X, \mathcal{X}, \mu)$ denote the set of the $\mu$-almost surely strictly positive probability densities. It was shown by Pistone and Sempi in 1995 that the global geometry on $\mathcal{M}(X, \mathcal{X}, \mu)$ can be realized by an affine atlas whose charts are defined locally by the mappings $\mathcal{M}(X, \mathcal{X}, \mu) \ni q \mapsto \log(q/p) + K(p, q) \in B_p$, where...

Computational algebraic geometry can be used to solve estimability/identifiability problems in the design of experiments. The key is to replace the design as a set of points by the polynomials whose solutions are the design points. The theory and application of Gröbner bases allows one to find a unique saturated model for each so-called monomial or...

Many problems of confounding and identifiability for polynomial and multidimensional polynomial models can be solved using methods of algebraic geometry aided by the fact that modern computational algebra packages such as MAPLE can be used. The problem posed here is to give a description of the identifiable models given a particular experimental de...

The problem of deciding computationally which terms are estimable in a given arbitrary design is considered. It is shown that well established procedures of computer commutative algebra solve this and other interesting problems in the theory of designed experiments. This is intended as a contribution to the construction of Computer Aided Design of Ex...

Let $\mathscr{M}_\mu$ be the set of all probability densities equivalent to a given reference probability measure $\mu$. This set is thought of as the maximal regular (i.e., with strictly positive densities) $\mu$-dominated statistical model. For each $f \in \mathscr{M}_\mu$ we define (1) a Banach space $L_f$ with unit ball $\mathscr{V}_f$ and (2)...

The equivalence of weak convergence of non-degenerate diffusions on $\mathbb{R}^n$ to the G-convergence [see E. De Giorgi and S. Spagnolo, Boll. Unione Mat. Ital., IV. Ser. 8, 391-411 (1973; Zbl 0274.35002)] of corresponding elliptic operators is investigated.