# Ioannis Ntzoufras

Athens University of Economics and Business | AUEB · Department of Statistics


PhD in Statistics

## About

127 Publications

60,918 Reads


3,979 Citations


Additional affiliations

December 2015 - January 2017

January 2007 - present

January 2006 - present

## Publications

Publications (127)

The reliability of the results of network meta‐analysis (NMA) lies in the plausibility of the key assumption of transitivity. This assumption implies that the effect modifiers' distribution is similar across treatment comparisons. Transitivity is statistically manifested through the consistency assumption which suggests that direct and indirect evi...

The goal of this paper is to build and compare methods for the prediction of the final outcomes of basketball games. In this study, we analyzed data from four different European tournaments: Euroleague, Eurocup, Greek Basket League and Spanish Liga ACB. The data-set consists of information collected from box scores of 5214 games for the period of 2...

In this paper, we analyze a sport (in)activity case study using a zero inflated bivariate Poisson model. We use the “(in)activity” term in order to embrace both active and passive sport participation (practicing or watching a sport, respectively). The paper investigates the determinants of sport (in)activity: the frequency and the probability of sp...
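The bivariate Poisson building block of this model can be illustrated through its probability mass function. The sketch below assumes the standard trivariate-reduction construction (X = W1 + W3, Y = W2 + W3 with independent Poisson W_k) and a simple zero-inflated form that adds extra mass `p` only at the (0, 0) cell; the paper's exact zero-inflation structure and covariates are not given in the snippet, so the function names and parameterisation are illustrative.

```python
import math


def bivariate_poisson_pmf(x: int, y: int, lam1: float, lam2: float, lam3: float) -> float:
    """P(X = x, Y = y) for X = W1 + W3, Y = W2 + W3 with independent W_k ~ Poisson(lam_k)."""
    if x < 0 or y < 0:
        return 0.0
    base = (math.exp(-(lam1 + lam2 + lam3))
            * lam1 ** x / math.factorial(x)
            * lam2 ** y / math.factorial(y))
    # Sum over the shared component W3 = k; lam3 is the covariance of X and Y.
    total = sum(
        math.comb(x, k) * math.comb(y, k) * math.factorial(k) * (lam3 / (lam1 * lam2)) ** k
        for k in range(min(x, y) + 1)
    )
    return base * total


def zi_bivariate_poisson_pmf(x: int, y: int, lam1: float, lam2: float,
                             lam3: float, p: float) -> float:
    """Zero-inflated version (assumed form): extra probability p placed on the (0, 0) cell."""
    inflation = p if (x == 0 and y == 0) else 0.0
    return inflation + (1.0 - p) * bivariate_poisson_pmf(x, y, lam1, lam2, lam3)


# Usage: probability of a (0, 0) outcome under made-up parameter values.
print(zi_bivariate_poisson_pmf(0, 0, lam1=1.2, lam2=0.8, lam3=0.3, p=0.2))
```

Setting `lam3 = 0` recovers two independent Poisson counts, which is a convenient sanity check.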

Competitive balance is of much interest in the sports analytics literature and beyond. We develop a statistical network model based on an extension of the stochastic block model to assess the balance between teams in a league. We represent the outcome of all matches in a football season as a dense network with nodes identified by teams and categori...

We consider a flexible Bayesian evidence synthesis approach to model the age-specific transmission dynamics of COVID-19 based on daily age-stratified mortality counts. The temporal evolution of transmission rates in populations containing multiple types of individual are reconstructed via an appropriate dimension-reduction formulation driven by ind...

The reliability of the results of network meta-analysis (NMA) lies in the plausibility of the key assumption of transitivity. This assumption implies that the effect modifiers' distribution is similar across treatment comparisons. Transitivity is statistically manifested through the consistency assumption which suggests that direct and indirect evidenc...

This paper is concerned with a contemporary Bayesian approach to the effect of temperature on developmental rates. We develop statistical methods using recent computational tools to model four commonly used ecological non-linear mathematical curves that describe arthropods’ developmental rates. Such models address the effect of temperature fluctuat...

The Power–Expected–Posterior (PEP) prior framework provides a convenient and objective method for dealing with variable selection problems in regression models from a Bayesian perspective. The PEP prior inherits all of the advantages of the Expected–Posterior Prior. Furthermore, it avoids the need to select imaginary data and mitigates their...

A well known identifiability issue in factor analytic models is the invariance with respect to orthogonal transformations. This problem burdens the inference under a Bayesian setup, where Markov chain Monte Carlo (MCMC) methods are used to generate samples from the posterior distribution. We introduce a post-processing scheme in order to deal with...

The Ising model is one of the most widely analyzed graphical models in network psychometrics. However, popular approaches to parameter estimation and structure selection for the Ising model cannot naturally express uncertainty about the estimated parameters or selected structures. To address this issue, this paper offers an objective Bayesian appro...

The focus of this work is to estimate the number of team possessions in Euroleague basketball for seasons 2017–18, 2018–19 and 2019–20. To achieve this goal, we implemented the approaches proposed by Kubatko et al. (2007). The statistical analysis of Euroleague data suggests a model similar to the one currently used in the NBA, with one, minor...
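As a rough illustration of the box-score approach, the sketch below computes the widely quoted possession estimate POSS ≈ FGA − OREB + TOV + c·FTA and averages the two teams' estimates for a game. The free-throw coefficient `c` (commonly quoted as about 0.44 for the NBA) is kept as a parameter because the abstract indicates the Euroleague model differs from the NBA one in a detail not shown here; the function names and box-score numbers are hypothetical.

```python
def estimate_possessions(fga: float, oreb: float, tov: float, fta: float,
                         ft_coef: float = 0.44) -> float:
    """Box-score possession estimate: FGA - OREB + TOV + ft_coef * FTA.

    ft_coef approximates the share of free-throw attempts that end a possession;
    0.44 is the value commonly quoted for the NBA and is an assumption here.
    """
    return fga - oreb + tov + ft_coef * fta


def game_possessions(team: dict, opp: dict, ft_coef: float = 0.44) -> float:
    """Average both teams' estimates, since the two sides have (almost) equal possessions."""
    keys = ("fga", "oreb", "tov", "fta")
    est_team = estimate_possessions(*(team[k] for k in keys), ft_coef=ft_coef)
    est_opp = estimate_possessions(*(opp[k] for k in keys), ft_coef=ft_coef)
    return 0.5 * (est_team + est_opp)


# Usage with made-up box-score totals (not Euroleague data).
home = {"fga": 60, "oreb": 10, "tov": 12, "fta": 20}
away = {"fga": 64, "oreb": 12, "tov": 10, "fta": 15}
print(game_possessions(home, away))
```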

Sports have always been a fertile area for applying mathematics, with a wide range of potential problems, ranging from simple examples that can be used effectively to teach students fundamental ideas about mathematics, probability and statistics (see, e.g., Kvam & Sokol, 2004), up to complicated models that can be utilized to study space creati...

Competitive balance is a desirable feature in any professional sports league and encapsulates the notion that there is unpredictability in the outcome of games, as opposed to an imbalanced league in which the outcomes of some games are more predictable than others, for example, when an apparently strong team plays against a weak team. In this paper, we...

A simple and efficient adaptive Markov Chain Monte Carlo (MCMC) method, called the Metropolized Adaptive Subspace (MAdaSub) algorithm, is proposed for sampling from high-dimensional posterior model distributions in Bayesian variable selection. The MAdaSub algorithm is based on an independent Metropolis-Hastings sampler, where the individual proposa...
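The general idea of an independent Metropolis–Hastings sampler whose per-variable proposal probabilities are adapted during the run can be sketched as follows. This is a simplified toy in the spirit of the description above, not the published MAdaSub algorithm: the adaptation rule (a running inclusion frequency shrunk towards 0.5 with weight `prior_weight`) and the made-up additive log-posterior in the usage example are assumptions.

```python
import math
import random


def adaptive_independence_sampler(log_post, p, n_iter=20000, prior_weight=10.0,
                                  eps=1e-3, seed=1):
    """Toy adaptive independent Metropolis-Hastings sampler over inclusion vectors.

    Each proposal switches variable j on independently with probability r[j];
    r[j] is then adapted towards the chain's running inclusion frequency
    (shrunk towards 0.5 with weight prior_weight, so adaptation diminishes).
    Returns the estimated marginal inclusion frequencies.
    """
    rng = random.Random(seed)
    r = [0.5] * p
    gamma = [0] * p
    cur = log_post(gamma)
    counts = [0] * p

    def log_q(g, probs):
        # Log proposal probability of inclusion vector g under independent Bernoullis.
        return sum(math.log(pr if gj else 1.0 - pr) for gj, pr in zip(g, probs))

    for t in range(1, n_iter + 1):
        prop = [1 if rng.random() < rj else 0 for rj in r]
        new = log_post(prop)
        # Independence sampler ratio: pi(prop) q(gamma) / (pi(gamma) q(prop)).
        log_alpha = new - cur + log_q(gamma, r) - log_q(prop, r)
        if log_alpha >= 0 or rng.random() < math.exp(log_alpha):
            gamma, cur = prop, new
        for j in range(p):
            counts[j] += gamma[j]
            rj = (prior_weight * 0.5 + counts[j]) / (prior_weight + t)
            r[j] = min(max(rj, eps), 1.0 - eps)
    return [c / n_iter for c in counts]


# Usage: a made-up additive log-posterior, so the true inclusion
# probabilities are 1 / (1 + exp(-score)) for each variable.
scores = [4.0, -4.0, 0.0]
freqs = adaptive_independence_sampler(
    lambda g: sum(s * gj for s, gj in zip(scores, g)), p=len(scores))
print(freqs)
```

Because the proposal is a product of Bernoullis, it can match an additive target exactly, so acceptance rates approach one as the proposal probabilities settle.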

This paper is concerned with a contemporary Bayesian approach to the effect of temperature on developmental rates. We develop statistical methods using recent computational tools to model four commonly used ecological non-linear mathematical curves that describe arthropods' developmental rates. Such models address the effect of temperature fluctuat...

We study and develop Bayesian models for the analysis of volleyball match outcomes as recorded by the set-difference. Due to the peculiarity of the outcome variable (set-difference) which takes discrete values from $-3$ to $3$, we cannot consider standard models based on the usual Poisson or binomial assumptions used for other sports such as footba...

Bayes Factors, the Bayesian tool for hypothesis testing, are receiving increasing attention in the literature. Compared to their frequentist rivals ($p$-values or test statistics), Bayes Factors have the conceptual advantage of providing evidence both for and against a null hypothesis and they can be calibrated so that they do not depend so heavily...

A stochastic search method, the so-called Adaptive Subspace (AdaSub) method, is proposed for variable selection in high-dimensional linear regression models. The method aims at finding the best model with respect to a certain model selection criterion and is based on the idea of adaptively solving low-dimensional sub-problems in order to provide a...

The Ising model is one of the most widely analyzed graphical models in network psychometrics. Unfortunately, popular approaches to parameter estimation and structure selection for the Ising model cannot naturally express uncertainty about the estimated parameters or selected structures. To address this issue, this paper offers an objective Bayesian...

The aim of this paper is to study and develop Bayesian models for the analysis of volleyball game outcomes as recorded by the difference in sets won. Due to the peculiarity of the outcome variable (set difference), which takes discrete values from $-3$ to $3$, we cannot consider standard models based on the usual Poisson or binomial assumpti...

Volleyball is a team sport with unique and specific characteristics. We introduce a new two-level hierarchical Bayesian model which accounts for these volleyball-specific characteristics. In the first level, we model the set outcome with a simple logistic regression model. Conditionally on the winner of the set, in the second level, we use a trunca...

This paper focuses on Bayesian model averaging (BMA) using the power–expected–posterior prior in objective Bayesian variable selection under normal linear models. We derive a BMA point estimate of a predicted value, and present strategies for computing and evaluating the prediction accuracy. We compare the performance of our method with that of...


One of the main approaches used to construct prior distributions for objective Bayes methods is the concept of random imaginary observations. Under this setup, the expected-posterior prior (EPP) offers several advantages, among which it has a nice and simple interpretation and provides an effective way to establish compatibility of priors among mod...

Unlike what happens for other popular sports such as football, basketball and baseball, modelling the final outcomes of volleyball has not been thoroughly addressed by the statistical and the data science community. This is mainly due to the complexity of the game itself since the game is played in two levels of outcomes: the sets and the points (w...


Bayesian methods for graphical log-linear marginal models have not been developed to the same extent as traditional frequentist approaches. In this work, we introduce a novel Bayesian approach for quantitative learning for such models. These models belong to curved exponential families that are difficult to handle from a Bayesian perspective. Furth...

In volleyball, due to the sequential structure of the game, each outcome results from events that follow consistent consecutive patterns: pass–set–attack–outcome, serve–outcome and block–dig–set–counter attack–outcome. There are three possible outcomes: point won, point lost, and rally continuation. With the aim of quantifying the importance of vol...

The hyper‐g prior is a default choice for Bayesian variable selection in normal linear regression models. In this article we provide an overview of the Bayesian variable selection framework and explain in detail the specification for the hyper‐g prior setup. The practical implementation of the methods under consideration is demonstrated through the...
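Under Zellner's g-prior, the Bayes factor of a regression model with `p` predictors against the intercept-only model depends on the data only through R², n and p, and the hyper-g prior averages it over g. The sketch below uses that closed form and a midpoint-rule integral over u = g/(1+g) ~ Beta(1, a/2 − 1); the default a = 3 and the grid size are illustrative choices, not taken from the article.

```python
import math


def log_bf_g_prior(r2: float, n: int, p: int, g: float) -> float:
    """log Bayes factor of a p-predictor model vs the intercept-only model, fixed g."""
    return 0.5 * (n - 1 - p) * math.log1p(g) - 0.5 * (n - 1) * math.log1p(g * (1.0 - r2))


def log_bf_hyper_g(r2: float, n: int, p: int, a: float = 3.0, grid: int = 20000) -> float:
    """log Bayes factor under the hyper-g prior, where u = g / (1 + g) ~ Beta(1, a/2 - 1).

    The integral over u in (0, 1) is approximated with the midpoint rule and
    accumulated on the log scale (log-sum-exp) to avoid overflow for large n.
    """
    b = a / 2.0 - 1.0  # requires a > 2
    terms = []
    for i in range(grid):
        u = (i + 0.5) / grid
        g = u / (1.0 - u)
        log_beta_density = math.log(b) + (b - 1.0) * math.log1p(-u)
        terms.append(log_bf_g_prior(r2, n, p, g) + log_beta_density - math.log(grid))
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


# Usage: unit-information choice g = n versus the hyper-g average.
print(log_bf_g_prior(r2=0.5, n=50, p=1, g=50.0), log_bf_hyper_g(r2=0.5, n=50, p=1))
```

With R² = 0 and p = 1 the hyper-g Bayes factor reduces to (a/2 − 1)/(a/2 − 1/2), which equals 0.5 for a = 3 and gives a quick correctness check.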

In this work, the problem of transformation and simultaneous variable selection is thoroughly treated via objective Bayesian approaches by the use of default Bayes factor variants. Four uniparametric families of transformations (Box–Cox, Modulus, Yeo-Johnson and Dual), denoted by T, are evaluated and compared. The subjective prior elicitation for t...

We provide a review of prior distributions for objective Bayesian analysis. We start by examining some foundational issues and then organize our exposition into priors for: i) estimation or prediction; ii) model selection; iii) high-dimensional models. With regard to i), we present some basic notions, and then move to more recent contributions on di...

The power-expected-posterior (PEP) prior provides an objective, automatic, consistent and parsimonious model selection procedure. At the same time it resolves the conceptual and computational problems due to the use of imaginary data. Namely, (i) it dispenses with the need to select and average across all possible minimal imaginary samples, and (ii...

Thermodynamics has been shown to have direct applications in Bayesian model evaluation. Within a tempered transitions scheme, the Boltzmann–Gibbs distribution pertaining to different Hamiltonians is implemented to create a path which links the distributions of interest at the endpoints. As illustrated here, an optimal temperature exists along the...
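The basic thermodynamic-integration identity behind this line of work, log m(y) = ∫₀¹ E_t[log L(θ)] dt with power posteriors p_t(θ) ∝ L(θ)^t p(θ), can be checked on a conjugate toy model where every power posterior is normal and the exact marginal likelihood is known. The normal model, the temperature schedule t_k = (k/K)^5 and the trapezoid rule are assumptions for illustration; in real applications E_t[log L] would be estimated by MCMC at each temperature.

```python
import math


def ti_log_marginal(y, prior_var=1.0, n_temps=100, power=5):
    """Thermodynamic integration: log m(y) = integral over t in [0, 1] of E_t[log L(theta)].

    Toy model (assumption): y_i ~ N(theta, 1), theta ~ N(0, prior_var), so each
    power posterior L(theta)^t p(theta) is normal and E_t[log L] is exact.
    """
    n = len(y)
    ybar = sum(y) / n
    ss = sum((yi - ybar) ** 2 for yi in y)

    def expected_loglik(t):
        prec = 1.0 / prior_var + t * n
        mean = t * n * ybar / prec
        var = 1.0 / prec
        # E_t[sum (y_i - theta)^2] = SS + n (ybar - mean)^2 + n var
        return -0.5 * n * math.log(2 * math.pi) - 0.5 * (ss + n * (ybar - mean) ** 2 + n * var)

    temps = [(k / n_temps) ** power for k in range(n_temps + 1)]  # dense near t = 0
    vals = [expected_loglik(t) for t in temps]
    return sum(0.5 * (vals[k] + vals[k + 1]) * (temps[k + 1] - temps[k])
               for k in range(n_temps))


def exact_log_marginal(y, prior_var=1.0):
    """Closed form: y ~ N(0, I + prior_var * 11^T)."""
    n = len(y)
    s, s2 = sum(y), sum(yi * yi for yi in y)
    quad = s2 - prior_var * s * s / (1.0 + n * prior_var)
    return -0.5 * n * math.log(2 * math.pi) - 0.5 * math.log(1.0 + n * prior_var) - 0.5 * quad


y = [0.3, 1.1, -0.4, 0.8, 1.5, 0.2, 0.9, 1.3, -0.1, 0.6]  # made-up data
print(ti_log_marginal(y), exact_log_marginal(y))
```

The temperature schedule packs points near t = 0, where E_t[log L] changes fastest as the power posterior moves away from the prior.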

Epidemic data often possess certain characteristics, such as the presence of many zeros, the spatial nature of the disease spread mechanism, environmental noise, serial correlation and dependence on time varying factors. This paper addresses these issues via suitable Bayesian modelling. In doing so we utilise a general class of stochastic regressio...

Power-expected-posterior (PEP) priors have been recently introduced as generalized versions of the expected-posterior-priors (EPPs) for variable selection in Gaussian linear models. They are minimally-informative priors that reduce the effect of training samples under the EPP approach, by combining ideas from the power-prior and unit-information-pr...

The power-expected-posterior (PEP) prior is an objective prior for Gaussian linear models, which leads to consistent model selection inference and tends to favor parsimonious models. Recently, two new forms of the PEP prior were proposed that generalize its applicability to a wider range of models. We examine the properties of these two PEP varia...

In this work we develop novel hypothesis tests for association models for two way contingency tables. We focus on conjugate analysis for the uniform, row and column effect model which can be considered as Poisson log-linear or Multinomial logit models. For the row-column model we will develop an MCMC based approach which will try to explore conditi...

In this work we present Bayesian hypothesis tests for the independence between two categorical variables in contingency tables. Initially we implement conjugate analysis based on the Multinomial-Dirichlet setup. We compute the Bayes factor and assess the sensitivity of the results to the prior distribution. Then we focus on log-linear models. We co...
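The conjugate Multinomial-Dirichlet comparison described above can be sketched in a few lines: with Dirichlet priors on the cell probabilities (saturated model) and on the row and column margins (independence model), both marginal likelihoods are ratios of multivariate Beta functions, and the multinomial coefficient cancels in the Bayes factor. The symmetric Dirichlet(a) priors with a = 1 are an assumption, not necessarily the priors used in the paper.

```python
import math


def log_dirichlet_norm(alphas):
    """log B(alpha) = sum log Gamma(alpha_k) - log Gamma(sum alpha_k)."""
    return sum(math.lgamma(a) for a in alphas) - math.lgamma(sum(alphas))


def log_bf_independence(table, a=1.0):
    """log BF01 (independence vs saturated) for an I x J table of counts."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    cells = [x for r in table for x in r]
    n_rows, n_cols = len(rows), len(cols)
    # Saturated model: cell probabilities ~ Dirichlet(a, ..., a).
    log_m1 = log_dirichlet_norm([a + x for x in cells]) - log_dirichlet_norm([a] * len(cells))
    # Independence: row and column probabilities get independent Dirichlet(a) priors.
    log_m0 = (log_dirichlet_norm([a + x for x in rows]) - log_dirichlet_norm([a] * n_rows)
              + log_dirichlet_norm([a + x for x in cols]) - log_dirichlet_norm([a] * n_cols))
    return log_m0 - log_m1


# Usage: a balanced table favours independence; a diagonal one favours association.
print(log_bf_independence([[25, 25], [25, 25]]), log_bf_independence([[50, 0], [0, 50]]))
```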

Stochastic search variable selection (SSVS), introduced by George and McCulloch [1], is one of the prominent Bayesian variable selection approaches for regression problems. Some of the basic principles of modern Bayesian variable selection methods were first introduced via the SSVS algorithm, such as the use of a vector of variable inclusion indi...
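A minimal Gibbs sampler shows the mechanics of SSVS: each coefficient has a normal mixture prior with a narrow spike N(0, τ²) and a wide slab N(0, c²τ²), and the sampler alternates between drawing coefficients and drawing inclusion indicators. To keep it self-contained, the sketch assumes a toy normal-means model (y_j ~ N(β_j, 1), independent) rather than a full regression; τ = 0.1, c = 100 and the helper names are illustrative.

```python
import math
import random


def _incl_prob(beta: float, spike: float, slab: float, prior_incl: float) -> float:
    """P(gamma = 1 | beta) for the spike/slab normal mixture, computed on the log scale."""
    log_odds = (math.log(prior_incl / (1.0 - prior_incl))
                + 0.5 * math.log(spike / slab)
                - 0.5 * beta * beta * (1.0 / slab - 1.0 / spike))
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)


def ssvs_normal_means(y, tau=0.1, c=100.0, prior_incl=0.5, n_iter=20000, burn_in=1000, seed=0):
    """Gibbs sampler for the toy model; returns posterior inclusion frequencies."""
    rng = random.Random(seed)
    spike, slab = tau ** 2, (c * tau) ** 2
    gamma = [1] * len(y)
    counts = [0] * len(y)
    for it in range(burn_in + n_iter):
        for j, yj in enumerate(y):
            # beta_j | gamma_j, y_j is normal with precision 1 + 1/prior_var.
            prior_var = slab if gamma[j] else spike
            post_var = 1.0 / (1.0 + 1.0 / prior_var)
            beta = rng.gauss(post_var * yj, math.sqrt(post_var))
            # gamma_j | beta_j is Bernoulli with the mixture-component probability.
            gamma[j] = 1 if rng.random() < _incl_prob(beta, spike, slab, prior_incl) else 0
            if it >= burn_in:
                counts[j] += gamma[j]
    return [cnt / n_iter for cnt in counts]


def exact_inclusion(yj, tau=0.1, c=100.0, prior_incl=0.5):
    """Closed-form P(gamma_j = 1 | y_j) for the same toy model, for checking the sampler."""
    def log_normal(x, var):
        return -0.5 * math.log(2 * math.pi * var) - 0.5 * x * x / var
    log_odds = (math.log(prior_incl / (1.0 - prior_incl))
                + log_normal(yj, 1.0 + (c * tau) ** 2) - log_normal(yj, 1.0 + tau ** 2))
    return 1.0 / (1.0 + math.exp(-log_odds))


y = [5.0, 0.0]  # made-up observations: one clear signal, one null
print(ssvs_normal_means(y), [exact_inclusion(v) for v in y])
```

In this toy the chain can linger in one state for a while (a known trait of SSVS with a very narrow spike), which is why a closed-form check is included.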

The power-conditional-expected-posterior (PCEP) prior developed for variable
selection in normal regression models combines ideas from the power-prior and
expected-posterior prior, relying on the concept of random imaginary data, and
provides a consistent variable selection method which leads to parsimonious
inference. In this paper we discuss the...

Although competitive balance is an important concept for professional team
sports, its quantification still remains an issue. The main objective of this
study is to identify the best or optimal index for the study of competitive
balance in European football using a number of economic variables and data from
eight domestic leagues from 1959 to 2008....

Volleyball is a competitive team sport whose main objective is to score the most points by grounding the ball on the opponents' side of the court. The number of points a team scores is primarily based on the execution of the skills of the game. Due to the hierarchical structure of the game, events follow stable patterns: serve outcome, pass-set-atta...

In this work we consider Cloninger's psychobiological model, which measures two dimensions of personality: character and temperament. Temperament refers to the biological basis of personality and its characteristics, while character refers to an individual's attitudes towards oneself, towards humanity and as part of the universe.
The Temperament a...

The problem of transformation selection is thoroughly treated from a Bayesian perspective. Several families of transformations are considered with a view to achieving normality: the Box-Cox, the Modulus, the Yeo and Johnson and the Dual transformation. Markov Chain Monte Carlo algorithms have been constructed in order to sample from the posterior d...
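For reference, the four families named above can be written as plain functions of y and λ. The definitions below follow the usual textbook forms (Box-Cox for y > 0; the John-Draper modulus; Yeo-Johnson; the dual power transformation (y^λ − y^(−λ))/(2λ) for y > 0); the Bayesian part of the work, MCMC over λ and the model parameters, is not shown.

```python
import math


def box_cox(y: float, lam: float) -> float:
    """Box-Cox transformation (requires y > 0)."""
    return math.log(y) if lam == 0 else (y ** lam - 1.0) / lam


def modulus(y: float, lam: float) -> float:
    """John-Draper modulus transformation: sign(y) * BoxCox(|y| + 1)."""
    sign = 1.0 if y >= 0 else -1.0
    return sign * box_cox(abs(y) + 1.0, lam)


def yeo_johnson(y: float, lam: float) -> float:
    """Yeo-Johnson transformation, defined for any real y."""
    if y >= 0:
        return math.log1p(y) if lam == 0 else ((y + 1.0) ** lam - 1.0) / lam
    return -math.log1p(-y) if lam == 2 else -((1.0 - y) ** (2.0 - lam) - 1.0) / (2.0 - lam)


def dual(y: float, lam: float) -> float:
    """Dual power transformation (requires y > 0)."""
    return math.log(y) if lam == 0 else (y ** lam - y ** (-lam)) / (2.0 * lam)


# Usage: compare the families on a few values of lambda.
for lam in (0.0, 0.5, 1.0):
    print(lam, box_cox(2.0, lam), modulus(-2.0, lam), yeo_johnson(-2.0, lam), dual(2.0, lam))
```

Each family is continuous in λ, so the λ = 0 (or λ = 2) branches are the limits of the general formulas.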

Epidemic data often possess certain characteristics, such as the presence of
many zeros, the spatial nature of the disease spread mechanism or environmental
noise. This paper addresses these issues via suitable Bayesian modelling. In
doing so we utilise stochastic regression models appropriate for
spatio-temporal count data with an excess number of...

Competitive balance is a key issue for any professional sport league substantiated by its effect on demand for league games or other associated products. This work focuses on the measurement of between-seasons competitive balance, the longest time-wise dimension, which captures the relative quality of teams across seasons. The review of the existin...

The problem of transformation selection is thoroughly treated from a Bayesian
perspective. Several families of transformations are considered with a view to
achieving normality: the Box-Cox, the Modulus, the Yeo & Johnson and the Dual
transformation. Markov chain Monte Carlo algorithms have been constructed in
order to sample from the posterior dis...

We investigate the efficiency of a marginal likelihood estimator where the
product of the marginal posterior distributions is used as an
importance-sampling function. The approach is generally applicable to
multi-block parameter vector settings, does not require additional Markov Chain
Monte Carlo (MCMC) sampling and is not dependent on the type of...

In latent variable models the parameter estimation can be implemented by
using the joint or the marginal likelihood, based on independence or
conditional independence assumptions. The same dilemma occurs within the
Bayesian framework with respect to the estimation of the Bayesian marginal (or
integrated) likelihood, which is the main tool for model...

Within the path sampling framework, we show that probability distribution
divergences, such as the Chernoff information, can be estimated via
thermodynamic integration. The Boltzmann-Gibbs distribution pertaining to
different Hamiltonians is implemented to derive tempered transitions along the
path, linking the distributions of interest at the endpoint...

In this paper we implement a Markov chain Monte Carlo algorithm based on the stochastic search variable selection method of George and McCulloch (1993) for identifying promising subsets of manifest variables (items) for factor analysis models. The suggested algorithm is constructed by embedding in the usual factor analysis model a normal mixture pr...

Expected-posterior priors (EPP) have proved extremely useful for
testing hypotheses on the regression coefficients of normal linear models. One
of the advantages of using EPPs is that impropriety of baseline priors causes
no indeterminacy. However, in regression problems, they are based on one or more
*training samples*, that could in...

In the context of the expected-posterior prior (EPP) approach to Bayesian
variable selection in linear models, we combine ideas from power-prior and
unit-information-prior methodologies to simultaneously produce a
minimally-informative prior and diminish the effect of training samples. The
result is that in practice our power-expected-posterior (PE...

Zellner's g-prior and its recent hierarchical extensions are the most
popular default prior choices in the Bayesian variable selection context. These
prior set-ups can be expressed as power-priors with a fixed set of imaginary data.
In this paper, we borrow ideas from the power-expected-posterior (PEP) priors
in order to introduce, under the g-prior...

Epidemic data often arise along with certain characteristics, such as the presence of many zeros, the spatial nature of disease spread mechanism or the environmental noise. This presentation addresses these issues via suitable Bayesian modelling. In doing so we utilize stochastic regression models appropriate for spatio-temporal count data with an...

In this paper, we focus on the variable selection problem in normal regression models using the expected-posterior prior methodology. We provide a straightforward MCMC scheme for the derivation of the posterior distribution, as well as Monte Carlo estimates for the computation of the marginal likelihood and posterior model probabilities. Additional...

The marginal likelihood can be notoriously difficult to compute, and particularly so in high-dimensional problems. Chib and Jeliazkov employed the local reversibility of the Metropolis–Hastings algorithm to construct an estimator in models where full conditional densities are not available analytically. The estimator is free of distributional assum...

We propose a Bayesian implementation of the lasso regression that accomplishes both shrinkage and variable selection. We focus on the appropriate specification for the shrinkage parameter λ through Bayes factors that evaluate the inclusion of each covariate in the model formulation. We associate this parameter with the values of Pearson and partial...

The common competitive balance indices have not been designed to fully account for the complex structure of European football
leagues. Domestic championships are multi-prize tournaments since, in addition to the competition for the championship, the
best teams also compete to qualify for the lucrative European tournaments, whereas the worst teams s...

Mystery shopping is a well known marketing technique used by companies and marketing analysts to
measure quality of service, and gather information about products and services. In this article, we analyse
data from mystery shopping surveys via Bayesian Networks in order to examine and evaluate the quality
of service offered by the loan departments...

We consider the specification of prior distributions for Bayesian model
comparison, focusing on regression-type models. We propose a particular joint
specification of the prior distribution across models so that sensitivity of
posterior model probabilities to the dispersion of prior distributions for the
parameters of individual models (Lindley's p...

Introduction: Bayesian modeling in the 21st century; Definition of statistical models; Bayes theorem; Model-based Bayesian inference; Inference using conjugate prior distributions; Nonconjugate analysis; Problems

We propose a conjugate and conditional conjugate Bayesian analysis of models of marginal independence with a bi-directed graph representation. We work with Markov equivalent directed acyclic graphs (DAGs) obtained using the same vertex set with the addition of some latent vertices when required. The DAG equivalent model is characterised by a minima...

This paper deals with the Bayesian analysis of graphical models of marginal independence for three way contingency tables. Each marginal independence model corresponds to a particular factorization of the cell probabilities and a conjugate analysis based on Dirichlet prior can be performed. We illustrate a comprehensive Bayesian analysis of such mo...

Competitive balance is an important concept in professional team sports; its measurement is, therefore, a critical issue. One of the most widely used indices introduced for the estimation of seasonal competitive balance is the Concentration Ratio, a relatively simple index that measures the extent to which a league is dominated b...
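The concentration ratio can be computed directly from final league points. The sketch assumes the common definition CR_k = (points of the top k teams) / (total league points) and also shows a normalisation by its perfect-balance value k/N, scaled to 100, as used for the C5 index of competitive balance in parts of the literature; both function names and the points table are hypothetical.

```python
def concentration_ratio(points, k=5):
    """CR_k: share of all league points won by the k teams with the most points."""
    top = sorted(points, reverse=True)[:k]
    return sum(top) / sum(points)


def concentration_index(points, k=5):
    """CR_k divided by its perfect-balance value k/N, times 100 (100 = perfect balance)."""
    n = len(points)
    return 100.0 * concentration_ratio(points, k) / (k / n)


# Usage with made-up final points for an 18-team league.
points = [80, 74, 70, 62, 60, 55, 52, 50, 48, 45, 44, 40, 38, 35, 33, 30, 25, 20]
print(concentration_ratio(points), concentration_index(points))
```

Values above 100 indicate that the top k teams collect more than their equal share of points.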

The reinvention of Markov chain Monte Carlo (MCMC) methods and their implementation within the Bayesian framework in the early 1990s has established the Bayesian approach as one of the standard methods within the applied quantitative sciences. Their extensive use in complex real-life problems has led to the increased demand for a friendly and easi...

Football is one of the most popular professional team sports in the world and a very profitable business, as professional leagues (especially in Europe) show considerable growth in annual turnover figures. Despite its substantial growth, there are important issues that the industry has to address in order to ensure its long-term success. One of the...

Existing methods for the prediction of the final scores in football games focus on modelling the numbers of goals scored by
the two competitors with parameter estimation of the assumed model usually based on the maximum likelihood approach. Although
this approach allows for sufficiently accurate prediction of the final score, it does not account fo...

Crime is disproportionately concentrated in a few areas. Though this is long established, there remains uncertainty about the reasons for variation in the concentration of similar crimes (repeats) or different crimes (multiples). Wholly neglected have been composite crimes, in which more than one crime type coincides as part of a single event. The research reported...

This paper provides a survey on studies that analyze the macroeconomic effects of intellectual property rights (IPR). The first part of this paper introduces different patent policy instruments and reviews their effects on R&D and economic growth. This part also discusses the distortionary effects and distributional consequences of IPR protection a...

In the field of quality of health care measurement, one approach to assessing patient sickness at admission involves a logistic regression of mortality within 30 days of admission on a fairly large number of sickness indicators (on the order of 100) to construct a sickness scale, employing classical variable selection methods to find an "optimal"...

The measurement and improvement of the quality of health care are important areas of current research and development. A judgement of appropriateness of medical outcomes in hospital quality-of-care studies must depend on an assessment of patient sickness at admission to hospital. Indicators of patient sickness often must be abstracted from medical...

The primary aim of the current article was the evaluation of the factorial composition of the Aggression Questionnaire (AQ(29)) in the Greek population. The translated questionnaire was administered to the following three heterogeneous adult samples: a general population sample from Athens, a sample of young male conscripts and a sample of individu...