# Vanesa Guerrero

University Carlos III de Madrid | UC3M · Department of Statistics

PhD

## About

32 publications · 8,264 reads · 255 citations

## Publications

We propose a data-driven methodology to learn a low-dimensional manifold of controlled flows. The starting point is resolving snapshot flow data for a representative ensemble of actuations. Key enablers for the actuation manifold are isometric mapping as encoder, and a combination of a neural network and a $k$-nearest-neighbour interpolation as de...

Feature selection is a recurrent research topic in modern regression analysis, which strives to build interpretable models, using sparsity as a proxy, without sacrificing predictive power. The best subset selection problem is central to this statistical task: it has the goal of identifying the subset of covariates of a given size that provides the...

The vast amount of complex data generated nowadays demands innovative and flexible techniques that accommodate expert knowledge and help in decision-making. In this work, we address the problem of estimating, in a regression framework, multivariate smooth functions that satisfy conditions on their sign, monotonicity or curvature by m...

Decision-making is often based on the analysis of complex and evolving data. Thus, systems that incorporate human knowledge and provide valuable support to the decision-maker become crucial. In this work, statistical modelling and mathematical optimization paradigms merge to address the problem of estimating smooth curves which verify st...

We propose a novel nonlinear manifold learning from snapshot data and demonstrate its superiority over proper orthogonal decomposition (POD) for shedding-dominated shear flows. Key enablers are isometric feature mapping (Isomap) as encoder and the $K$-nearest neighbours ($K$NN) algorithm as decoder. The proposed technique is applied to numerical an...

The biological age is an indicator of the functional condition of an individual’s body. Unlike the chronological age, which just measures the time from birth, the biological age of a human is also affected by their medical condition, life habits, some sociodemographic variables, as well as biomarkers. Taking advantage of the statistical concept of de...

Many applications in data analysis study whether two categorical variables are independent using a function of the entries of their contingency table. Often, the categories of the variables, associated with the rows and columns of the table, are grouped, yielding a less granular representation of the categorical variables. The purpose of this is to...

We propose a novel non-linear manifold learning from snapshot data and demonstrate its superiority over Proper Orthogonal Decomposition (POD) for shedding-dominated shear flows. Key enablers are isometric feature mapping, Isomap (Tenenbaum et al., 2000), as encoder and K-nearest neighbours (KNN) algorithm as decoder. The proposed technique is appli...

In an era when decision-making is often based on the analysis of complex and evolving data, it is crucial to have systems that incorporate human knowledge and provide valuable support to the decision-maker. In this work, statistical modelling and mathematical optimization paradigms merge to address the problem of estimating smooth cu...

Biomedical research has come to rely on p-values as a deterministic measure for data-driven decision-making. In the widespread practice of null hypothesis significance testing for identifying statistically significant differences among groups of observations, a single p-value is computed from sample data. Then, it is routinely compared with a threshold,...

Cluster analysis is applied to a DNS dataset of a transitional boundary layer developing over a flat plate. The streamwise-spanwise plane at a wall-normal distance close to the wall is sampled at several time instants and discretized into small sub-regions, which are the observations analysed in this work. Using the K-medoids clustering algorithm, a...

A data-driven approach for the identification of local turbulent-flow states and of their dynamics is proposed. After subdividing a flow domain into smaller regions, the K-medoids clustering algorithm is used to learn from the data the different flow states and to identify the dynamics of the transition process. The clustering procedure is carried ou...

In this work, a new approach to cluster large sets of time series is presented. The proposed methodology takes into account the dependency among the time series to obtain a fuzzy partition of the set of observations. A two-step procedure to accomplish this is presented. First, the cophenetic distances, based on a time series linear cross-dependency...

Since the seminal paper by Bates and Granger in 1969, a vast number of ensemble methods that combine different base regressors to generate a unique one have been proposed in the literature. The so-obtained regressor method may have better accuracy than its components, but at the same time it may overfit, it may be distorted by base regressors with...

Cluster analysis is applied to a DNS dataset of a transitional boundary layer developing over a flat plate. The streamwise-spanwise plane at a wall-normal distance close to the wall is sampled at several time instants and discretized into small sub-regions, which are the observations analyzed in this work. Using the K-medoids clustering algorithm, a pa...

Our recent research on the data-driven dynamics description of a transitional boundary layer confronts machine learning tools with boundary-layer-transition theory to automatically detect the regions of the transitional boundary-layer flow and to describe the flow dynamics leading to the transition to turbulence.

Since the seminal paper by Bates and Granger in 1969, a vast number of ensemble methods that combine different base regressors to generate a unique one have been proposed in the literature. The so-obtained regressor method may have better accuracy than its components, but at the same time it may overfit, it may be distorted by base regressors with...

In this paper, we propose a mathematical optimization approach to cluster the rows and/or columns of contingency tables to detect possible statistical dependencies among the observed variables. With this, we obtain a clustered contingency table of smaller size, which is desirable when interpreting the statistical dependence results of the observed...

COVID-19 is an infectious disease that was first identified in China in December 2019. It subsequently spread widely, arriving in Spain by the end of January 2020. This pandemic triggered confinement measures to slow the spread of the virus and avoid saturating the health care system. With the aim of providi...

Exploratory Factor Analysis (EFA) is a widely used statistical technique to discover the structure of latent unobserved variables, called factors, from a set of observed variables. EFA exploits the property of rotation invariance of the factor model to enhance factors' interpretability by building a sparse loading matrix. In this paper, we propose...

In this paper we propose an optimization model and a solution approach to visualize datasets made up of individuals observed over different time periods. Attached to these individuals are a time-dependent magnitude and a dissimilarity measure, which may vary over time. Difference of convex optimization techniques, namely, the so-called Dif...

Exploratory Factor Analysis (EFA) is a widely used statistical technique to discover the structure of latent unobserved variables, called factors, from a set of observed variables. EFA exploits the property of rotation invariance of the factor model to enhance factors' interpretability by building a sparse loading matrix. In this paper, we propose...

We consider a nonlinear version of the Uncapacitated Facility Location Problem (UFLP). The total cost in consideration consists of a fixed cost to open facilities, a travel cost in proportion to the distance between demand and the assigned facility, and an operational cost at each open facility, which is assumed to be a concave nondecreasing functi...

In this article we develop a novel online framework to visualize news data over a time horizon. First, we perform a Natural Language Processing analysis, wherein the words are extracted, and their attributes, namely the importance and the relatedness, are calculated. Second, we present a Mathematical Optimization model for the visualization problem...

In this paper we develop a new framework to visualize datasets made up of individuals observed over different time periods. Attached to these individuals are a time-dependent magnitude and a dissimilarity measure, which may vary over time. A mathematical optimization model is proposed and solved by means of difference of convex optimizatio...

In this paper we develop an online tool to visualize news data over a time horizon. First, we perform a Natural Language Processing analysis, where the words are extracted, and their attributes, namely the importance and the relatedness, are calculated. Second, we present a Mathematical Optimization model for the visualization problem and a numeric...

In this paper we address the problem of visualizing a frequency distribution and an adjacency relation attached to a set of individuals. We represent this information using a rectangular map, i.e., a subdivision of a rectangle into rectangular portions so that each portion is associated with one individual, their areas reflect the frequencies, and...

In this paper we address the problem of visualizing, as convex objects in a bounded region, a set of individuals with an attached dissimilarity measure and statistical value. This problem, which extends the standard Multidimensional Scaling Analysis, is written as a global optimization problem whose objective is the difference of two convex f...

In this paper we address the problem of visualizing a set of individuals with an attached statistical value, given as a proportion, and a dissimilarity measure. Each individual is represented as a region within the unit square, in such a way that the areas of the regions represent the proportions and the distances between them represent the dis...

Principal component analysis is a popular dimensionality reduction technique in data analysis, which aims to project a given dataset, with minimum error, onto a subspace of lower dimension. In order to improve interpretability, different variants of the method have been proposed in the literature, in which, besides error minimization, spars...

Principal Components are usually hard to interpret. Sparseness is considered as one way to improve interpretability, and thus a trade-off between variance explained by the components and sparseness is frequently sought. In this note we address the problem of simultaneous maximization of variance explained and sparseness, and a heuristic method is p...