Vanesa Guerrero
University Carlos III de Madrid | UC3M · Department of Statistics

PhD

About

27 Publications
6,147 Reads
124 Citations (121 since 2016, from 23 research items)
[Citations-per-year chart, 2016–2022]

Publications (27)
Article
Decision-making is often based on the analysis of complex and evolving data. Thus, it is crucial to have systems that incorporate human knowledge and provide valuable support to the decision-maker. In this work, statistical modelling and mathematical optimization paradigms merge to address the problem of estimating smooth curves which verify st...
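A toy sketch of the general idea of merging statistical fitting with optimization under shape constraints: below, a curve is estimated by least squares subject to a monotonicity constraint (the specific constraints in the paper are truncated above, so monotonicity is only an illustrative assumption; the data and parameterization are hypothetical).

```python
import numpy as np
from scipy.optimize import lsq_linear

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 40)
y = np.sqrt(x) + rng.normal(0.0, 0.05, x.size)  # noisy nondecreasing signal

# Parameterize the fitted values as f_i = c + sum_{j<=i} d_j with d_j >= 0,
# so nonnegativity of the increments d_j enforces monotonicity.
n = x.size
A = np.column_stack([np.ones(n), np.tril(np.ones((n, n)))[:, 1:]])
lower = np.r_[-np.inf, np.zeros(n - 1)]            # intercept free, increments >= 0
res = lsq_linear(A, y, bounds=(lower, np.inf))
fit = A @ res.x   # least-squares fit constrained to be nondecreasing
```

The bound-constrained least-squares problem is the optimization ingredient; swapping the constraint (convexity, periodicity, ...) changes only the feasible set.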
Chapter
Biological age is an indicator of the functional condition of an individual's body. Unlike chronological age, which simply measures the time since birth, a person's biological age is also affected by their medical condition, life habits, sociodemographic variables, and biomarkers. Taking advantage of the statistical concept of de...
Article
Full-text available
Many applications in data analysis study whether two categorical variables are independent using a function of the entries of their contingency table. Often, the categories of the variables, associated with the rows and columns of the table, are grouped, yielding a less granular representation of the categorical variables. The purpose of this is to...
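The basic operation described here can be sketched in a few lines: test independence on a contingency table, then group categories (rows, in this hypothetical example) and test again on the coarser table. The table values are invented for illustration; the clustering criterion in the paper is more elaborate than this fixed grouping.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 4x3 contingency table for two categorical variables.
table = np.array([[20,  5,  5],
                  [18,  6,  6],
                  [ 4, 15, 11],
                  [ 3, 14, 13]])

chi2, p, dof, _ = chi2_contingency(table)

# Group rows {0,1} and {2,3}: a less granular 2x3 table for the same data.
grouped = np.vstack([table[:2].sum(axis=0), table[2:].sum(axis=0)])
chi2_g, p_g, dof_g, _ = chi2_contingency(grouped)
```

Grouping reduces the degrees of freedom (here from 6 to 2) while, for a well-chosen grouping, the evidence of dependence is preserved.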
Preprint
Full-text available
We propose a novel non-linear manifold learning from snapshot data and demonstrate its superiority over Proper Orthogonal Decomposition (POD) for shedding-dominated shear flows. Key enablers are isometric feature mapping, Isomap (Tenenbaum et al., 2000), as encoder and K-nearest neighbours (KNN) algorithm as decoder. The proposed technique is appli...
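The encoder/decoder pairing named in the abstract (Isomap as encoder, KNN as decoder) can be sketched with off-the-shelf components. The "snapshots" below are a synthetic 1-D curve in 3-D standing in for flow data; this is a minimal sketch, not the paper's pipeline.

```python
import numpy as np
from sklearn.manifold import Isomap
from sklearn.neighbors import KNeighborsRegressor

# Toy "snapshots": points along a 1-D curve embedded in 3-D.
t = np.linspace(0.0, 3.0 * np.pi, 200)
X = np.c_[np.cos(t), np.sin(t), 0.2 * t]

encoder = Isomap(n_neighbors=10, n_components=1)
Z = encoder.fit_transform(X)          # geodesic-preserving 1-D coordinates

decoder = KNeighborsRegressor(n_neighbors=5)
decoder.fit(Z, X)                     # KNN maps latent coordinates back to snapshots
X_rec = decoder.predict(Z)
err = np.linalg.norm(X_rec - X, axis=1).mean()
```

For shedding-dominated flows the snapshots would be (much higher-dimensional) velocity fields, but the encode/decode roles are the same.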
Preprint
Full-text available
In an era when the decision-making process is often based on the analysis of complex and evolving data, it is crucial to have systems which allow to incorporate human knowledge and provide valuable support to the decider. In this work, statistical modelling and mathematical optimization paradigms merge to address the problem of estimating smooth cu...
Article
Full-text available
Biomedical research has come to rely on p-values as a deterministic measure for data-driven decision-making. In the widely used null hypothesis significance testing framework for identifying statistically significant differences among groups of observations, a single p-value is computed from sample data. Then, it is routinely compared with a threshold,...
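The routine the abstract describes, computing a single p-value from sample data and comparing it with a threshold, looks like this in practice (synthetic data; the 0.05 threshold is the conventional choice, not one the paper endorses):

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)
group_a = rng.normal(0.0, 1.0, 100)
group_b = rng.normal(1.0, 1.0, 100)   # true mean shift of one standard deviation

t_stat, p_value = ttest_ind(group_a, group_b)
significant = p_value < 0.05          # the deterministic thresholding step
```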
Chapter
Cluster analysis is applied to a DNS dataset of a transitional boundary layer developing over a flat plate. The streamwise-spanwise plane at a wall-normal distance close to the wall is sampled at several time instants and discretized into small sub-regions, which are the observations analysed in this work. Using the K-medoids clustering algorithm, a...
Article
A data-driven approach for the identification of local turbulent-flow states and of their dynamics is proposed. After subdividing a flow domain in smaller regions, the K-medoids clustering algorithm is used to learn from the data the different flow states and to identify the dynamics of the transition process. The clustering procedure is carried ou...
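The K-medoids step used in these flow-state papers is easy to sketch from scratch; below is a plain alternating version on a precomputed distance matrix, run on two synthetic well-separated "states" (toy data, not flow sub-regions).

```python
import numpy as np

def k_medoids(D, k, n_iter=50, seed=0):
    """Alternating K-medoids on a precomputed n x n distance matrix D."""
    rng = np.random.default_rng(seed)
    medoids = rng.choice(D.shape[0], size=k, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)   # assign to nearest medoid
        new = medoids.copy()
        for c in range(k):                          # move each medoid to the
            idx = np.flatnonzero(labels == c)       # most central cluster point
            if idx.size:
                new[c] = idx[np.argmin(D[np.ix_(idx, idx)].sum(axis=0))]
        if np.array_equal(np.sort(new), np.sort(medoids)):
            break
        medoids = new
    return medoids, np.argmin(D[:, medoids], axis=1)

rng = np.random.default_rng(0)
pts = np.vstack([rng.normal(0.0, 0.3, (20, 2)),     # two well-separated "states"
                 rng.normal(5.0, 0.3, (20, 2))])
D = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
medoids, labels = k_medoids(D, k=2)
```

Because medoids are actual observations, each learned flow state comes with a representative sub-region, which is what makes the method attractive for interpretation.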
Article
In this work, a new approach to cluster large sets of time series is presented. The proposed methodology takes into account the dependency among the time series to obtain a fuzzy partition of the set of observations. A two-step procedure to accomplish this is presented. First, the cophenetic distances, based on a time series linear cross-dependency...
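The first step of the two-step procedure, cophenetic distances built from a cross-dependency-based dissimilarity, can be sketched as follows. The series are synthetic; the paper derives a fuzzy partition from these distances, whereas this sketch stops at a crisp cut for brevity.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(0)
s1, s2 = rng.normal(size=(2, 300))    # two independent common driving signals
series = np.array([s1 + 0.3 * rng.normal(size=300) for _ in range(5)] +
                  [s2 + 0.3 * rng.normal(size=300) for _ in range(5)])

# Step 1: correlation-based dissimilarity between the series.
d = np.sqrt(2.0 * (1.0 - np.corrcoef(series)))
np.fill_diagonal(d, 0.0)

# Step 2: hierarchical clustering, then the cophenetic distance matrix.
Z = linkage(squareform(d, checks=False), method='average')
coph = squareform(cophenet(Z))
labels = fcluster(Z, t=2, criterion='maxclust')   # crisp cut shown for brevity
```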
Article
Full-text available
Since the seminal paper by Bates and Granger in 1969, a vast number of ensemble methods that combine different base regressors to generate a unique one have been proposed in the literature. The resulting regressor may be more accurate than its components, but at the same time it may overfit, and it may be distorted by base regressors with...
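A minimal combination-of-regressors sketch: predictions from three hypothetical base regressors are combined with non-negative weights fitted by least squares. Fitting the weights on the very data used for evaluation guarantees the combination beats every component in-sample, which is exactly the overfitting risk the abstract warns about.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 200)
y = np.sin(3.0 * x) + 0.1 * rng.normal(size=200)

# Predictions from three hypothetical base regressors of varying quality.
base = np.column_stack([
    x,                                             # crude linear fit
    x - x**3 / 2.0,                                # rough series expansion
    np.sin(3.0 * x) + 0.3 * rng.normal(size=200),  # accurate but noisy
])

# Non-negative least-squares weights for the combination (in-sample!).
w, _ = nnls(base, y)
combined = base @ w
sse = lambda p: float(np.sum((y - p) ** 2))
```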
Conference Paper
Cluster analysis is applied to a DNS dataset of a transitional boundary layer developing over a flat plate. The streamwise-spanwise plane at a wall-normal distance close to the wall is sampled at several time instants and discretized into small sub-regions, which are the observations analyzed in this work. Using the K-medoids clustering algorithm, a pa...
Presentation
Our recent research on Data-driven dynamics description of a transitional boundary layer. This research combines machine learning tools with boundary-layer-transition theory to automatically detect the regions of the transitional boundary-layer flow and to describe the flow dynamics leading to the transition to turbulence.
Preprint
Full-text available
Since the seminal paper by Bates and Granger in 1969, a vast number of ensemble methods that combine different base regressors to generate a unique one have been proposed in the literature. The resulting regressor may be more accurate than its components, but at the same time it may overfit, and it may be distorted by base regressors with...
Preprint
Full-text available
In this paper, we propose a mathematical optimization approach to cluster the rows and/or columns of contingency tables to detect possible statistical dependencies among the observed variables. With this, we obtain a clustered contingency table of smaller size, which is desirable when interpreting the statistical dependence results of the observed...
Preprint
Full-text available
COVID-19 is an infectious disease that was first identified in China in December 2019. It subsequently spread broadly, reaching Spain by the end of January 2020. The pandemic triggered confinement measures intended to reduce the spread of the virus and avoid saturating the health care system. With the aim of providi...
Article
Full-text available
Exploratory Factor Analysis (EFA) is a widely used statistical technique to discover the structure of latent unobserved variables, called factors, from a set of observed variables. EFA exploits the property of rotation invariance of the factor model to enhance factors' interpretability by building a sparse loading matrix. In this paper, we propose...
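The rotation-invariance property EFA exploits can be seen in a small sketch: data generated from a hypothetical two-factor model, fitted with a varimax rotation so the estimated loading matrix recovers the simple (near-sparse) structure. This illustrates standard rotated EFA, not the sparse-loading optimization the paper proposes.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
factors = rng.normal(size=(500, 2))
loadings_true = np.array([[0.9, 0.0], [0.8, 0.0], [0.7, 0.0],
                          [0.0, 0.9], [0.0, 0.8], [0.0, 0.7]])
X = factors @ loadings_true.T + 0.3 * rng.normal(size=(500, 6))

# Varimax rotation exploits rotation invariance to seek interpretable loadings.
fa = FactorAnalysis(n_components=2, rotation='varimax').fit(X)
loadings = fa.components_.T           # 6 x 2 estimated loading matrix
dominant = np.argmax(np.abs(loadings), axis=1)
```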
Article
Full-text available
In this paper we propose an optimization model and a solution approach to visualize datasets made up of individuals observed along different time periods. Each individual has an attached time-dependent magnitude and a dissimilarity measure, which may vary over time. Difference-of-convex optimization techniques, namely, the so-called Dif...
Preprint
Full-text available
Exploratory Factor Analysis (EFA) is a widely used statistical technique to discover the structure of latent unobserved variables, called factors, from a set of observed variables. EFA exploits the property of rotation invariance of the factor model to enhance factors' interpretability by building a sparse loading matrix. In this paper, we propose...
Article
We consider a nonlinear version of the Uncapacitated Facility Location Problem (UFLP). The total cost in consideration consists of a fixed cost to open facilities, a travel cost in proportion to the distance between demand and the assigned facility, and an operational cost at each open facility, which is assumed to be a concave nondecreasing functi...
Article
Full-text available
In this article we develop a novel online framework to visualize news data over a time horizon. First, we perform a Natural Language Processing analysis, wherein the words are extracted, and their attributes, namely the importance and the relatedness, are calculated. Second, we present a Mathematical Optimization model for the visualization problem...
Preprint
Full-text available
In this paper we develop a new framework to visualize datasets made up of individuals observed along different time periods. Each individual has an attached time-dependent magnitude and a dissimilarity measure, which may vary over time. A mathematical optimization model is proposed and solved by means of difference-of-convex optimizatio...
Preprint
Full-text available
In this paper we develop an online tool to visualize news data over a time horizon. First, we perform a Natural Language Processing analysis, where the words are extracted, and their attributes, namely the importance and the relatedness, are calculated. Second, we present a Mathematical Optimization model for the visualization problem and a numeric...
Article
Full-text available
In this paper we address the problem of visualizing a frequency distribution and an adjacency relation attached to a set of individuals. We represent this information using a rectangular map, i.e., a subdivision of a rectangle into rectangular portions so that each portion is associated with one individual, their areas reflect the frequencies, and...
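The area-proportionality ingredient of a rectangular map can be sketched with a simple slice layout: one rectangular portion per individual, areas proportional to frequencies. The paper's maps additionally respect adjacency via optimization, which this sketch deliberately omits.

```python
def slice_layout(freqs, x=0.0, y=0.0, w=1.0, h=1.0):
    """Split rectangle (x, y, w, h) into vertical strips, one per
    individual, with areas proportional to freqs."""
    total = float(sum(freqs))
    rects, offset = [], 0.0
    for f in freqs:
        frac = f / total
        rects.append((x + offset * w, y, frac * w, h))
        offset += frac
    return rects

rects = slice_layout([4, 2, 1, 1])
areas = [rw * rh for (_, _, rw, rh) in rects]
# areas are exactly [0.5, 0.25, 0.125, 0.125]
```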
Article
Full-text available
In this paper we address the problem of visualizing, as convex objects in a bounded region, a set of individuals with an attached dissimilarity measure and statistical value. This problem, which extends standard Multidimensional Scaling Analysis, is written as a global optimization problem whose objective is the difference of two convex f...
Article
In this paper we address the problem of visualizing a set of individuals which have an attached statistical value, given as a proportion, and a dissimilarity measure. Each individual is represented as a region within the unit square, in such a way that the areas of the regions represent the proportions and the distances between them represent the dis...
Article
Principal component analysis is a popular dimensionality-reduction technique in data analysis, which aims to project a given dataset, with minimum error, into a subspace of fewer dimensions. In order to improve interpretability, different variants of the method have been proposed in the literature in which, besides error minimization, spars...
Article
Principal components are usually hard to interpret. Sparseness is considered one way to improve interpretability, and thus a trade-off between the variance explained by the components and their sparseness is frequently sought. In this note we address the simultaneous maximization of explained variance and sparseness, and a heuristic method is p...
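The explained-variance/sparseness trade-off behind these two papers can be illustrated with a simple truncated power iteration: each step keeps only the k largest-magnitude entries of the leading direction. This is a generic sparse-PCA heuristic on synthetic data, not the method either paper proposes.

```python
import numpy as np

def sparse_pc1(X, k, n_iter=100):
    """First principal direction with at most k nonzero entries, via
    truncated power iteration: hard-threshold to the k largest entries."""
    S = np.cov(X, rowvar=False)
    v = np.ones(S.shape[0]) / np.sqrt(S.shape[0])
    for _ in range(n_iter):
        v = S @ v
        keep = np.argsort(np.abs(v))[-k:]   # indices of the k largest |entries|
        mask = np.zeros_like(v)
        mask[keep] = 1.0
        v *= mask                           # enforce sparseness
        v /= np.linalg.norm(v)              # renormalize the direction
    return v

rng = np.random.default_rng(0)
signal = rng.normal(size=(300, 1))
X = rng.normal(size=(300, 10))
X[:, :3] += 3.0 * signal            # variables 0-2 share a strong common factor
v = sparse_pc1(X, k=3)
```

Larger k explains more variance; smaller k gives a more interpretable component, which is the trade-off in question.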

Project (1)
Project
New mathematical programming problems will be provided to address supervised classification problems in which misclassification costs are class-dependent. Exact and heuristic approaches will be designed and tested.