
Pepa Ramirez-Cobo
Universidad de Cádiz | UCA · Department of Statistics and Operational Research
PhD, Statistics and Probability
About
57 Publications
7,778 Reads
511 Citations
Additional affiliations
March 2010 - February 2012
IMUS (Instituto de Matemáticas de la Universidad de Sevilla)
Position
- PostDoc Position
February 2009 - March 2010
June 2007 - June 2008
Publications (57)
Fairness in machine learning algorithms to correct discrimination in predictions is a desirable property. We propose a Bayesian method for fairness and parameter estimation in the general regression model. Under a conjugate Normal-Gamma prior structure and for a particular choice of unfairness measure, which does not involve privacy concerns, our m...
A number of approaches have dealt with statistical assessment of self-similarity, and many of those are based on multiscale concepts. Most rely on certain distributional assumptions which are usually violated by real data traces, often characterized by large temporal or spatial mean level shifts, missing values or extreme observations. A novel, rob...
The Naïve Bayes has proven to be a tractable and efficient method for classification in multivariate analysis. However, features are usually correlated, a fact that violates the Naïve Bayes’ assumption of conditional independence, and may deteriorate the method’s performance. Moreover, datasets are often characterized by a large number of features,...
The Naïve Bayes is a tractable and efficient approach for statistical classification. In general classification problems, the consequences of misclassifications may be rather different in different classes, making it crucial to control misclassification rates in the most critical and, in many real-world problems, minority cases, possibly at the expe...
The Lasso has become a benchmark data analysis procedure, and numerous variants have been proposed in the literature. Although the Lasso formulations are stated so that overall prediction error is optimized, no full control over the prediction accuracy on certain individuals of interest is allowed. In this work we propose a novel version of the Las...
Since the seminal paper by Bates and Granger in 1969, a vast number of ensemble methods that combine different base regressors to generate a unique one have been proposed in the literature. The so-obtained regressor method may have better accuracy than its components, but at the same time it may overfit, it may be distorted by base regressors with...
In this article we consider an aggregate loss model with dependent losses. The loss occurrence process is governed by a two-state Markovian arrival process (MAP2), a Markov renewal process that allows for (1) correlated inter-loss times, (2) non-exponentially distributed inter-loss times, and (3) overdispersed loss counts. Some quantities of intere...
Motivated by a real failure dataset in a two-dimensional context, this paper presents an extension of the Markov modulated Poisson process (MMPP) to two dimensions. The one-dimensional MMPP has been proposed for the modeling of dependent and non-exponential inter-failure times (in contexts such as queueing, risk or reliability, among others). The novel t...
This paper investigates how the production policy, as well as other factors, affect the facility location-allocation decisions. We focus on a p-median location problem in which one single perishable product is to be produced and shipped to a set of users. The time-correlated demands of the clients are generated by autoregressive processes, and they...
Since the seminal paper by Bates and Granger in 1969, a vast number of ensemble methods that combine different base regressors to generate a unique one have been proposed in the literature. The so-obtained regressor method may have better accuracy than its components, but at the same time it may overfit, it may be distorted by base regressors with...
Support vector machines (SVMs) are widely used and constitute one of the best examined and used machine learning models for 2-class classification. Classification in SVM is based on a score procedure, yielding a deterministic classification rule, which can be transformed into a probabilistic rule (as implemented in off-the-shelf SVM libraries), but...
COVID-19 is an infectious disease that was first identified in China in December 2019. It subsequently spread widely and reached Spain by the end of January 2020. The pandemic triggered confinement measures to slow the spread of the virus and avoid saturating the health care system. With the aim of providi...
One of the main challenges researchers face is to identify the most relevant features in a prediction model. As a consequence, many regularized methods seeking sparsity have flourished. Although sparse, their solutions may not be interpretable in the presence of spurious coefficients and correlated features. In this paper we aim to enhance interpretabi...
The Batch Markov Modulated Poisson Process (BMMPP) is a subclass of the versatile Batch Markovian Arrival Process (BMAP), which has been proposed for the modeling of dependent events occurring in batches (such as group arrivals, failures or risk events). This paper focuses on exploring the possibilities of the BMMPP for the modeling of real phenomena...
Support vector machine (SVM) is a powerful tool in binary classification, known to attain excellent misclassification rates. On the other hand, many real-world classification problems, such as those found in medical diagnosis, churn or fraud prediction, involve misclassification costs which may be different in the different classes. However, it may...
The batch Markov‐modulated Poisson process (BMMPP) is a subclass of the versatile batch Markovian arrival process (BMAP), which has been widely used for the modeling of dependent and correlated simultaneous events (such as arrivals, failures, or risk events). Both theoretical and applied aspects are examined in this paper. On one hand, the identifiabili...
Feature Selection is a crucial procedure in Data Science tasks such as Classification, since it identifies the relevant variables, thus making the classification procedures more interpretable, cheaper in terms of measurement and more effective by reducing noise and data overfit. The relevance of features in a classification procedure is linked to t...
In this article we describe a method for carrying out Bayesian estimation for the two-state stationary Markov arrival process (MAP2), which has been proposed as a versatile model in a number of contexts. The approach is illustrated on both simulated and real data sets, where the performance of the MAP2 is compared against that of the well-known MMP...
Vector autoregressive (VAR) models constitute a powerful and well studied tool to analyze multivariate time series. Since sparseness, crucial to identify and visualize joint dependencies and relevant causalities, is not expected to happen in the standard VAR model, several sparse variants have been introduced in the literature. However, in some cas...
This paper studies in detail different problems concerning the identifiability of the non-stationary version of the MAP2. First, a matrix-based methodology to build equivalent processes is given. Second, a unique, canonical representation of the process, so that the infinite, equivalent versions of a process can be reduced to its canonical counterp...
This paper explores the single-item newsvendor problem under a novel setting which combines temporal dependence and tractable robust optimization. First, the demand is modeled as a time series which follows an autoregressive process AR(p), p>0. Second, a robust approach to maximize the worst-case revenue is proposed: a robust distribution-free auto...
In this paper we examine in detail some of the modeling capabilities of the stationary m-state BMAP, with simultaneous events up to size k, noted BMAP_m(k). Specifically, we study the forms of the auto-correlation functions of the inter-event times and event sizes. We provide a novel characterization of the functions which is suitable for analyzing the dependen...
This paper considers the non-stationary version of the Markovian arrival process to model the failures of N electrical components that are considered to be identically distributed, but for which it is not reasonable to assume that the operational times related to each component are independent or identically distributed. We propose a moment matchin...
The double Pareto Lognormal (dPlN) statistical distribution, defined in terms of both an exponentiated skewed Laplace distribution and a lognormal distribution, has proven suitable for fitting heavy tailed data. In this work we investigate inference for the mixture of a dPlN component and (k−1) lognormal components for k fixed, a model for ex...
The capability of modeling non-exponentially distributed and dependent inter-arrival times as well as correlated batches makes the Batch Markovian Arrival Processes (BMAP) suitable in different real-life settings such as teletraffic, queueing theory or actuarial contexts. An issue to be taken into account for estimation purposes is the identifiability o...
The Markovian arrival process (MAP) is a stochastic process that allows for modeling dependent and non-exponentially distributed observations. Due to its versatility, it has been widely applied in different contexts, from reliability to teletraffic. In this work we show the suitability of the MAP for modeling daily precipitation data, which are oft...
The Markovian arrival process (MAP) has proven a versatile model for fitting dependent and non-exponential interarrival times, with a number of applications to queueing, teletraffic, reliability or finance. Although the theoretical properties of MAPs and of models involving MAPs are well studied, their estimation remains less explored. This paper examines...
Most time series forecasting methods assume the series has no missing values. When missing values exist, interpolation methods, while filling in the blanks, may substantially modify the statistical pattern of the data, since critical features such as moments and autocorrelations are not necessarily preserved. In this paper we propose to interpolate...
In Ramírez-Cobo et al. (J Appl Probab 47(3):630–649, 2010b), weakly equivalent second order Markovian arrival processes (noted MAP2s) are introduced and partially characterized. In this work we look into weak equivalence in detail and provide a complete characterization of weakly equivalent MAP2s. The analogous problem for the MAP3 case is parti...
In this paper, we study how the lack of a unique representation for the two-state Markovian arrival process (MAP2) can influence statistical estimation for the MAP2/G/1 queueing system. In particular, given two equivalent representations of the same MAP2, we find that the steady-state distributions of the queueing system can be identified.
The Markovian arrival process generalizes the Poisson process by allowing for dependent and nonexponential interarrival times. We study the autocorrelation function of the two-state Markovian arrival process. Our findings show that the correlation structure of such a process has a very specific pattern, namely, it always converges geometrically to...
The capability of modeling non-exponentially distributed and dependent inter-arrival times as well as correlated batches makes the Batch Markovian Arrival Processes (BMAP) suitable in different real-life settings such as teletraffic, queueing theory or actuarial contexts. An issue to be taken into account for estimation purposes is the identifiability o...
The aim of this paper is to present results from a comparative investigation into the diagnostic performance of several wavelet-based estimators of scaling, some from published literature and some newly proposed. These estimators are evaluated based on their ability to classify digitized mammogram images from a clinical database, for which the true...
Many environmental time‐evolving spatial phenomena are characterized by a large number of energetic modes, the occurrence of irregularities, and self‐organization over a wide range of space or time scales. Precipitation is a classical example characterized by both strong intermittency and multiscale dynamics, and these features generate persistence...
A wavelet-based spectral method for estimating the (directional) Hurst parameter in isotropic and anisotropic non-stationary fractional Gaussian fields is proposed. The method can be applied to self-similar images and, in general, to d-dimensional data which scale. In the application part, the problems of denoising 2D fractional Brownian fields and...
A wavelet-based multifractal spectrum (MFS) for the analysis of images that possess an erratically changing oscillatory behavior at various scales is constructed and estimated. The methodology is applied to the analysis of mammograms. The key contribution is that the analysis is not focused on microcalcifications, but on the background of the image...
In this article we describe a method for carrying out Bayesian estimation for the double Pareto lognormal (dPlN) distribution which has been proposed as a model for heavy-tailed phenomena. We apply our approach to estimate the $\mathit{dPlN}/M/1$ and $M/\mathit{dPlN}/1$ queueing systems. These systems cannot be analyzed using standard techniques du...
In this paper we consider the problem of identifiability for the two-state Markovian arrival process (MAP2). In particular, we show that the MAP2 is not identifiable, providing the conditions under which two different sets of parameters induce identical stationary laws for the observable process.
In this paper we consider the problem of identifiability for the two-state Markovian arrival process (MAP2). In particular, we show that the MAP2 is not identifiable, providing the conditions under which two different sets of parameters induce identical stationary laws for the observable process.
In this paper we consider the estimation of a density function on the basis of a random stratified sample from weighted distributions. We propose a linear wavelet density estimator and prove its consistency. The behavior of the proposed estimator and its smoothed versions is eventually illustrated by simulated examples and a case study involving al...
In this paper we consider the problem of identifiability of the two-state Markovian arrival process (MAP2). In particular, we show that the MAP2 is not identifiable, and conditions are given under which two different sets of parameters induce identical stationary laws for the observable process.
Internet traffic data is characterized by some unusual statistical properties, in particular, the presence of heavy-tailed variables. A typical model for heavy-tailed distributions is the Pareto distribution although this is not adequate in many cases. In this article, we consider a mixture of two-parameter Pareto distributions as a model for heavy...
Two types of transitions can be found in the Markovian Arrival process or MAP: with and without arrivals. In transient transitions the chain jumps from one state to another with no arrival; in effective transitions, a single arrival occurs. We assume that in practice, only arrival times are observed in a MAP. This leads us to define and study the E...
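The two transition types described in this abstract can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes the common parametrization of a MAP by two rate matrices, here called `D0` (transitions without arrivals) and `D1` (transitions with a single arrival), whose sum has zero row sums. The function name `simulate_map_arrivals` and the numeric rates are hypothetical choices made only for illustration. Only the arrival times are returned, mirroring the assumption that the underlying state is not observed.

```python
import random

def simulate_map_arrivals(d0, d1, n_arrivals, seed=0, start_state=0):
    """Return the first n_arrivals arrival times of a MAP with rate matrices d0, d1.

    d0[i][j] (i != j): rate of moving from state i to j with no arrival.
    d1[i][j]: rate of moving from state i to j with one arrival (j may equal i).
    -d0[i][i]: total rate of leaving the current sojourn in state i.
    """
    rng = random.Random(seed)
    m = len(d0)
    state, clock, arrivals = start_state, 0.0, []
    while len(arrivals) < n_arrivals:
        exit_rate = -d0[state][state]
        clock += rng.expovariate(exit_rate)  # exponential holding time
        # Candidate moves as (next_state, is_arrival, rate).
        moves = [(j, False, d0[state][j]) for j in range(m) if j != state]
        moves += [(j, True, d1[state][j]) for j in range(m)]
        weights = [rate for _, _, rate in moves]
        next_state, is_arrival, _ = rng.choices(moves, weights=weights)[0]
        if is_arrival:
            arrivals.append(clock)  # only arrivals are observable
        state = next_state
    return arrivals

# Hypothetical two-state example; each row of D0 + D1 sums to zero.
D0 = [[-3.0, 1.0], [0.5, -1.0]]
D1 = [[1.5, 0.5], [0.2, 0.3]]
times = simulate_map_arrivals(D0, D1, n_arrivals=5)
print(times)
```

Because the state sequence is discarded, different `(D0, D1)` pairs can produce the same law for the returned arrival times, which is the identifiability question raised in several of the abstracts listed here.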
In this article we describe a method for carrying out Bayesian inference for the double Pareto lognormal (dPlN) distribution which has recently been proposed as a model for heavy-tailed phenomena. We apply our approach to inference for the dPlN/M/1 and M/dPlN/1 queueing systems. These systems cannot be analyzed using standard techniques due to the...
In this work, we discuss Bayesian estimation of multinomial probabilities associated with a finite alphabet A under incomplete experimental information. Two types of prior information are considered: (i) number of letters needed to see a particular pattern for the first time, and (ii) the fact that for two fixed words one appeared before the other.
Breast cancer is the second leading cause of death in women in the United States and at present, mammography is the only proven method that can detect minimal breast cancer. On the other hand, many medical images demonstrate a certain degree of self-similarity over a range of scales. The Multifractal spectrum (MFS) summarizes possibly variable degr...