Article

On the consistency of a spatial-type interval-valued median for random intervals

Abstract

The sample $d_\theta$-median is a robust estimator of the central tendency or location of an interval-valued random variable. While the interval-valued sample mean can be highly influenced by outliers, this spatial-type interval-valued median remains much more reliable. In this paper, we show that under general conditions the sample $d_\theta$-median is a strongly consistent estimator of the $d_\theta$-median of an interval-valued random variable.
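For reference, the location measure studied here can be written as a minimizer of the expected $d_\theta$ distance. The following display is a minimal formal sketch consistent with the abstract; the notation $\mathcal{K}_c(\mathbb{R})$ for the class of nonempty compact intervals is ours, not the paper's:

\[
M_\theta(X) \in \operatorname*{arg\,min}_{K \in \mathcal{K}_c(\mathbb{R})} \mathbb{E}\bigl[d_\theta(X, K)\bigr],
\qquad
\widehat{M}_{\theta,n} \in \operatorname*{arg\,min}_{K \in \mathcal{K}_c(\mathbb{R})} \frac{1}{n}\sum_{i=1}^{n} d_\theta(X_i, K),
\]

and strong consistency means $\widehat{M}_{\theta,n} \to M_\theta(X)$ almost surely as $n \to \infty$, under the general conditions referred to in the abstract.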

... The generalized metric includes as a special case the well-known Vitale $L_2$ metric [38] for interval values. The resulting estimator was first mentioned in [39] (a preliminary empirical study of its behavior) and its strong consistency was proven in [40]. One of the aims of this paper is to provide theoretical results for the existence and uniqueness of the spatial-type median. ...
... Inspired by the spatial median as an extension of the median to higher-dimensional Euclidean spaces and even Banach spaces [47], we now study a generalization of the spatial median to random intervals based on the $L_2$ metric $d_\theta$, as in Sinova et al. [39,40]. We call this location measure the $d_\theta$-median to stress its dependence on the parameter involved in the metric. ...
... It is shown in [40] that for simple random samples, the sample $d_\theta$-median is a strongly consistent estimator of the $d_\theta$-median of a random interval under general conditions. However, theoretical results about the existence and uniqueness of the spatial-type interval-valued median are still lacking. ...
Article
Full-text available
To estimate the central tendency or location of a sample of interval-valued data, a standard statistic is the interval-valued sample mean. Its strong sensitivity to outliers or data changes motivates the search for more robust alternatives. In this respect, a more robust location statistic is studied in this paper. This measure is inspired by the concept of the spatial median and makes use of the versatile generalized Bertoluzza metric between intervals, the so-called $d_\theta$ distance. The problem of minimizing the mean $d_\theta$ distance to the values the random interval takes, which defines the spatial-type $d_\theta$-median, is analyzed. Existence and uniqueness of the sample version are shown. Furthermore, the robustness of this proposal is investigated by deriving its finite sample breakdown point. Finally, a real-life example from the Economics field illustrates the robustness of the sample $d_\theta$-median, and simulation studies show some comparisons with respect to the mean and several recently introduced robust location measures for interval-valued data.
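Purely as an illustration (this is not the authors' algorithm; the function names, the choice $\theta = 1/3$, and the use of scipy are assumptions), the sample $d_\theta$-median can be approximated numerically by parametrizing candidate intervals by midpoint and spread and minimizing the mean $d_\theta$ distance, assuming the usual mid/spread form $d_\theta^2(A,B) = (\operatorname{mid}A - \operatorname{mid}B)^2 + \theta(\operatorname{spr}A - \operatorname{spr}B)^2$:

import numpy as np
from scipy.optimize import minimize

def d_theta(mid_a, spr_a, mid_b, spr_b, theta=1/3):
    # Mid/spread form of the generalized (Bertoluzza-type) distance between intervals
    return np.sqrt((mid_a - mid_b) ** 2 + theta * (spr_a - spr_b) ** 2)

def sample_d_theta_median(mids, sprs, theta=1/3):
    """Numerically approximate the sample d_theta-median of the intervals given by
    midpoints `mids` and nonnegative spreads `sprs` (illustration only)."""
    mids = np.asarray(mids, dtype=float)
    sprs = np.asarray(sprs, dtype=float)

    def mean_distance(params):
        m, s = params
        return np.mean(d_theta(mids, sprs, m, s, theta))

    # Start from the coordinate-wise medians and keep the candidate spread nonnegative.
    x0 = np.array([np.median(mids), np.median(sprs)])
    res = minimize(mean_distance, x0, bounds=[(None, None), (0.0, None)])
    m, s = res.x
    return m - s, m + s  # interval [inf, sup]

# One grossly outlying interval barely moves the estimate.
mids = [1.0, 1.2, 0.9, 1.1, 10.0]
sprs = [0.5, 0.4, 0.6, 0.5, 5.0]
print(sample_d_theta_median(mids, sprs))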
Article
We observe every day a world more complex, uncertain, and riskier than the world of yesterday. Consequently, having accurate forecasts in economics, finance, energy, health, tourism, and so on is more critical than ever. Moreover, there is an increasing requirement to provide other types of forecasts beyond point ones, such as interval forecasts. After more than 50 years of research, there are two consensuses: "combining forecasts reduces the final forecasting error" and "a simple average of several forecasts often outperforms complicated weighting schemes", the latter being named the "forecast combination puzzle (FCP)". Interval-valued time series (ITS) concepts and several forecasting methods for them have been introduced in different papers, answering some big data challenges. Hence, one main issue is how to combine several forecasts obtained for one ITS. This paper proposes combination schemes for two or more ITS forecasts. Some of them extend previous crisp combination schemes, incorporating as a novelty the use of Theil's U. The FCP under the ITS forecast framework is analyzed in the context of different accuracy measures, and some guidelines are provided. An agenda for future research in the field of combining forecasts obtained for ITS is outlined.
Chapter
Almost all experiments reveal variability in their results. In this contribution we consider measures of dispersion for samples of random intervals. In particular, we suggest a generalization of two well-known classical measures of dispersion, namely the range and the interquartile range, for interval-valued samples.
Article
Full-text available
Among the new types of data emerging from real-life experiments, interval-valued ones are becoming very prevalent nowadays. In summarizing the location of interval-valued datasets, the Aumann mean is the most usual measure. This measure inherits almost all the nice properties of the mean value for real-valued datasets. Nevertheless, it also inherits a critical property, which is the one related to its high sensitivity to data changes or to the presence of outliers. As an approach to measure the location of interval-valued datasets in a more robust way, the notion of M-estimators will be considered. Two applications on chemical data will be included to motivate and illustrate the problem. Finally, an empirical comparative study will be conducted to show the performance of the different types of M-estimators proposed in this work.
Chapter
Full-text available
Since the Aumann-type expected value of a random interval is not robust, the aim of this paper is to propose a new central tendency measure for interval-valued data. The median of a random interval has already been defined as the interval minimizing the mean distance, in terms of an $L_1$ metric extending the Euclidean distance, to the values of the random interval. Inspired by the spatial median, we now follow a more common approach to define the median using an $L_2$ metric.
Article
Full-text available
The halfplane location depth of a point $\theta \in \mathbb{R}^2$ relative to a bivariate data set $X = \{x_1, \ldots, x_n\}$ is the minimal number of observations in any closed halfplane that contains $\theta$ (Tukey (1975)). The halfplane median or Tukey median is the $\theta$ with maximal depth $k^*$ (Donoho and Gasko (1992)). If this $\theta$ is not unique, the Tukey median is defined as the center of gravity of the set of points with depth $k^*$. In this paper we construct two algorithms for computing the Tukey median. The first one is relatively straightforward but quite slow, whereas the second (called HALFMED) is much faster. A small simulation study is performed, and some examples are given.
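As a rough illustration of the depth notion only (this is not HALFMED or the paper's exact algorithms; the function name and the finite direction grid are assumptions), the halfplane depth of a point can be approximated by sweeping closed halfplanes over a grid of directions:

import numpy as np

def halfplane_depth(point, X, n_dir=3600):
    """Approximate Tukey halfplane depth of `point` with respect to the rows of X,
    sweeping closed halfplanes over a finite grid of directions (illustration only)."""
    X = np.asarray(X, dtype=float)
    p = np.asarray(point, dtype=float)
    angles = np.linspace(0.0, 2.0 * np.pi, n_dir, endpoint=False)
    directions = np.column_stack([np.cos(angles), np.sin(angles)])  # unit normals
    # For each direction u, count observations in the closed halfplane {x : u.(x - point) >= 0}.
    projections = (X - p) @ directions.T  # shape (n, n_dir)
    counts = (projections >= 0).sum(axis=0)
    return int(counts.min())

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
print(halfplane_depth([0.0, 0.0], X), halfplane_depth([3.0, 3.0], X))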
Article
Full-text available
Multidimensional medians induced from depth functions have been proposed and studied as generalizations of the univariate median. Like their univariate counterpart, they usually possess desirable properties including affine equivariance, high breakdown-point robustness, etc. Furthermore, they can serve as the deepest point (a location measure) of the underlying distribution. The most prominent and prevalent depth median is Tukey's halfspace median. However, like most other depth medians, it is generally not unique. On the other hand, we show that the projection median distinguishes itself from its competitors and possesses the desirable uniqueness property.
Article
Full-text available
In quantifying the central tendency of the distribution of a random fuzzy number (or fuzzy random variable in Puri and Ralescu's sense), the most usual measure is the Aumann-type mean, which extends the mean of a real-valued random variable and preserves its main properties and behavior. Although such behavior has very valuable and convenient implications, ‘extreme’ values or changes of data exert too much influence on the Aumann-type mean of a random fuzzy number. This strong influence motivates the search for a more robust central tendency measure. In this respect, this paper aims to explore the extension of the median to random fuzzy numbers. This extension is based on the 1-norm distance and its adequacy will be shown by analyzing its properties and comparing its robustness with that of the mean both theoretically and empirically.
Article
Full-text available
Let E be a separable Banach space, which is the dual of a Banach space F. If X is an E-valued random variable, the set of $L_1$-medians of X is $\operatorname{Argmin}_{\alpha \in E} \mathbb{E}\bigl[\|X-\alpha\| - \|X\|\bigr]$. Assume that this set contains only one element. From any sequence of probability measures $\{\mu_n,\ n \geq 1\}$ on E which converges in law to X, we give two approximating sequences of the $L_1$-median, for the weak* topology induced by F.
Article
Full-text available
Increasingly, datasets are so large they must be summarized in some fashion so that the resulting summary dataset is of a more manageable size, while still retaining as much knowledge inherent to the entire dataset as possible. One consequence of this situation is that the data may no longer be formatted as single values such as is the case for classical data, but rather may be represented by lists, intervals, distributions, and the like. These summarized data are examples of symbolic data. This article looks at the concept of symbolic data in general, and then attempts to review the methods currently available to analyze such data. It quickly becomes clear that the range of methodologies available draws analogies with developments before 1900 that formed a foundation for the inferential statistics of the 1900s, methods largely limited to small (by comparison) datasets and classical data formats. The scarcity of available methodologies for symbolic data also becomes clear and so draws attention to an enormous need for the development of a vast catalog (so to speak) of new symbolic methodologies along with rigorous mathematical and statistical foundational work for these methods.
Article
Full-text available
A parametric modelling for interval data is proposed, assuming a multivariate Normal or Skew-Normal distribution for the midpoints and log-ranges of the interval variables. The intrinsic nature of the interval variables leads to special structures of the variance–covariance matrix, which is represented by five different possible configurations. Maximum likelihood estimation for both models under all considered configurations is studied. The proposed modelling is then considered in the context of analysis of variance and multivariate analysis of variance testing. To assess the behaviour of the proposed methodology, a simulation study is performed. The results show that, for medium or large sample sizes, tests have good power and their true significance level approaches nominal levels when the constraints assumed for the model are respected; however, for small samples, test sizes close to nominal levels cannot be guaranteed. Applications to Chinese meteorological data in three different regions and to credit card usage variables for different card designations illustrate the proposed methodology.
Article
Full-text available
Many statistical data are imprecise due to factors such as measurement errors, computation errors, and lack of information. In such cases, data are better represented by intervals rather than by single numbers. Existing methods for analyzing interval-valued data include regressions in the metric space of intervals and symbolic data analysis, the latter being proposed in a more general setting. However, there has been a lack of literature on the distribution-based inferences for interval-valued data. In an attempt to fill this gap, we extend the concept of normality for random sets by Lyashenko (1983) and propose a normal hierarchical model for random intervals. In addition, we develop a minimum contrast estimator (MCE) for the model parameters, which we show is both consistent and asymptotically normal. Simulation studies support our theoretical findings, and show very promising results. Finally, we successfully apply our model and MCE to a real dataset.
Article
Full-text available
The ultimate goal of this paper is to determine a measure of the degree of dependence between two interval-valued random sets, when the dependence is intended in the sense of an affine function relating these random elements. For this purpose, a general study on the least squares fitting of an affine function for interval-valued data is first carried out, where the least squares method we will present considers squared residuals based on a generalized metric on the space of nonempty compact intervals, and output and input random mechanisms are modelled by means of convex compact random sets. For the general case of nondegenerate convex compact random sets, solutions are presented in an algorithmic way, and the few cases leading to nonunique solutions are characterized. On the basis of this regression study we later introduce and analyze a well-defined determination coefficient of two interval-valued random sets, which allows us to quantify the strength of association between them; an algorithm for the computation of the coefficient has also been designed. Finally, a real-life example illustrates the study developed in the paper.
Chapter
Full-text available
The aim of this paper is to extend the ideas of generalized additive models for multivariate data (with known or unknown link function) to functional data covariates. The proposed algorithm is a modified version of the local scoring and backfitting algorithms that allows for the non-parametric estimation of the link function. The algorithm is applied to the prediction of a binary response in an example.
Article
Full-text available
The $L_1$-median is a robust estimator of multivariate location with good statistical properties. Several algorithms for computing the $L_1$-median are available. Problem-specific algorithms can be used, but also general optimization routines. The aim is to compare different algorithms with respect to their precision and runtime. This is possible because all considered algorithms have been implemented in a standardized manner in the open source environment R. In most situations, the algorithm based on the optimization routine NLM (non-linear minimization) clearly outperforms the other approaches. Its low computation time makes applications to large and high-dimensional data feasible. Keywords: algorithm, multivariate median, optimization, robustness.
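The abstract above compares implementations available in R; purely for orientation (this is not one of the compared algorithms, and the function name, tolerances, and example data are assumptions), a plain Weiszfeld-type fixed-point iteration for the $L_1$-median can be sketched in a few lines of Python:

import numpy as np

def l1_median(X, tol=1e-8, max_iter=1000):
    """Weiszfeld-type fixed-point iteration for the spatial (L1-) median of the rows of X."""
    X = np.asarray(X, dtype=float)
    m = X.mean(axis=0)  # starting value
    for _ in range(max_iter):
        dist = np.maximum(np.linalg.norm(X - m, axis=1), 1e-12)  # avoid division by zero
        weights = 1.0 / dist
        m_new = (weights[:, None] * X).sum(axis=0) / weights.sum()
        if np.linalg.norm(m_new - m) < tol:
            return m_new
        m = m_new
    return m

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(size=(50, 3)), rng.normal(loc=10.0, size=(5, 3))])
print(l1_median(X))  # largely unaffected by the five outlying rows

Dedicated routines such as the NLM-based approach discussed in the paper are typically faster and more reliable than this naive iteration.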
Article
Full-text available
Testing methods are introduced in order to determine whether there is some ‘linear’ relationship between imprecise predictor and response variables in a regression analysis. The variables are assumed to be interval-valued. Within this context, the variables are formalized as compact convex random sets, and an interval arithmetic-based linear model is considered. Then, a suitable equivalence for the hypothesis of linear independence in this model is obtained in terms of the mid-spread representations of the interval-valued variables. That is, in terms of some moments of random variables. Methods are constructed to test this equivalent hypothesis; in particular, the one based on bootstrap techniques will be applicable in a wide setting. The methodology is illustrated by means of a real-life example, and some simulation studies are considered to compare techniques in this framework.
Article
Full-text available
One of the most important aspects of the (statistical) analysis of imprecise data is the usage of a suitable distance on the family of all compact, convex fuzzy sets, which is not too hard to calculate and which reflects the intuitive meaning of fuzzy sets. On the basis of expressing the metric of Bertoluzza et al. [C. Bertoluzza, N. Corral, A. Salas, On a new class of distances between fuzzy numbers, Mathware Soft Comput. 2 (1995) 71–84] in terms of the midpoints and spreads of the corresponding intervals, we construct new families of metrics on the family of all d-dimensional compact convex sets as well as on the family of all d-dimensional compact convex fuzzy sets. It is shown that these metrics not only fulfill many good properties, but also that they are easy to calculate and easy to manage for statistical purposes, and therefore useful from the practical point of view.
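In the one-dimensional (interval) case, the construction described above reduces to a particularly simple expression. The following is a sketch of the mid/spread form commonly used in this literature for nonempty compact intervals $A$ and $B$ and a weight $\theta > 0$ (the exact normalization of $\theta$ may differ between papers):

\[
d_\theta^2(A, B) = \bigl(\operatorname{mid} A - \operatorname{mid} B\bigr)^2 + \theta\,\bigl(\operatorname{spr} A - \operatorname{spr} B\bigr)^2,
\qquad
\operatorname{mid} A = \frac{\sup A + \inf A}{2}, \quad
\operatorname{spr} A = \frac{\sup A - \inf A}{2}.
\]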
Article
Full-text available
Kernel Principal Component Analysis extends linear PCA from a Euclidean space to any reproducing kernel Hilbert space. Robustness issues for Kernel PCA are studied. The sensitivity of Kernel PCA to individual observations is characterized by calculating the influence function. A robust Kernel PCA method is proposed by incorporating kernels in the Spherical PCA algorithm. Using the scores from Spherical Kernel PCA, a graphical diagnostic is proposed to detect points that are influential for ordinary Kernel PCA.
Chapter
Full-text available
Interval-valued observations arise in several real-life situations, and it is convenient to develop statistical methods to deal with them. In the literature on Statistical Inference with single-valued observations one can find different studies on drawing conclusions about the population mean on the basis of the information supplied by the available observations. In this paper we present a bootstrap method of testing a ‘two-sided’ hypothesis about the (interval-valued) mean value of an interval-valued random set based on an extension of the t statistic for single-valued data. The method is illustrated by means of a real-life example.
Article
Full-text available
The aim of this paper is to investigate the economic specialization of the Italian local labor systems (sets of contiguous municipalities with a high degree of self-containment of daily commuter travel) by using the Symbolic Data approach, on the basis of data derived from the Census of Industrial and Service Activities. Specifically, the economic structure of a local labor system (LLS) is described by an interval-type variable, a special symbolic data type that allows for the fact that all municipalities within the same LLS do not have the same economic structure.
Article
Full-text available
The minimum covariance determinant (MCD) scatter estimator is a highly robust estimator for the dispersion matrix of a multivariate, elliptically symmetric distribution. It is relatively fast to compute and intuitively appealing. In this note we derive its influence function and compute the asymptotic variances of its elements. A comparison with the one step reweighted MCD and with S-estimators is made. Also finite-sample results are reported.
Article
Full-text available
The use of the fuzzy scale of measurement to describe a substantial number of observations from real-life attributes or variables is first explored. In contrast to other well-known scales (like nominal or ordinal), a wide class of statistical measures and techniques can be properly applied to analyze fuzzy data. This fact is connected with the possibility of identifying the scale with a special subset of a functional Hilbert space. The identification can be used to develop methods for the statistical analysis of fuzzy data by considering techniques in functional data analysis and vice versa. In this respect, an approach to the FANOVA test is presented and analyzed, and it is later particularized to deal with fuzzy data. The proposed approaches are illustrated by means of a real-life case study.
Article
Full-text available
The purpose of this note is to show that if a probability measure on a Euclidean space is not concentrated on a line, then its spatial median is unique.
Article
Full-text available
For a distribution $F$ on $\mathbb{R}^p$ and a point $x$ in $\mathbb{R}^p$, the simplicial depth $D(x)$ is introduced, which is the probability that the point $x$ is contained inside a random simplex whose vertices are $p + 1$ independent observations from $F$. Mathematically and heuristically it is argued that $D(x)$ indeed can be viewed as a measure of depth of the point $x$ with respect to $F$. An empirical version of $D(\cdot)$ gives rise to a natural ordering of the data points from the center outward. The ordering thus obtained leads to the introduction of multivariate generalizations of the univariate sample median and $L$-statistics. These generalized sample medians and $L$-statistics are affine equivariant.
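For intuition only (the function names, the restriction to the bivariate case $p = 2$, and the brute-force enumeration are my own choices, not the paper's procedures), the empirical simplicial depth counts the fraction of data triangles containing a given point:

import numpy as np
from itertools import combinations

def _in_triangle(p, a, b, c):
    # Sign test: p lies inside or on the boundary of triangle abc.
    def cross_sign(u, v, w):
        return (u[0] - w[0]) * (v[1] - w[1]) - (v[0] - w[0]) * (u[1] - w[1])
    d1, d2, d3 = cross_sign(p, a, b), cross_sign(p, b, c), cross_sign(p, c, a)
    has_neg = (d1 < 0) or (d2 < 0) or (d3 < 0)
    has_pos = (d1 > 0) or (d2 > 0) or (d3 > 0)
    return not (has_neg and has_pos)

def simplicial_depth(x, X):
    """Fraction of triangles with vertices in the sample X (rows) that contain x (p = 2)."""
    X = np.asarray(X, dtype=float)
    triangles = list(combinations(range(len(X)), 3))
    hits = sum(_in_triangle(x, X[i], X[j], X[k]) for i, j, k in triangles)
    return hits / len(triangles)

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 2))
print(simplicial_depth([0.0, 0.0], X), simplicial_depth([3.0, 3.0], X))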
Article
Full-text available
In the course of studies on fuzzy regression analysis, we encountered the problem of introducing a distance between fuzzy numbers which replaces the classical $(x - y)^2$ on the real line. Our proposal is to compute such a function as a suitable weighted mean of the distances between the $\alpha$-cuts of the fuzzy numbers. The main difficulty concerns the definition of the distance between intervals, since the current definitions present some disadvantages which are undesirable in our context. In this paper we describe an approach which removes such drawbacks.
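As a hedged sketch of the idea (this is one common way of writing such a distance in mid/spread terms; the exact weighting scheme of the original paper may differ), the squared distance between two fuzzy numbers $\tilde{A}$ and $\tilde{B}$ can be obtained by averaging an interval distance over the $\alpha$-cuts:

\[
D^2(\tilde{A}, \tilde{B}) = \int_0^1 \Bigl[ \bigl(\operatorname{mid} \tilde{A}_\alpha - \operatorname{mid} \tilde{B}_\alpha\bigr)^2 + \theta\,\bigl(\operatorname{spr} \tilde{A}_\alpha - \operatorname{spr} \tilde{B}_\alpha\bigr)^2 \Bigr]\, d\alpha,
\]

where $\tilde{A}_\alpha$ and $\tilde{B}_\alpha$ denote the $\alpha$-cuts.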
Article
A new method of regression analysis for interval-valued data is proposed. The relationship between an interval-valued response variable and a set of interval-valued explanatory variables is investigated by considering two regression models, one for the midpoints and the other one for the radii. The estimation problem is approached by introducing Lasso-based constraints on the regression coefficients. This can improve the prediction accuracy of the model and, taking into account the nature of the constraints, can sometimes produce a parsimonious model with a common subset of regression coefficients for the midpoint and the radius models. The effectiveness of our method, called Lasso-IR (Lasso-based Interval-valued Regression), is shown by a simulation experiment and some applications to real data.
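As a rough illustration only (this is not Lasso-IR: it fits two independent Lasso regressions on midpoints and radii with scikit-learn and therefore omits the joint constraints on the coefficients described above; all names and the penalty value are assumptions):

import numpy as np
from sklearn.linear_model import Lasso

def fit_mid_radius_lasso(X_mid, X_rad, y_mid, y_rad, alpha=0.1):
    """Fit two separate Lasso regressions, one for the response midpoints and one for the
    response radii.  Unlike Lasso-IR, no joint constraint links the two coefficient vectors."""
    mid_model = Lasso(alpha=alpha).fit(X_mid, y_mid)
    rad_model = Lasso(alpha=alpha, positive=True).fit(X_rad, y_rad)  # keep radius coefficients nonnegative
    return mid_model, rad_model

def predict_intervals(mid_model, rad_model, X_mid, X_rad):
    mid = mid_model.predict(X_mid)
    rad = np.maximum(rad_model.predict(X_rad), 0.0)  # clip negative predicted radii
    return np.column_stack([mid - rad, mid + rad])  # [lower, upper] bounds

rng = np.random.default_rng(3)
X_mid = rng.normal(size=(100, 5))
X_rad = np.abs(rng.normal(size=(100, 5)))
y_mid = X_mid @ np.array([1.0, 0.0, 0.5, 0.0, 0.0]) + rng.normal(scale=0.1, size=100)
y_rad = X_rad @ np.array([0.3, 0.0, 0.0, 0.2, 0.0]) + 0.05
models = fit_mid_radius_lasso(X_mid, X_rad, y_mid, y_rad)
print(predict_intervals(*models, X_mid[:3], X_rad[:3]))

A joint penalty tying the midpoint and radius coefficients together, as in Lasso-IR, would be needed to obtain the common subset of coefficients mentioned in the abstract.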
Article
In this paper, following a partitioning-around-medoids approach, a fuzzy clustering model for interval-valued data, called FCMd-ID, is introduced. Subsequently, to avoid the disruptive effects of possible outlying interval-valued data in the clustering process, a robust fuzzy clustering model with a trimming rule, called Trimmed Fuzzy C-medoids for interval-valued data (TrFCMd-ID), is proposed. In order to show the good performance of the robust clustering model, a simulation study and two applications are provided.
Article
Recently, kernel-based clustering in feature space has been shown to perform better than conventional clustering methods in unsupervised classification. In this paper, a partitioning clustering method in kernel-induced feature space for symbolic interval-valued data is introduced. The distance between an item and its prototype in feature space is expanded using a two-component mixture kernel to handle intervals. Moreover, tools for the partition and cluster interpretation of interval-valued data in feature space are also presented. To show the effectiveness of the proposed method, experiments with real and synthetic interval data sets were performed, and a study comparing the proposed method with different clustering algorithms from the literature is also presented. The clustering quality furnished by the methods is measured by an external cluster validity index (corrected Rand index). These experiments showed the usefulness of the kernel K-means method for interval-valued data and the merit of the partition and cluster interpretation tools.
Article
A new linear regression model for an interval-valued response and a real-valued explanatory variable is presented. The approach is based on the interval arithmetic. Comparisons with previous methods are discussed. The new linear model is theoretically analyzed and the regression parameters are estimated. Some properties of the regression estimators are investigated. Finally, the performance of the procedure is illustrated using both a real-life application and simulation studies.
Article
This paper considers several robust estimators for distribution functions and quantiles of a response variable when some responses may not be observed under a non-ignorable missing data mechanism. Based on a particular semiparametric regression model for non-ignorable missing responses, we propose a nonparametric/semiparametric estimation method and an augmented inverse probability weighted imputation method to estimate the distribution function and quantiles of a response variable. Under some regularity conditions, we derive asymptotic properties of the proposed distribution function and quantile estimators. Two empirical log-likelihood functions are also defined to construct confidence intervals for the distribution function of a response variable. Simulation studies show that our proposed methods are robust. In particular, the semiparametric estimator is more efficient than the nonparametric estimator, and the inverse probability weighted imputation estimator is bias-corrected.
Article
The first model-based clustering algorithm for multivariate functional data is proposed. After introducing multivariate functional principal components analysis (MFPCA), a parametric mixture model, based on the assumption of normality of the principal component scores, is defined and estimated by an EM-like algorithm. The main advantage of the proposed model is its ability to take into account the dependence among curves. Results on simulated and real datasets show the efficiency of the proposed method.
Article
In possibilistic clustering objects are assigned to clusters according to the so-called membership degrees taking values in the unit interval. Differently from fuzzy clustering, it is not required that the sum of the membership degrees of an object to all clusters is equal to one. This is very helpful in the presence of outliers, which are usually assigned to the clusters with membership degrees close to zero. Unfortunately, a drawback of the possibilistic approach is the tendency to produce coincident clusters. A remedy is to add a repulsion term among prototypes in the loss function forcing the prototypes to be far 'enough' from each other. Here, a possibilistic clustering algorithm with repulsion constraints for imprecise data, managed in terms of fuzzy sets, is introduced. Applications to synthetic and real fuzzy data are considered in order to analyze how the proposed clustering algorithm works in practice.
Article
This paper presents a robust regression model that deals with cases that have interval-valued outliers in the input data set. Each interval of the input data is represented by its range and midpoint, and the fitting to interval-valued data is not sensitive in the presence of midpoint and/or range outliers in the interval response. Predictions of the lower and upper bounds of new intervals are performed, and simulation studies are carried out to validate these predictions. Two applications with real-life interval data sets are considered. The prediction quality is assessed by a mean magnitude of relative error calculated from a test data set.
Article
The spatial median, called the mediancentre in an earlier paper by J. C. Gower, is defined as the bivariate location measure that minimizes the sum of absolute distances to observations. Its asymptotic efficiency relative to the sample mean vector with normal data is shown to exceed the usual univariate value of $2/\pi = 0.637$. Its estimating equations have an angular aspect and are used to develop "angle tests", which are analogues of sign tests in one dimension.
Article
This paper is an adaptation of symbolic interval Principal Component Analysis (PCA) to histogram data. We propose two methodologies. The first one involves three steps: the coding of the histogram bins, the ordinary PCA of the means of the variables, and the representation of the dispersion of the symbolic observations, which we call concepts. For the representation of the dispersion of these concepts we propose transforming the histograms into intervals. Then, we suggest projecting the hypercubes or the interval lengths associated with each concept onto the principal axes of the ordinary PCA of the means. In the second methodology, we propose using the three previous steps together with the angular transformation.
Article
Latent class analysis (LCA) has been found to have important applications in social and behavioural sciences for modelling categorical response variables, and non-response is typical when collecting data. In this study, the non-response mainly included ‘contingency questions’ and real ‘missing data’. The primary objective of this study was to evaluate the effects of some potential factors on model selection indices in LCA with non-response data. We simulated missing data with contingency question and evaluated the accuracy rates of eight information criteria for selecting the correct models. The results showed that the main factors are latent class proportions, conditional probabilities, sample size, the number of items, the missing data rate and the contingency data rate. Interactions of the conditional probabilities with class proportions, sample size and the number of items are also significant. From our simulation results, the impact of missing data and contingency questions can be amended by increasing the sample size or the number of items.
Article
In a missing-data setting, we want to estimate the mean of a scalar outcome, based on a sample in which an explanatory variable is observed for every subject while responses are missing by happenstance for some of them. We consider two kinds of estimates of the mean response when the explanatory variable is functional. One is based on the average of the predicted values and the second one is a functional adaptation of the Horvitz–Thompson estimator. We show that the infinite dimensionality of the problem does not affect the rates of convergence by stating that the estimates are root-n consistent, under missing at random (MAR) assumption. These asymptotic features are completed by simulated experiments illustrating the easiness of implementation and the good behaviour on finite sample sizes of the method. This is the first paper emphasizing that the insensitiveness of averaged estimates, well known in multivariate non-parametric statistics, remains true for an infinite-dimensional covariable. In this sense, this work opens the way for various other results of this kind in functional data analysis.
Article
We introduce a new version of dynamic time warping for samples of observed event times that are modeled as time-warped intensity processes. Our approach is developed within a framework where, for each experimental unit or subject in a sample, one observes a random number of event times or random locations. As in our setting the number of observed events differs from subject to subject, usual landmark alignment methods that require the number of events to be the same across subjects are not feasible. We address this challenge by applying dynamic time warping, initially by aligning the event times for pairs of subjects, regardless of whether the numbers of observed events within the considered pair of subjects match. The information about pairwise alignments is then combined to extract an overall alignment of the events for each subject across the entire sample. This overall alignment provides a useful description of event data and can be used as a pre-processing step for subsequent analysis. The method is illustrated with a historical fertility study and with on-line auction data.
Article
Interval-valued variables have become very common in data analysis. Up until now, symbolic regression has mostly approached this type of data from an optimization point of view, considering neither the probabilistic aspects of the models nor the nonlinear relationships between the interval response and the interval predictors. In this article, we formulate interval-valued variables as bivariate random vectors and introduce the bivariate symbolic regression model based on generalized linear models theory, which provides much-needed flexibility in practice. Important inferential aspects are investigated. Applications to synthetic and real data illustrate the usefulness of the proposed approach.
Article
In many situations, data follow a generalized linear model in which the mean of the responses is modelled, through a link function, linearly on the covariates. In this paper, robust estimators for the regression parameter are considered in order to build test statistics for this parameter when missing data occur in the responses. We derive the asymptotic behaviour of the robust estimators for the regression parameter under the null hypothesis and under contiguous alternatives, in order to obtain that of the robust Wald test statistics. Their influence function is also studied. A simulation study allows us to compare the behaviour of the classical and robust tests under different contamination schemes. The procedure is also illustrated by analysing a real data set.
Article
Principal Component Analysis (PCA) is a well-known technique, the aim of which is to synthesize huge amounts of numerical data by means of a low number of unobserved variables, called components. In this paper, an extension of PCA to deal with interval valued data is proposed. The method, called Midpoint Radius Principal Component Analysis (MR-PCA), recovers the underlying structure of interval valued data by using both the midpoints (or centers) and the radii (a measure of the interval width) information. In order to analyze how MR-PCA works, the results of a simulation study and two applications on chemical data are proposed.
Article
Simple and multiple linear regression models are considered between variables whose "values" are convex compact random sets in $\mathbb{R}^p$ (that is, hypercubes, spheres, and so on). We analyze such models within a set-arithmetic approach. Contrary to what happens for random variables, the least squares optimal solutions for the basic affine transformation model do not produce suitable estimates for the linear regression model. First, we derive least squares estimators for the simple linear regression model and examine them from a theoretical perspective. Moreover, the multiple linear regression model is dealt with and a stepwise algorithm is developed in order to find the estimates in this case. The particular problem of the linear regression with interval-valued data is also considered and illustrated by means of a real-life example.
Article
The aim of this paper is to cluster units (objects) described by interval-valued information by adopting an unsupervised neural network approach. By considering a suitable distance measure for interval data, self-organizing maps to deal with interval-valued data are suggested. The technique, called midpoint radius self-organizing maps (MR-SOMs), recovers the underlying structure of interval-valued data by using both the midpoints (or centers) and the radii (a measure of the interval width) information. In order to show how the MR-SOMs method works, a suggestive application on telecommunication market segmentation is described.
Article
The Fuzzy k-Means clustering model (FkM) is a powerful tool for classifying objects into a set of k homogeneous clusters by means of the membership degrees of an object in a cluster. In FkM, for each object, the sum of the membership degrees in the clusters must be equal to one. Such a constraint may cause meaningless results, especially when noise is present. To avoid this drawback, it is possible to relax the constraint, leading to the so-called Possibilistic k-Means clustering model (PkM). In particular, attention is paid to the case in which the empirical information is affected by imprecision or vagueness. This is handled by means of LR fuzzy numbers. An FkM model for LR fuzzy data is firstly developed and a PkM model for the same type of data is then proposed. The results of a simulation experiment and of two applications to real world fuzzy data confirm the validity of both models, while providing indications as to some advantages connected with the use of the possibilistic approach.
Article
The estimation of a simple linear regression model when both the independent and dependent variables are interval-valued is addressed. The regression model is defined using interval arithmetic; it allows for interval-valued disturbances and is less restrictive than existing models. After the theoretical formalization, the least-squares (LS) estimation of the linear model with respect to a suitable distance in the space of intervals is developed. The LS approach leads to a constrained minimization problem that is solved analytically. The strong consistency of the obtained estimators is proven. The estimation procedure is reinforced by a real-life application and some simulation studies.
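For orientation only (this is not the paper's analytically solved constrained estimator; the function name, the loss based on a squared mid/spread distance with weight $\theta$, and the numerical optimizer are assumptions), a naive numerical fit of a model of the form $Y \approx aX + B$ for interval data can be sketched as follows:

import numpy as np
from scipy.optimize import minimize

def fit_interval_simple_lm(x_mid, x_spr, y_mid, y_spr, theta=1/3):
    """Naive numerical least-squares fit of Y ~ a*X + B for interval data, parametrized by
    (a, mid_B, spr_B).  Interval arithmetic gives mid(aX + B) = a*mid_X + mid_B and
    spr(aX + B) = |a|*spr_X + spr_B.  Illustration only, not the paper's estimator."""
    x_mid = np.asarray(x_mid, dtype=float)
    x_spr = np.asarray(x_spr, dtype=float)
    y_mid = np.asarray(y_mid, dtype=float)
    y_spr = np.asarray(y_spr, dtype=float)

    def loss(params):
        a, mid_b, spr_b = params
        pred_mid = a * x_mid + mid_b
        pred_spr = np.abs(a) * x_spr + spr_b
        return np.mean((y_mid - pred_mid) ** 2 + theta * (y_spr - pred_spr) ** 2)

    res = minimize(loss, x0=np.array([1.0, 0.0, 0.0]),
                   bounds=[(None, None), (None, None), (0.0, None)])  # spread of B must stay nonnegative
    return res.x  # (a, mid_B, spr_B)

rng = np.random.default_rng(4)
x_mid = rng.normal(size=80)
x_spr = np.abs(rng.normal(size=80))
y_mid = 2.0 * x_mid + 1.0 + rng.normal(scale=0.1, size=80)
y_spr = 2.0 * x_spr + 0.2 + np.abs(rng.normal(scale=0.05, size=80))
print(fit_interval_simple_lm(x_mid, x_spr, y_mid, y_spr))

The paper itself derives the constrained LS solution in closed form with respect to a suitable interval metric; this sketch only illustrates the structure of the minimization problem.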
Article
The article purports to present the most important studies on complications in port catheter systems from the past 10 years. This may be the reason for the error when the authors write that Huber needles, which are used for puncture through the port system's silicone membrane, are non-punching. As early as 1988, Haindl and Muller (1) as well as Muller and Zierski (2) were able to show that the Huber cannula, which was developed in the 1950s, releases silicone particles from the port septum. These particles pose a problem not only for port systems themselves, but also for patients, as they may be able to reach their circulatory system. Our own studies of standard port cannulas, Huber cannulas, and punch-free cannulas showed that in 100 punctures with a Shore hardness of 80, large particles were punched out by standard port cannulas, small particles by the Huber cannula, and no particles when using punch-free cannulas. Alternatives to the Huber cannula are available. The critical lower end of the Huber bevel has been modified to reduce sharpness (3). Other manufacturers have provided stylets to protect the needle tip, for instance by using a mandrin, which is effective but costly. Another solution is a non-bending needle tip with a lateral orifice, as found in punch-free needles. This is 100% effective in preventing punch defects regardless of Shore hardness.
Article
In this paper a robust fuzzy k-means clustering model for interval-valued data is introduced. The peculiarity of the proposed model is its capability to manage anomalous interval-valued data by reducing the effects of such outliers in the clustering model. In the interval case, the concept of anomalous data involves both the center and the width (the radius) of an interval. In order to show how our model works, the results of a simulation experiment and an application to real interval-valued data are discussed.