## About

260 Publications

42,260 Reads

**How we measure 'reads':** a 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full text.

7,929 Citations

## Publications

For a set of $p$-variate data points $\boldsymbol y_1,\ldots,\boldsymbol y_n$, there are several versions of multivariate median and related multivariate sign test proposed and studied in the literature. In this paper we consider the asymptotic properties of the multivariate extension of the Hodges-Lehmann (HL) estimator, the spatial HL-estimator,...
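As a concrete illustration of the estimator discussed above: the spatial HL-estimator can be described as the spatial median of the pairwise means (Walsh averages) of the data. A minimal pure-Python sketch, computing the spatial median by Weiszfeld iteration (the function names, iteration cap, and toy data are our own illustrative choices, not from the paper):

```python
import math

def spatial_median(points, iters=200):
    """Weiszfeld iteration for the spatial (geometric) median:
    the point m minimizing the sum of Euclidean distances ||y_i - m||."""
    m = [sum(c) / len(points) for c in zip(*points)]   # start at the mean
    for _ in range(iters):
        wsum, acc = 0.0, [0.0] * len(m)
        for y in points:
            d = math.dist(y, m)
            if d < 1e-12:               # skip a point coinciding with m
                continue
            w = 1.0 / d
            wsum += w
            for k in range(len(m)):
                acc[k] += w * y[k]
        m = [a / wsum for a in acc]
    return m

def spatial_hl(points):
    """Spatial HL-estimator: spatial median of the pairwise means
    (y_i + y_j) / 2 over all pairs i < j."""
    walsh = [[(a + b) / 2 for a, b in zip(points[i], points[j])]
             for i in range(len(points)) for j in range(i + 1, len(points))]
    return spatial_median(walsh)

pts = [(0.0, 0.0), (1.0, 0.2), (-0.8, 0.4), (0.3, -0.6), (10.0, 10.0)]
est = spatial_hl(pts)
```

Even with the gross outlier at (10, 10), the estimate stays near the bulk of the data.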

In a seminal paper, Tyler (1987a) suggests an M-estimator for shape, which is now known as Tyler’s shape matrix. Tyler’s shape matrix is increasingly popular due to its nice statistical properties. It is distribution free within the class of generalized elliptical distributions. Further, under very mild regularity conditions, it is consistent and a...

An important challenge in big data is the identification of important variables. For this purpose, methods of discovering variables with non-standard univariate marginal distributions are proposed. The conventional moments-based summary statistics can be well adopted, but their sensitivity to outliers can lead to selection based on a few outliers rathe...

Sliced inverse regression is one of the most popular sufficient dimension reduction methods. Originally, it was designed for independent and identically distributed data and was recently extended to the case of serially and spatially dependent data. In this work we extend it to the case of spatially dependent data where the response might depend also on...

Dimension reduction is often a preliminary step in the analysis of data sets with a large number of variables. Most classical, both supervised and unsupervised, dimension reduction methods such as principal component analysis (PCA), independent component analysis (ICA) or sliced inverse regression (SIR) can be formulated using one, two or several d...

Partial orderings and measures of information for continuous univariate random variables with special roles of Gaussian and uniform distributions are discussed. The information measures and measures of non-Gaussianity including third and fourth cumulants are generally used as projection indices in the projection pursuit approach for the independent...

We extend the theory of M-estimation to incomplete and dependent multivariate data. ML-estimation can still be considered a special case of M-estimation in this context. We note that the unobserved data must be missing completely at random, not merely missing at random as is typically assumed in ML-estimation, to guarantee the consistency...

An important challenge in big data is identification of important variables. In this paper, we propose methods of discovering variables with non-standard univariate marginal distributions. The conventional moments-based summary statistics can be well-adopted for that purpose, but their sensitivity to outliers can lead to selection based on a few ou...

Supervised dimension reduction for time series is challenging as there may be temporal dependence between the response y and the predictors x. Recently a time series version of sliced inverse regression, TSIR, was suggested, which applies approximate joint diagonalization of several supervised lagged covariance matrices to consider the temporal nat...

Supervised dimension reduction for time series is challenging as there may be temporal dependence between the response $y$ and the predictors $\boldsymbol x$. Recently a time series version of sliced inverse regression, TSIR, was suggested, which applies approximate joint diagonalization of several supervised lagged covariance matrices to consider...

Independent component analysis (ICA) is a data analysis tool that can be seen as a refinement of principal component analysis or factor analysis. ICA recovers the structures in the data which stay hidden if only the covariance matrix is used in the analysis. The ICA problem is formulated as a latent variable model where the observed variables are l...
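The latent variable model sketched above is x = A z, with z having mutually independent components and A an unknown invertible mixing matrix; ICA seeks an unmixing matrix W such that W x recovers z (up to sign, scale, and permutation). A toy sketch with a known 2×2 mixing matrix, our own illustration — in practice A is unknown and W must be estimated from the data:

```python
# Toy illustration of the ICA model x = A z: mix two independent source
# samples with a known matrix A, then unmix with W = A^{-1}.
A = [[2.0, 1.0],
     [1.0, 1.0]]                               # mixing matrix (invertible)
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
W = [[ A[1][1] / det, -A[0][1] / det],         # 2x2 inverse: unmixing matrix
     [-A[1][0] / det,  A[0][0] / det]]

z = [(1.0, -1.0), (0.5, 2.0), (-2.0, 0.25)]    # latent source samples
x = [(A[0][0]*a + A[0][1]*b, A[1][0]*a + A[1][1]*b) for a, b in z]   # observed
z_hat = [(W[0][0]*u + W[0][1]*v, W[1][0]*u + W[1][1]*v) for u, v in x]
```

Here `z_hat` reproduces the latent sources exactly because W is the exact inverse of A; estimated unmixing matrices recover them only approximately.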

In this article, we provide a personal review of the literature on nonparametric and robust tools in the standard univariate and multivariate location and scatter, as well as linear regression problems, with a special focus on sign and rank methods, their equivariance and invariance properties, and their robustness and efficiency. Beyond parametric...

We extend two methods of independent component analysis, fourth order blind identification and joint approximate diagonalization of eigen-matrices, to vector-valued functional data. Multivariate functional data occur naturally and frequently in modern applications, and extending independent component analysis to this setting allows us to distill im...

Consider a multivariate time series where each component series is assumed to be a linear mixture of latent mutually independent stationary time series. Classical independent component analysis (ICA) tools, such as fastICA, are often used to extract the latent series, but they do not utilize any information on temporal dependence. Also, financial time se...

A regression model where the response as well as the explaining variables are time series is considered. A general model which allows supervised dimension reduction in this context is suggested without considering the form of dependence. The method for this purpose combines ideas from sliced inverse regression (SIR) and blind source separation meth...

Clustered data are often encountered in biomedical studies, and to date, a number of approaches have been proposed to analyze such data. However, the phenomenon of informative cluster size (ICS) is a challenging problem, and its presence has an impact on the choice of a correct analysis methodology. For example, Dutta and Datta (2015, Biometrics) p...

In this paper we study the theoretical properties of the deflation-based FastICA method, the original symmetric FastICA method, and a modified symmetric FastICA method, here called the squared symmetric FastICA. This modification is obtained by replacing the absolute values in the FastICA objective function by their squares. In the deflation-based...

Dimension reduction is often a preliminary step in the analysis of large data sets. The so-called non-Gaussian component analysis searches for a projection onto the non-Gaussian part of the data, and it is then important to know the correct dimension of the non-Gaussian signal subspace. In this paper we develop asymptotic as well as bootstrap tests...

In independent component analysis it is assumed that the observed random variables are linear combinations of latent, mutually independent random variables called the independent components. Our model further assumes that only the non-Gaussian independent components are of interest, the Gaussian components being treated as noise. In this paper proj...

Most linear dimension reduction methods proposed in the literature can be formulated using a suitable pair of scatter matrices, see e.g. Tyler et al. (2009), Bura and Yang (2011) and Liski et al. (2014). The eigenvalues of one scatter matrix with respect to another can be used to determine the dimensions of the signal and noise subspaces. In this p...

Dimensionality is a major concern in the analysis of large data sets. There are various well-known dimension reduction methods with different strengths and weaknesses. In practical situations it is difficult to decide which method to use as different methods emphasize different structures in the data. Like ensemble methods in statistical learning,...

This book presents material both on the analysis of the classical concepts of correlation and on the development of their robust versions, and discusses the related concepts of correlation matrices, partial correlation, canonical correlation, and rank correlations, with the corresponding robust and non-robust estimation procedures. Every chapter...

This chapter examines several groups of robust estimates of the correlation coefficient under contaminated bivariate normal, independent component bivariate, and heavy-tailed Cauchy bivariate distributions with the use of influence function techniques in asymptotics and by the Monte Carlo method on finite samples. As a result of the comparative stu...

This chapter applies the highly robust and efficient estimates of scale and correlation, and presents the tools of exploratory data analysis. These applications comprise new versions of the boxplot techniques aimed at the visualization of both univariate and bivariate data and new methods and algorithms of detection of outliers in the data, also un...

This introductory chapter of “Robust Correlation: Theory and Applications” is about correlation, association, and partially about regression, i.e., about those areas of science that study the dependencies between random variables which mathematically describe the relations between observed phenomena and their associated features. Evi...

In this chapter the problem of robust estimation of scale is treated as subordinate to robust estimation of correlation, as measures of correlation are defined via measures of scale. Special attention is paid to the Huber minimax variance and bias robust estimates of scale under ε-contaminated normal distributions as we...

This chapter discusses the correlation measures and inference tools that are based on various robust covariance matrix functionals and estimates. It considers robust multivariate location and scatter functionals and estimates with a special focus on M-functionals. The chapter describes the robust versions of principal component analysis (PCA) and c...

This chapter considers the problem of location, which is the principal one for all further constructions. It formulates the basic steps of Huber's minimax approach to robustness, historically the first. The chapter outlines the concepts and techniques of the derivation of least informative distributions minimizing Fisher information for location, and presents...
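A minimal sketch of a Huber-type M-estimate of location, computed by iteratively reweighted averaging (the tuning constant k = 1.345, the MAD-based scale, and the toy data are our own illustrative choices, not the chapter's):

```python
import statistics

def huber_location(x, k=1.345, iters=50):
    """Huber M-estimate of location via iteratively reweighted averaging:
    observations farther than k*s from the current estimate are down-weighted."""
    m = statistics.median(x)                               # robust start
    mad = statistics.median([abs(v - m) for v in x])
    s = mad / 0.6745 if mad > 0 else 1.0                   # MAD-based scale
    for _ in range(iters):
        w = [1.0 if abs(v - m) <= k * s else k * s / abs(v - m) for v in x]
        m = sum(wi * vi for wi, vi in zip(w, x)) / sum(w)
    return m

data = [0.1, -0.3, 0.2, 0.0, -0.1, 50.0]    # one gross outlier
est = huber_location(data)
```

The gross outlier at 50.0 is heavily down-weighted, so the estimate stays close to the bulk of the data, while the sample mean is pulled far away from it.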

This chapter discusses the correlation measures and inference tools that are based on various concepts of univariate and multivariate signs and ranks. It describes the three different multivariate extensions of the concepts of sign and rank and their use in multivariate correlation analysis. Classical multivariate statistical inference methods, nam...

This chapter provides an overview of classical multivariate correlation measures and inference tools based on the covariance and correlation matrix. It considers different parametric and semiparametric models for multivariate continuous observations. Symmetry of the distribution of the standardized p-variate random variable Z may be seen as an invaria...

This chapter gives a brief review of robust hypothesis testing with the focus on Huber's minimax approach. It applies the results of robust minimax estimation of location to the problem of robust detection of a known signal. The chapter proposes to use redescending M-estimates of location including stable ones for robust detection of weak signals....

This chapter defines several conventional measures of correlation, focusing mostly on Pearson's correlation coefficient and constructions closely related to it, and lists their principal properties and computational peculiarities. The author comments on the requirements that should be imposed on measures of correlation to distinguish them from the m...
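The definition the chapter starts from can be sketched in a few lines; the second data set (our own toy numbers, not from the book) shows the sensitivity to a single outlier that motivates the robust alternatives:

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation: r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.0, 9.9]                   # nearly linear: r close to 1
r_clean = pearson_r(x, y)
r_outlier = pearson_r(x + [6.0], y + [-10.0])   # one outlier flips the sign
```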

Conventional methods of power spectrum estimation are very sensitive to the presence of outliers in the data; thus generally the issues of robustness are of vital importance within this area of statistics of random processes. This chapter considers various robust versions of the conventional methods of power spectra estimation. It is found that the...

For parametric and nonparametric time series signal models and for two main classes of estimates of a power spectrum, the periodograms and Blackman-Tukey formula methods, the author proposes a number of robust estimates partially based on robust versions of the discrete Fourier transform and partially on robust estimates of the autocorrelation func...

We consider multivariate time series where each component series is an unknown linear combination of latent mutually independent stationary time series. Multivariate financial time series often have periods of low volatility followed by periods of high volatility. This kind of time series typically has non-Gaussian stationary distributions, and th...

In Hettmansperger and Randles (Biometrika 89:851–860, 2002) spatial sign vectors were used to derive simultaneous estimators of multivariate location and shape. Oja (Multivariate Nonparametric Methods with R. Springer, New York, 2010) proposed a similar approach for the multivariate linear regression case. These estimators are highly robust and hav...

Independent component analysis is a standard tool in modern data analysis and numerous different techniques for applying it exist. The standard methods however quickly lose their effectiveness when the data are made up of structures of higher order than vectors, namely matrices or tensors (for example, images or videos), being unable to handle the...

In preprocessing tensor-valued data, e.g. images and videos, a common procedure is to vectorize the observations and subject the resulting vectors to one of the many methods used for independent component analysis (ICA). However, the tensor structure of the original data is lost in the vectorization and, as a more suitable alternative, we propose t...

We consider the problem of testing for multivariate independence in independent component (IC) models. Under a symmetry assumption, we develop parametric and nonparametric (signed-rank) tests. Unlike in independent component analysis (ICA), we allow for the singular cases involving more than one Gaussian independent component. The proposed rank tes...

Independent component analysis is a popular approach in search of latent variables and structures in high-dimensional data. We propose extensions of classical FOBI and JADE estimates for multivariate time series, with a special focus on time series with stochastic volatility.

In regional geochemistry rock, sediment, soil, plant or water samples, collected in a certain region, are analyzed for concentrations of chemical elements. The observations are thus usually high dimensional, spatially dependent and of compositional nature. In this paper, a novel blind source separation approach for spatially dependent data is sugge...

The interest in robust methods for blind source separation has increased recently. In this paper we briefly review what has been suggested so far for robustifying ICA and second order blind source separation. Furthermore, we suggest a new algorithm, eSAM-SOBI, which is an affine equivariant improvement of the (already robust) SAM-SOBI. In a simulatio...

In the independent component analysis it is assumed that the components of the observed random vector are linear combinations of latent independent components, and the aim is then to estimate the linear transformations back to independent components. Traditional methods to find estimates of an unmixing matrix in the engineering literature such as FOBI...

We present the R package gMWT which is designed for the comparison of several treatments (or groups) for a large number of variables. The comparisons are made using certain probabilistic indices (PI). The PIs computed here tell how often pairs or triples of observations coming from different groups appear in a specific order of magnitude. Classical...
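The probabilistic indices mentioned above can be illustrated for the two-group pairwise case, PI = P(X < Y) + ½·P(X = Y), estimated by counting all between-group pairs (the group names and data are our own toy example; gMWT itself also handles triples and many variables):

```python
def probabilistic_index(x, y):
    """Estimate PI = P(X < Y) + 0.5 * P(X = Y) by counting how often an
    observation from group x falls below one from group y, over all pairs."""
    wins = sum((a < b) + 0.5 * (a == b) for a in x for b in y)
    return wins / (len(x) * len(y))

ctrl = [1.2, 0.8, 1.5, 1.1]
trt  = [2.0, 1.9, 2.4, 1.6]      # every treated value exceeds every control
pi = probabilistic_index(ctrl, trt)
```

Complete separation gives PI = 1, while comparing a group against itself gives the null value 0.5.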

Background:
Heritable factors are evidently involved in prostate cancer (PrCa) carcinogenesis, but currently, genetic markers are not routinely used in screening or diagnostics of the disease. More precise information is needed for making treatment decisions to distinguish aggressive cases from indolent disease, for which heritable factors could b...

The independent component model is a latent variable model where the components of the observed random vector are linear combinations of latent independent variables. The aim is to find an estimate for a transformation matrix back to independent components. In moment-based approaches third cumulants are often neglected in favor of fourth cumulants,...

Signals, recorded over time, are often observed as mixtures of multiple source signals. To extract relevant information from such measurements one needs to determine the mixing coefficients. In case of weakly stationary time series with uncorrelated source signals, this separation can be achieved by jointly diagonalizing sample autocovariances at d...
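The building blocks being jointly diagonalized above are sample autocovariances at several lags; a sketch of the univariate version, with our own toy series (the multivariate case averages outer products of lagged observation vectors instead):

```python
def autocovariance(x, lag):
    """Sample autocovariance at a given lag:
    (1/n) * sum_t (x_t - mean) * (x_{t+lag} - mean)."""
    n = len(x)
    m = sum(x) / n
    return sum((x[t] - m) * (x[t + lag] - m) for t in range(n - lag)) / n

series = [0.0, 1.0, 0.0, -1.0] * 25     # period-4 oscillation, length 100
c0 = autocovariance(series, 0)          # the ordinary variance
c2 = autocovariance(series, 2)          # strongly negative at lag 2
```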

Myopia is a disorder of ocular refraction with varying rates of progression. Although the disorder has a dynamic nature, prospective longitudinal studies with long term follow-ups have been remarkably few. In this paper, we show how mixed-effects regression splines with different choices of basis functions can be used to model myopia progression da...

Deflation-based FastICA is a popular method for independent component analysis. In the standard deflation-based approach the row vectors of the unmixing matrix are extracted one after another always using the same nonlinearities. In practice the user has to choose the nonlinearities and the efficiency and robustness of the estimation procedure then...

New statistical procedures are introduced to analyse typical microRNA expression data sets. For each separate microRNA expression, the null hypothesis to be tested is that there is no difference between the distributions of the expression in different groups. The test statistics are then constructed having certain type of alternatives in mind. To a...

Dimension reduction plays an important role in high-dimensional data analysis. Principal component analysis, independent component analysis, and sliced inverse regression (SIR) are well known but very different analysis tools for the dimension reduction. It appears that these three approaches can all be seen as a comparison of two different scatter...

Blind source separation (BSS) is a signal processing tool, which is widely used in various fields. Examples include biomedical signal separation, brain imaging and economic time series applications. In BSS, one assumes that the observed $p$ time series are linear combinations of $p$ latent uncorrelated weakly stationary time series. The aim is then...

In this paper, we consider balanced hierarchical data designs for both one-sample and two-sample (two-treatment) location problems. The variances of the relevant estimates and the powers of the tests strongly depend on the data structure through the variance components at each hierarchical level. Also, the costs of a design may depend on the number...

In spite of recent contributions to the literature, informative cluster size settings are not well known and understood. In this paper, we give a formal definition of the problem and describe it from different viewpoints. Data generating mechanisms, parametric and nonparametric models are considered in light of examples. Our emphasis is on nonparam...

In this paper we assume that the observed $p$ time series are linear combinations of $p$ latent uncorrelated weakly stationary time series. The problem is then to find an estimate for an unmixing matrix that transforms the observed time series back to uncorrelated time series. The so-called SOBI (Second Order Blind Identification) estimate aims at a...

Independent component analysis (ICA) is a widely used signal processing tool having applications in various fields of science. In this paper we focus on affine equivariant ICA methods. Two such well-established estimation methods, FOBI and JADE, diagonalize certain fourth order cumulant matrices to extract the independent components. FOBI uses one...

Adherence is one of the most important determinants of viral suppression and drug resistance in HIV-infected people receiving antiretroviral therapy (ART). We examined the association between long-term mortality and poor adherence to ART in DART trial participants in Uganda and Zimbabwe randomly assigned to receive laboratory and clinical monitor-...

Independent component models have gained increasing interest in various fields of applications in recent years. The basic independent component model is a semiparametric model assuming that a p-variate observed random vector is a linear transformation of an unobserved vector of p independent latent variables. This linear transformation is given by...

Multivariate medians are robust competitors of the mean vector in estimating the symmetry center of a multivariate distribution. Various definitions of the multivariate medians have been proposed in the literature, and their properties (efficiency, equivariance, robustness, computational convenience, estimation of their accuracy, etc.) have been ex...

Independent component analysis (ICA) has become a popular multivariate analysis and signal processing technique with diverse applications. This paper is targeted at discussing theoretical large sample properties of ICA unmixing matrix functionals. We provide a formal definition of unmixing matrix functional and consider two popular estimators in de...

A family of weighted rank tests and corresponding Hodges–Lehmann estimates are proposed for the analysis of multivariate two-sample clustered data. These procedures are a specific case of the nonparametric multivariate methods for clustered data considered by Nevalainen, Larocque, Oja, and Pörsti [(2010), ‘Nonparametric Analysis of Clustered Multiv...

In this short paper, we assume that the observed $p$ time series are linear combinations of $p$ latent uncorrelated weakly stationary time series. The problem is then, using the observed $p$-variate time series, to find an estimate for a mixing or unmixing matrix for the combinations. The estimated uncorrelated time series may then have nice inte...

In this review we discuss six papers: Lehmann (1963) and Bickel and Lehmann (1974, 1975, 1976a, 1976b, 1976c). The first paper deals with confidence intervals based on nonparametric tests, and the other papers discuss descriptive statistics for nonparametric models.

Dimensionality is a major concern in analyzing large data sets. Some well known dimension reduction methods are for example principal component analysis (PCA), invariant coordinate selection (ICS), sliced inverse regression (SIR), sliced average variance estimate (SAVE), principal hessian directions (PHD) and inverse regression estimator (IRE). How...

Rank based tests are alternatives to likelihood based tests, popularized by their relative robustness and underlying elegant mathematical theory. There has been a surge in research activity in this area in recent years, since a number of researchers are working to develop and extend rank based procedures to clustered dependent data, which include si...

In this paper, we apply orthogonally equivariant spatial sign covariance matrices as well as their affine equivariant counterparts in principal component analysis. The influence functions and asymptotic covariance matrices of eigenvectors based on robust covariance estimators are derived to compare the robustness and efficiency properties....
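The orthogonally equivariant spatial sign covariance matrix referred to above is the average of the outer products of the spatial signs u = (y − μ)/‖y − μ‖. A pure-Python sketch (the fixed center and toy data are our own choices; note that the spatial signs discard the radii, so the eigenvalues are not the component variances, although the eigenvectors estimate the principal directions):

```python
import math

def spatial_sign_covariance(points, center):
    """Spatial sign covariance matrix: the average of u u^T over the
    spatial signs u = (y - center) / ||y - center||."""
    p = len(center)
    S = [[0.0] * p for _ in range(p)]
    n = 0
    for y in points:
        d = [a - b for a, b in zip(y, center)]
        norm = math.hypot(*d)
        if norm < 1e-12:                 # skip points at the center
            continue
        u = [c / norm for c in d]
        n += 1
        for i in range(p):
            for j in range(p):
                S[i][j] += u[i] * u[j]
    return [[s / n for s in row] for row in S]

pts = [(1.0, 0.0), (-1.0, 0.0), (0.0, 3.0), (0.0, -3.0)]   # axis-aligned cloud
S = spatial_sign_covariance(pts, center=(0.0, 0.0))
```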

Equivariance and invariance problems are frequent in multivariate analysis, where adaptations are sometimes necessary to obtain equivariant or invariant versions of the procedures used. These adaptations often rely on a preliminary treatment of the data, a standardization or a transformation via a system of coord...

Objectives:
To describe associations between different summaries of adherence in the first year on antiretroviral therapy (ART) and the subsequent risk of mortality, to identify patients at high risk because of early adherence behaviour.
Methods:
We previously described an approach where adherence behaviour at successive clinic visits during the...

Procedures such as FOBI that jointly diagonalize two matrices with the independence property have a long tradition in ICA. These procedures have well-known statistical properties, for example they are prone to failure if the sources have multiple identical values on the diagonal. In this paper we suggest to diagonalize jointly k≥2 scatter matrices...

Several predisposition loci for hereditary prostate cancer (HPC) have been suggested, including HPCX1 at Xq27-q28, but due to the complex structure of the region, the susceptibility gene has not yet been identified.
In this study, nonsense-mediated mRNA decay (NMD) inhibition was used for the discovery of truncating mutations. Six prostate cancer (...

In the paper we present an R package MNM dedicated to multivariate data analysis based on the L_1 norm. The analysis proceeds very much as does a traditional multivariate analysis. The regular L_2 norm is just replaced by different L_1 norms, observation vectors are replaced by their (standardized and centered) spatial signs, spatial ranks, and spa...

The data collected in epidemiological or clinical studies are frequently clustered. In such settings, appropriate variance adjustments must be made in order to estimate the sufficient sample size correctly. This paper works through the sample size calculations for clustered data. Importantly, our explicit variance expressions also enable us to opti...
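One common form of the variance adjustment mentioned above is the design effect under an exchangeable working model with equal cluster sizes m and intra-cluster correlation ρ: DEFF = 1 + (m − 1)ρ. This is the standard textbook adjustment, not necessarily the paper's exact expression; the numbers below are purely illustrative:

```python
import math

def design_effect(m, rho):
    """Design effect for equal cluster sizes: DEFF = 1 + (m - 1) * rho."""
    return 1.0 + (m - 1) * rho

def clustered_sample_size(n_independent, m, rho):
    """Inflate an independent-data sample size by the design effect,
    rounding up to whole subjects."""
    return math.ceil(n_independent * design_effect(m, rho))

# 200 subjects needed under independence; clusters of 5 with ICC 0.25:
n = clustered_sample_size(200, m=5, rho=0.25)
```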

The population attributable fraction (PAF) is a useful measure for quantifying the impact of exposure to certain risk factors on a particular outcome at the population level. Recently, new model-based methods for the estimation of PAF and its confidence interval for different types of outcomes in a cohort study design have been proposed. In this pa...
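A classical closed form for the PAF with a binary exposure is Levin's formula, PAF = p_e(RR − 1) / (1 + p_e(RR − 1)), with p_e the exposure prevalence and RR the relative risk; this is the textbook formula, not necessarily the model-based estimator the paper proposes:

```python
def paf_levin(p_exposed, relative_risk):
    """Levin's formula for the population attributable fraction:
    PAF = p_e * (RR - 1) / (1 + p_e * (RR - 1))."""
    excess = p_exposed * (relative_risk - 1.0)
    return excess / (1.0 + excess)

# 25% of the population exposed, relative risk 3:
paf = paf_levin(p_exposed=0.25, relative_risk=3.0)   # 0.5 / 1.5 = 1/3
```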

In the paper we present an R package MNM dedicated to multivariate data analysis based on the L_1 norm. The analysis proceeds very much as does a traditional multivariate analysis. The regular L_2 norm is just replaced by different L_1 norms, observation vectors are replaced by their (standardized and centered) spatial signs, spatial ranks, and spa...

For assessing the separation performance (quality and accuracy) of ICA estimators, several performance indices have been introduced in the literature. The purpose of this note is to outline, review and study the properties of performance indices as well as propose some new ones. Special emphasis is put on the properties that such performance indice...

Invariant coordinate selection (ICS) has recently been introduced by Tyler et al. (2009) as a method for exploring multivariate data. As shown in Oja et al. (2006), it includes as a special case a method for recovering the unmixing matrix in independent component analysis (ICA). It also serves as a basis for classes of multivariate nonparametric te...

In the independent component (IC) model it is assumed that $X = (\boldsymbol x_1, \ldots, \boldsymbol x_n)$ is a random sample such that $\boldsymbol x_i = \boldsymbol\Omega \boldsymbol z_i$, $i = 1, \ldots, n$, where $\boldsymbol z_i$ is a random vector with independent components and $\boldsymbol\Omega$ is the so-called mixing matrix. In the independent component analysis (ICA) one then tries to estimate an unmixing matrix $\boldsymbol\Gamma$ such that $\boldsymbol\Gamma \boldsymbol x_i$ have independent components....

Most traditional statistical methods assume the independence of observations. However, in practical data observations often come in clusters on one or more nested levels. Observations may be correlated within clusters. Disregarding the structure of the data may lead to erroneous inference. A general family of weighted one-sample location test stati...

Adherence to a medical treatment means the extent to which a patient follows the instructions or recommendations by health professionals. There are direct and indirect ways to measure adherence which have been used for clinical management and research. Typically adherence measures are monitored over a long follow-up or treatment period, and some me...

miRNAs have proven to be key regulators of gene expression and are differentially expressed in various diseases, including cancer. Our aim was to identify epigenetically dysregulated genes in prostate cancer. We performed miRNA expression profiling after relieving epigenetic modifications in six prostate cancer cell lines and non-malignant prostate...

In independent subspace analysis (ISA) one assumes that the components of the observed random vector are linear combinations of the components of a latent random vector with independent subvectors. The problem is then to find an estimate of a transformation matrix to recover the independent subvectors. Regular independent component analysis (ICA)...

In independent component analysis (ICA) it is often assumed that the p components of the observation vector are linear combinations of p underlying independent components. Two scatter matrices having the so called independence property can then be used to recover the independent components. The assumption of (exactly) p independent components is ho...

Deflation-based FastICA, where independent components (ICs) are extracted one-by-one, is among the most popular methods for estimating an unmixing matrix in the independent component analysis (ICA) model. In the literature, it is often seen rather as an algorithm than as an estimator related to a certain objective function, and only recently has its...

Several extensions of the multivariate normal model have been shown to be useful in practical data analysis. Therefore, tools to identify which model might be appropriate for the analysis of a real data set are needed. This paper suggests the simultaneous use of two location and two scatter functionals to obtain multivariate descriptive measures fo...

We consider a semiparametric multivariate location-scatter model where the standardized random vector of the model is fixed using simultaneously two location vectors and two scatter matrices. The approach using location and scatter functionals based on the first four moments serves as our main example. The four functionals yield in a natural way th...

In this paper, the shape matrix estimators based on spatial sign and rank vectors are considered. The estimators considered here are slight modifications of the estimators introduced in Dümbgen (1998) and Oja and Randles (2004) and further studied for example in Sirkiä et al. (2009). The shape estimators are computed using pairwise differences of t...