Kanti Mardia
University of Leeds · Department of Statistics
387 Publications
31,611 Reads
34,328 Citations
Introduction
His research contributions span many areas, including bioinformatics, directional statistics, geosciences, image analysis, multivariate analysis, shape analysis, spatial statistics, and spatial-temporal modelling; see some of the highlights of research material within these areas. He has been a major force in creating the subject of Statistics of Manifolds (Directional Statistics and Shape Analysis in particular) and has been leading the sub...
Publications (387)
Many biological objects possess bilateral symmetry about a midline or midplane, up to a "noise" term. This paper uses landmark-based methods to measure departures from bilateral symmetry, especially for the two-group problem where one group is more asymmetric than the other. In this paper, we formulate our work in the framework of size-and-shape...
It will not be an exaggeration to say that R A Fisher is the Albert Einstein of Statistics. He pioneered almost all the main branches of statistics, but it is not as well known that he opened the area of Directional Statistics with his 1953 paper introducing a distribution on the sphere which is now known as the Fisher distribution. He stressed tha...
The method of Cartesian transformations introduced by D'Arcy Thompson a century ago in his celebrated book On Growth and Form precipitated an important development in 20th-century biometrics: a fusion of the geometrical and biological approaches to morphology. Some decades later this fusion, in turn, spun off another multidisciplinary focus, statis...
Three-dimensional RNA structures frequently contain atomic clashes. Usually, corrections approximate the biophysical chemistry, which is computationally intensive and often does not correct all clashes. We propose fast, data-driven reconstructions from clash-free benchmark data with two-scale shape analysis: microscopic (suites) dihedral backbone a...
We give a unified treatment of constructing families of circular discrete distributions. Some of these families are deduced from established distributions such as von Mises and wrapped Cauchy. Some others are derived directly such as a flexible family based on trigonometric sums and the circular location family. Results interrelating these families...
Finite mixture models are fitted to spherical data. Kent distributions are used for the components of the mixture because they allow considerable flexibility. Previous work on such mixtures has used an approximate maximum likelihood estimator for the parameters of a single component. However, the approximation causes problems when using the EM algo...
The simplest way to construct a non‐Gaussian random field is to transform a Gaussian random field. This chapter looks at log‐normal random fields. It discusses modeling strategies to deal with nonnormal data by relating the data to observations from a Gaussian process. The drift function in latent Gaussian spatial process can be thought of as the “...
This chapter looks at random fields that have a natural description in terms of a probabilistic model, typically an autoregression. The symmetric two‐sided version of the autoregression process is known as a “simultaneous autoregression” (SAR) and is extended to higher dimensions. The autoregression model can be given either a “unilateral” conditio...
This chapter considers the question of prediction for random fields. Prediction in a spatial context is known as “kriging” in the geostatistics literature. The simple kriging predictor can be interpreted as a posterior mean in a Bayesian analysis. The chapter details some artificial examples in one dimension to illustrate different aspects of krigi...
An intrinsic random field can be described as a random field with stationary increments. In a generalized random field, the realizations are too rough to be ordinary functions. The spectral representation for the covariance function of a stationary random field can be extended to cover random fields that are intrinsic or generalized or both. This c...
This chapter focuses on the problem of fitting a stationary or intrinsic Gaussian model to a set of spatial data. It examines the types of behavior for a one‐dimensional semivariogram and then looks at how one‐dimensional semivariograms can fit together in higher dimensional cases. Models in spatial analysis can be specified in two common ways: dir...
This chapter investigates the problem of fitting a stationary autoregression Gaussian model. The autoregression models include unilateral autoregressions (UARs), conditional autoregressions (CARs), and simultaneous autoregressions (SARs). The chapter also looks at the AR(1) process in one dimension in detail. It shows how moment equations can be de...
Big data, high dimensional data, sparse data, large scale data, and imaging data are all becoming new frontiers of statistics. Changing technologies have created this flood and have led to a real hunger for new modelling strategies and data analysis by scientists. In many cases data are not Euclidean; for example, in molecular biology, the data sit...
Motivation
Reconstructions of the structure of biomolecules, for instance via X-ray crystallography or cryo-EM, frequently contain clashes of atomic centers. Correction methods are usually based on simulations approximating biophysical chemistry, making them computationally expensive and often not correcting all clashes.
Results
We propose a computatio...
Molecular structures of RNA molecules reconstructed from X-ray crystallography frequently contain errors. Motivated by this problem we examine clustering on a torus since RNA shapes can be described by dihedral angles. A previously developed clustering method for torus data involves two tuning parameters and we assess clustering results for differe...
A new two-parameter “full exponential cardioid” radial growth model for two-dimensional geometric objects is proposed and analyzed. The model depends additionally on two rotation parameters and on two seeds about which the growth is centered, plus a choice of three possible assumptions about statistical errors. If the seeds are assumed known, the r...
Motivated by some cutting edge circular data such as from Smart Home technologies and roulette spins from online and casino, we construct some new rich classes of discrete distributions on the circle. We give four new general methods of construction, namely (i) maximum entropy, (ii) centered wrapping, (iii) marginalized and (iv) conditionalized met...
Consider a helix in three-dimensional space along which a sequence of equally spaced points is observed, subject to statistical noise. For data coming from a single helix, a two-stage algorithm based on a profile likelihood is developed to compute the maximum likelihood estimate of the helix parameters. Statistical properties of the estimator are s...
For noisy two-dimensional data, which are approximately uniformly distributed near the circumference of an ellipse, Mardia and Holmes (1980) developed a model to fit the ellipse. In this paper we adapt their methodology to the analysis of helix data in three dimensions. If the helix axis is known, then the Mardia-Holmes model for the circular case...
The need for effective simulation methods for directional distributions has grown as they have become components in more sophisticated statistical models. A new acceptance-rejection method is proposed and investigated for the Bingham distribution on the sphere using the angular central Gaussian distribution as an envelope. It is shown that the prop...
A century ago, a Scottish biologist published ideas that would be built on generations later by those working in the field of statistical shape analysis. Kanti V. Mardia, Fred L. Bookstein, Balvinder S. Khambay and John T. Kent celebrate D'Arcy Thompson's most famous work on the 70th anniversary of his death.
Various practical situations give rise to observations that are directions, and this has led to the field of directional statistics. Especially in recent years, new applications of this area have emerged, such as in structural bioinformatics, machine learning and cosmology. Consequently, various new directional distributions have appeared in the li...
This chapter shows how toroidal diffusions are convenient methodological tools for modelling protein evolution in a probabilistic framework. The chapter addresses the construction of ergodic diffusions with stationary distributions equal to well-known directional distributions, which can be regarded as toroidal analogues of the Ornstein-Uhlenbeck p...
Motivated by a cutting edge problem related to the shape of α-helices in proteins, we formulate a parametric statistical model, which incorporates the cylindrical nature of the helix. Our focus is to detect a "kink," which is a drastic change in the axial direction of the helix. We propose a statistical model for the straight α-helix and derive...
We introduce stochastic models for continuous-time evolution of angles and develop their estimation. We focus on studying Langevin diffusions with stationary distributions equal to well-known distributions from directional statistics, since such diffusions can be regarded as toroidal analogues of the Ornstein-Uhlenbeck process. Their likelihood fun...
There are several cutting edge applications needing PCA methods for data on tori and we propose a novel torus-PCA method that adaptively favors low-dimensional representations while preventing overfitting by a new test, both of which can be generally applied and address shortcomings in two previously proposed PCA methods: Unlike tangent space PCA,...
Recently described stochastic models of protein evolution have demonstrated that the inclusion of structural information in addition to amino acid sequences leads to a more reliable estimation of evolutionary parameters. We present a generative, evolutionary model of protein structure and sequence that is valid on a local length scale. The model co...
A thoroughly revised and updated edition of this introduction to modern statistical methods for shape analysis. Shape analysis is an important tool in the many disciplines where objects are compared using geometrical features. Examples include comparing brain shape in schizophrenia; investigating protein molecules in bioinformatics; and describing gr...
This chapter contains some extensions to shape analysis, especially applications to object data and more general manifolds. The techniques of shape analysis have natural extensions to other application areas. The very broad field of object oriented data analysis (OODA) is concerned with analysing different types of data objects compared with the co...
The concept of a mean shape or mean size-and-shape has an underpinning role in statistical shape analysis, for example when choosing a tangent space projection. For means on more general manifolds we have many choices. The first distinction to make is that of an intrinsic mean versus an extrinsic mean, which are defined using either an intrinsic or ex...
Of fundamental interest are probability distributions in shape spaces and pre-shape spaces, which provide models for statistical shape analysis. This chapter considers the joint distribution of size and shape. The work in it is primarily for m = 2 dimensional landmarks, and some extensions to higher dimensions will be considered. The main inference...
Rather than conditioning one could consider the marginal distribution of shape after integrating out the similarity transformations. These distributions will be called offset shape distributions. Shape distributions for the general multivariate normal model with a general covariance structure in m = 2 dimensions have been studied by Dryden and Mard...
One of the major problems for maximum likelihood estimation in the well-established directional models is that the normalising constants can be difficult to evaluate. A new general method of "score matching estimation" is presented here on a compact oriented Riemannian manifold. Important applications include von Mises-Fisher, Bingham and joint mod...
Alan Turing (1912–1954) made seminal contributions to mathematical logic, computation, computer science, artificial intelligence, cryptography and theoretical biology. In this volume, outstanding scientific thinkers take a fresh look at the great range of Turing's contributions, on how the subjects have developed since his time, and how they might...
Applications of circular regression models appear in many different fields such as evolutionary psychology, motor behavior, biology, and, in particular, in the analysis of gene expressions in oscillatory systems. Specifically, for the gene expression problem, a researcher may be interested in modeling the relationship among the phases of cell-cycle...
There are several cutting edge applications needing PCA methods for data on tori and we propose a novel torus-PCA method with important properties that can be generally applied. There are two existing general methods: tangent space PCA and geodesic PCA. However, unlike tangent space PCA, our torus-PCA honors the cyclic topology of the data space wh...
In the last two decades, there has been an increase in the availability of large data sets in geosciences (e.g. data provided by satellites, large data sets collected with modern geophysical techniques, exhaustive sampling campaigns or the increase of historical data like climatological data). One challenging problem for such large data in spatial...
One of the major problems in biology is related to protein folding. The folding process is known to depend on both the protein's sequence (1-D) and structure (3-D). Both these properties need to be considered when aligning two proteins; they are also influenced by the evolutionary distance between the proteins to be aligned. We propose a Bayesian m...
Motivated by molecular biology, there has been an upsurge of research activity in directional statistics in general and its Bayesian aspect in particular. The central distribution for the circular case is the von Mises distribution, which has two parameters (mean and concentration) akin to the univariate normal distribution. However, there has been a...
We propose a method to formulate probabilistic models of protein structure in atomic detail, for a given amino acid sequence, based on Bayesian principles, while retaining a close link to physics. We start from two previously developed probabilistic models of protein structure on a local length scale, which concern the dihedral angles in main chain...
We present the theoretical foundations of a general principle to infer structure ensembles of flexible biomolecules from spatially and temporally averaged data obtained in biophysical experiments. The central idea is to compute the Kullback-Leibler optimal modification of a given prior distribution [Formula: see text] with respect to the experiment...
A new acceptance-rejection method is proposed and investigated for the Bingham distribution on the sphere using the angular central Gaussian distribution as an envelope. It is shown to have high efficiency and to be straightforward to use. The method can also be extended to Fisher and Fisher-Bingham distributions on spheres and related manifolds.
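The acceptance-rejection idea in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on assumptions that the abstract does not state: the Bingham density is taken proportional to exp(-x'Ax) for a symmetric positive semi-definite matrix A, the angular central Gaussian (ACG) envelope uses the matrix Omega = I + 2A/b, and the tuning constant b is fixed at 1 rather than optimised for efficiency. The function name sample_bingham is hypothetical.

```python
import numpy as np

def sample_bingham(A, n, b=1.0, rng=None):
    """Draw n unit vectors with density proportional to exp(-x^T A x).

    Assumes A is symmetric positive semi-definite and b > 0 (illustrative choices).
    """
    rng = np.random.default_rng() if rng is None else rng
    A = np.asarray(A, dtype=float)
    q = A.shape[0]
    omega = np.eye(q) + 2.0 * A / b          # ACG envelope parameter (assumed form)
    # An ACG(omega) draw is y / ||y|| with y ~ N(0, omega^{-1}).
    chol = np.linalg.cholesky(np.linalg.inv(omega))
    # log of the bound M with exp(-u) * (1 + 2u/b)^(q/2) <= M for all u >= 0.
    log_m = -(q - b) / 2.0 + (q / 2.0) * np.log(q / b)
    samples = []
    while len(samples) < n:
        y = chol @ rng.standard_normal(q)
        x = y / np.linalg.norm(y)
        log_target = -(x @ A @ x)                           # unnormalised Bingham
        log_envelope = -(q / 2.0) * np.log(x @ omega @ x)   # unnormalised ACG
        # Accept with probability target / (M * envelope), which is at most 1.
        if np.log(rng.uniform()) < log_target - log_m - log_envelope:
            samples.append(x)
    return np.array(samples)
```

For example, sample_bingham(np.diag([0.0, 5.0, 5.0]), 200) should return points on the unit sphere clustered around the first coordinate axis and its antipode. A fixed b keeps the sketch short; choosing b to maximise the acceptance rate is left out here.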
We develop a Bayesian model for the alignment of two point configurations under the full similarity transformations of rotation, translation and scaling. Other work in this area has concentrated on rigid body transformations, where scale information is preserved, motivated by problems involving molecular data; this is known as form analysis. We con...
Alcohol can damage the brains of unborn babies. Shape analysis can assess the damage in fetal alcohol spectrum disorders. Kanti Mardia, Fred Bookstein and John Kent explain how it works, and how it can help babies and even murderers.
Proteins are the workhorses of all living systems, and protein bioinformatics deals with analysis of protein sequences (one dimensional) and structures (three dimensional). The paper reviews statistical advances in three major active areas of protein structural bioinformatics: structure comparison, Ramachandran plots and structure prediction. The...
Projective shape consists of the information about a configuration of points that is invariant under projective transformations. It is an important tool in machine vision to pick out features that are invariant to the choice of camera view. The simplest example is the cross ratio for a set of four collinear points. Recent work involving ideas from...
Measuring the quality of determined protein structures is a very important problem in bioinformatics. Kernel density estimation is a well-known nonparametric method which is often used for exploratory data analysis. Recent advances, which have extended previous linear methods to multi-dimensional circular data, give a sound basis for the analysis o...
Motivated by examples in protein bioinformatics, we study a mixture model of multivariate angular distributions. The distribution treated here (multivariate sine distribution) is a multivariate extension of the well-known von Mises distribution on the circle. The density of the sine distribution has an intractable normalizing constant and here we p...
Unlabeled shape analysis is a rapidly emerging and challenging area of statistics. This has been driven by various novel applications in bioinformatics. We consider here the situation where two configurations are matched under various constraints, namely, the configurations have a subset of manually located "markers" with high probability of matchi...
The recently introduced reference ratio method [276] allows combining distributions over fine-grained variables with distributions over coarse-grained variables in a meaningful way. This problem is a major bottleneck in the prediction, simulation and design of protein structure and dynamics. Hamelryck et al. [276] introduced the reference ratio meth...
Circular data arises in many areas of science, including astronomy, biology, physics, earth science and meteorology. In molecular biology, circular data emerges particularly in the study of macromolecules. One of the classical examples in this field is the Ramachandran map [595], which describes dihedral angles in the protein main chain.
This chapter considers the problem of matching configurations of biological macromolecules when both alignment and superposition transformations are unknown. Alignment denotes correspondence – a bijection or mapping – between points in different structures according to some objectives or constraints. Superposition denotes rigid-body transformations...
In application areas like bioinformatics, multivariate distributions on angles are encountered which show significant clustering. One approach to statistical modeling of such situations is to use mixtures of unimodal distributions. In the literature (Mardia et al., 2012)...
One of the key ingredients in drug discovery is the derivation of conceptual templates called pharmacophores. A pharmacophore model characterizes the physicochemical properties common to all active molecules, called ligands, bound to a particular protein receptor, together with their relative spatial arrangement. Motivated by this important applica...
It has long been known that the amino-acid sequence of a protein determines its 3-dimensional structure, but accurate ab initio prediction of structure from sequence remains elusive. We gain insight into local protein structure conformation by studying the relationship of dihedral angles in pairs of residues in protein sequences (dipeptides). We ad...
This paper highlights distributional connections between directional statistics and shape analysis. In particular, we provide a test of uniformity for highly dispersed shapes, using the standard techniques of directional statistics. We exploit the isometric transformation from triangular shapes to a sphere in three dimensions, to provide a rich cla...
In certain abnormalities of spinal shape, the long axis of the spine moves out of the median sagittal plane, producing both an axial torsion and a lateral deviation out of the usual front-back plane. Clinicians need to be able to assess rapidly and accurately whether or not an individual has such an abnormality. In this paper, we examine several ca...
There has been renewed interest in directional Bayesian analysis for the bivariate case, especially in view of its fundamental new and challenging applications to bioinformatics. Previous work has concentrated on Bayesian analysis for the univariate von Mises distribution. Here, we give the description of the general bivariate von Mises (BVM) di...
In certain multivariate problems the full probability density has an awkward normalizing constant, but the conditional and/or marginal distributions may be much more tractable. In this paper we investigate the use of composite likelihoods instead of the full likelihood. For closed exponential families, both are shown to be maximized by the same par...
This paper describes a numerical data set comprising the three-dimensional fracture network of a 1 m³ block of Bodmin granite. The data set can be downloaded from the following web sites: www.elsevier.com/ijrmms, www.leeds.ac.uk/StochasticRockFractures and www.ecms.adelaide.edu.au/civeng/research/StochasticRockFractures. The purpose of this paper i...
The increasing importance of non-coding RNA in biology and medicine has led to a growing interest in the problem of RNA 3-D structure prediction. As is the case for proteins, RNA 3-D structure prediction methods require two key ingredients: an accurate energy function and a conformational sampling procedure. Both are only partly solved problems. He...
The KL divergences for angle pairs. (0.02 MB PDF)
The 5% and 25% quantiles of the RMSD distributions for decoys with correct base pairing. (0.02 MB PDF)
Histograms of pairwise angle distributions with the highest and lowest KL difference. (0.49 MB PDF)
Execution time of the MCMC algorithm. (0.02 MB PDF)
The marginal distributions of all seven individual angles. (0.06 MB PDF)
The KL divergences for the seven individual angles. (0.01 MB PDF)
The Matérn covariance scheme is of great importance in many geostatistical applications where the smoothness or differentiability of the random field that models a natural phenomenon is of interest. In addition to the range and nugget parameters, the flexibility of the Matérn model is provided by the so-called smoothness parameter which controls th...
Roy's theorem on Jacobians under constraints is applied to obtain an explicit form of the Haar measure. This is a key result for the matrix Fisher distribution on rotations. The distributions have recently appeared for some problems in Bioinformatics.
We propose a simple procedure for generating virtual protein Cα traces. One of the key ingredients of our method, to build a three-dimensional structure from a random sequence of amino acids, is to work directly on torsional angles of the chain which we sample from a von Mises distribution. With simple modeling of the hydrophobic effect in pr...
Despite significant progress in recent years, protein structure prediction maintains its status as one of the prime unsolved problems in computational biology. One of the key remaining challenges is an efficient probabilistic exploration of the structural space that correctly reflects the relative conformational stabilities. Here, we present a full...
Introduction · Tests of Symmetry · Two-Sample Tests · Multi-Sample Tests
Introduction · The Distribution Function · The Characteristic Function · Moments and Measures of Location and Dispersion · Circular Models · Multiply-Wrapped Distributions · Distributions on the Torus and the Cylinder
Introduction · Single-Sample Tests · Two-Sample Tests · Multi-Sample Tests · Testing von Misesness
Introduction · Exploratory Data Analysis · Point Estimation · Single-Sample Tests · Two-Sample Tests · Multi-Sample Tests · Tests on Axial Distributions · A General Framework for Testing Uniformity
Introduction · Unbiased Estimators and a Cramér-Rao Bound · Von Mises Distributions · Wrapped Cauchy Distributions · Mixtures of von Mises Distributions
Introduction · Graphical Assessment of Uniformity · Tests of Uniformity · Tests of Goodness-of-Fit
Proceedings from a Symposium, of the same name, held on the campus of Syracuse University from March to June, 1989.
Chapters by the following authors: L. Anselin, P. Doreian, D. A. Griffith, R. P. Haining, K. V. Mardia, R. J. Martin, J. K. Ord, J. H. P. Paelinck, S. Richardson, B. D. Ripley, A. Sen, G. J. G. Upton, D. Wartenberg.
Motivated by problems of modelling torsional angles in molecules, Singh, Hnizdo & Demchuk (2002) proposed a bivariate circular model which is a natural torus analogue of the bivariate normal distribution and a natural extension of the univariate von Mises distribution to the bivariate case. The authors present here a multivariate extension of the b...
I describe here my connection with two of the major contributions of S.N. Roy: namely the Jacobians of complicated transformations for various exact distributions, rectangular coordinates and the Bartlett decomposition. Their applications have appeared in directional statistics, shape analysis and now in statistical bioinformatics.