Article
Matthew J. Vowels*
Typical Yet Unlikely and Normally Abnormal:
The Intuition Behind High-Dimensional
Statistics
https://doi.org/10.1515/spp-2023-0028
Received August 10, 2023; accepted November 27, 2023; published online December 26, 2023
Stat Polit Pol 2024; 15(1): 87–113
*Corresponding author: Matthew J. Vowels, Institute of Psychology, University of Lausanne, Lausanne, Switzerland, E-mail: matthew.vowels@unil.ch
Abstract: Normality, in the colloquial sense, has historically been considered an aspirational trait, synonymous with ideality. The arithmetic average and, by extension, statistics including linear regression coefficients, have often been used to characterize normality, and are often used as a way to summarize samples and identify outliers. We provide intuition behind the behavior of such statistics in high dimensions, and demonstrate that even for datasets with a relatively low number of dimensions, data start to exhibit a number of peculiarities which become severe as the number of dimensions increases. Whilst our main goal is to familiarize researchers with these peculiarities, we also show that normality can be better characterized with 'typicality', an information theoretic concept relating to entropy. An application of typicality to both synthetic and real-world data concerning political values reveals that in multi-dimensional space, to be 'normal' is actually to be atypical. We briefly explore the ramifications for outlier detection, demonstrating how typicality, in contrast with the popular Mahalanobis distance, represents a viable method for outlier detection.
Keywords: statistics; information theory; outlier; normality; typicality
In a well known United States Air Force (USAF) experiment seeking to identify the 'average man', Gilbert Daniels found that out of 4063 men, not a single one fell within 30 % of the arithmetic sample averages for each of ten physical dimensions simultaneously (which included attributes such as stature, sleeve length, thigh circumference, and so on) (Daniels 1952). Far from being an unlikely fluke of the sample, this result reflects the fact that averages, rather than representing the most 'normal' attributes in a sample, are actually highly abnormal in the context of multi-dimensional data. Indeed, even
though averages may provide seemingly useful baselines for comparison, it is
important and perhaps surprising to note that the chance of finding an individual
with multiple traits falling close to the average is vanishingly small, particularly as
the number of traits increases.
The arithmetic average has been used to represent normality (vis-a-vis abnor-
mality, in the informal/colloquial/non-statistical sense), and is often used both pro-
ductively and unproductively as a blunt way to characterize samples and outliers.
Prior commentary has highlighted the pitfalls associated with the use of the mean as
a summary statistic (Speelman and McGann 2013); the limitations in relation to its
applicability and usefulness of parametric representations (such as the Gaussian)
when dealing with real-world phenomena (Micceri 1989; Modis 2007); and the soci-
etal context (Comte 1976; Misztal 2002), surrounding the potentially harmful
perception of normality as a figure of perfection to which we may progress
(Hacking 1990, 168). Whilst these commentaries are valuable and important in
developing an awareness for what it means to use averages to characterize hu-
mankind, they do not provide us with an alternative. They also do not discuss some of
the more technical aspects of normality in the context of multiple dimensions and
outlier detection, or explain why normality, when characterized by the arithmetic
average, is so difficult to attain in principle.¹ Furthermore, the problems associated
with the arithmetic average, which is an approximation of an expected value, extend
to other expected value based methods, such as regressions, which describe the
average value of an outcome for a particular subset of predictors. As such, it is
important that researchers familiarize themselves with the limitations of such
popular methodologies.
In this paper, our principal aim is to provide intuition for and to familiarize
researchers with the peculiar behavior of data in high dimensions. In particular, we
discuss averages in the context of multi-dimensional data, and explain why it is that
being normal is so abnormal. We touch on some of the peculiarities of multi-
dimensional spaces, such as how a high-dimensional sphere has close to zero volume,
and how high-dimensional random variables cluster in a thin annulus a finite dis-
tance away from the mean. Whilst not the primary focus of this work, we also
consider the relevance of these phenomena to outlier detection, and suggest that
outliers should not only be considered to include datapoints which lie far from the
mean, but also those points close to the mean. The intuition associated with this
phenomenon helps to explain why heterogeneity between individuals represents such a great challenge for researchers in the social sciences: humans as high-dimensional phenomena tend only to be normal insofar as they are all abnormal in slightly different ways.
1 One exception includes work by Kroc and Astivia (2021) for the determination of scale cutoffs.
Using information theoretic concepts, we propose an alternative way of char-
acterizing normality and detecting outliers, namely through the concept of typi-
cality. We demonstrate the peculiarities as well as the proposed concepts both on
idealistic simulated data, as well as data from the 'Politics and Values' LISS panel survey (Scherpenzeel and Das 2010).² Finally, we compare the outlier detection
performance of typicality with the most common alternative (based on the Maha-
lanobis distance) and demonstrate it to be a viable alternative. More broadly, we
argue that if the average value in a multivariate setting is unlikely, then outlier
detection techniques should be able to identify it as such. This means updating our
working conceptualization of outliers to include not only points which lie far from
the mean (as most outlier detection methods do) but also those points which lie too
close to the mean, particularly as the dimensionality of the dataset increases.

2 An anonymized GitHub repository with the Python code used for all analyses, simulations, and plots can be found at https://anonymous.4open.science/r/Typicality-005B.
1 Background
The notion of the mean of a Gaussian, or indeed its finite-sample estimate in the form of the arithmetic average, as representing a 'normal person' still holds strong relevance in society and research today. Quetelet did much to popularise the idea (Caponi 2013; Quetelet 1835), having devised the much used but also much criticized Body Mass Index for characterizing a person's weight in relation to their height. The average is used as a way to parameterize, aggregate, and compare distributions, as well as to establish bounds for purposes of defining outliers and pathology vis-à-vis 'normality' in individuals. Quetelet's perspective was also shared by Comte, who
considered normality to be synonymous with harmony and perfection (Misztal 2002).
Even though it is important to recognize the societal and ethical implications of such
views, this paper is concerned with the characteristics of multivariate distributions;
in particular, those characteristics which help us understand why averages might
provide a poor representation of normality, and what we might consider as an
alternative.
In the past, researchers and commentators (including well-known figures such
as Foucault) have levied a number of critiques at the use of averages in psychology
and social science (Foucault 1984; Myers 2013; Speelman and McGann 2013; Wetherall
1996). Part of the problem is the over-imposition of the Gaussian distribution on
empirical data. The Gaussian has only two parameters, and even if the full proba-
bility density function is given, only two pieces of information are required to specify it: the mean (which we treat as equivalent to the arithmetic average) and the
variance. Even in univariate cases, the mean can be reductionist, draining the data of
nuance and complexity. Many of the developments in statistical methodology have
sought to increase the expressivity of statistical models and analyses in order to
account for the inherent complexity and heterogeneity associated with psychological
phenomena. For example, the family of longitudinal daily diary methods (Bolger and
Laurenceau 2013), as well as hierarchical models (Raudenbush and Bryk 2002) can be
used to capture different levels of variability associated with the data generating process. Alternatively, other methods have sought to leverage techniques from the engineering sciences, such as spectral analysis, in order to model dynamic fluctuations and shared synchrony between partners over time (Gottman 1979; Vowels et al. 2018). Machine learning methods provide powerful, data-adaptive function approximation methods for letting the data 'speak' (van der Laan and Rose 2011) as
well as for testing the predictive validity of psychological theories (Vowels 2021;
Yarkoni and Westfall 2017), and in the world of big data, comprehensive meta-
analyses allow us to paint complete pictures of the gardens of forking paths (Gelman
and Loken 2013; Orben and Przybylski 2019).
Multi-dimensional data exhibit a number of peculiar attributes which concern
the use of averages. Assuming one conceives of a 'normal person' as having qualities similar to those of a 'typical person', we find that the arithmetic average diverges from this conception rather quickly, as the number of dimensions increases. The
peculiar attributes start to become apparent in surprisingly low-dimensional con-
texts (as few as four variables), and become increasingly extreme as dimensionality
increases. Understanding these attributes is particularly important because the
dimensionality of datasets and analyses is increasing along with the popularity of
machine learning. For instance, a machine learning approach identifying important
predictors of relationship satisfaction incorporated upwards of 189 variables (Joel
et al. 2020), and similar research looking at sexual desire used around 100 (Joel et al.
2017; Vowels, Vowels, and Mark 2021). Assuming that high-dimensional datasets will
continue to be of interest to psychologists, researchers ought to be aware of some of
the less intuitive but notable characteristics of such data.
As we will discuss, one domain for which the mean can be especially problematic
in multiple-dimensional datasets is outlier detection. In general, outlier detection
methods concern themselves with the distance that points lie from the mean. Even
methods designed to explore distances from the median are motivated by considerations/difficulties with estimation, and are otherwise based on the assumption that
the expected value (or the estimate thereof) provides an object against which to
compare datapoints (Leys et al. 2019). Unfortunately, and as Daniels discovered for the USAF, values close to the mean become increasingly unlikely as the number of dimensions increases, making the mean an inappropriate reference for classifying outliers. Indeed, in Daniels' experiment it would have been a mistake to accept
anyone close to the average as anything other than an outlier. As we describe later,
one can successfully summarise a set of datapoints in multiple dimensions in terms
of their typicality. We later evaluate the performance of a well-known multivariate
outlier method (based on the Mahalanobis distance) in terms of its capacity to
identify values far from the empirical average as outliers, and compare it against our
proposed measure of typicality.
2 Divergence from the Mean
This section is concerned with demonstrating some of the un-intuitive aspects of data in higher dimensions. We begin by showing that, as dimensionality increases, the distance that a datapoint lies from the mean/average increases at a rate of $\sqrt{D}$, where $D$ is the number of dimensions. We then provide a discussion of the ramifications. Finally, we briefly present an alternative geometric view that leads us to the same conclusions.
Notation: In terms of notation, we denote a datapoint for an individual $i$ as $\mathbf{x}_i$, where $i \in \{1, 2, \ldots, N\}$. The total number of individual datapoints is $N$, and the bold font indicates that the datapoint is a vector (i.e. it is multivariate). A single dimension $d$ of individual $i$'s datapoint is given as $x_{i,d}$, where $d \in \{1, 2, \ldots, D\}$ and $D$ is the total number of dimensions.
2.1 Gaussian Vectors in High Dimensions
Let us begin in familiar territory: for a multivariate distribution with independently and identically distributed (i.i.d.) Gaussian variables, the probability density function for each dimension may be expressed as:

$$p(x_{i,d}) = \frac{1}{\sqrt{2\pi\sigma_d^2}} e^{-\frac{(x_{i,d}-\mu_d)^2}{2\sigma_d^2}} \quad (1)$$

where $\mu_d$ and $\sigma_d$ represent the mean and standard deviation of dimension $d$, respectively. Each multivariate datapoint $\mathbf{x}_i$ may be considered as a vector in this $D$-dimensional space. An example of two datapoints drawn from a two-dimensional/bivariate version of this distribution (i.e. $D = 2$) is shown in Figure 1. In this figure, the values of these two random samples are $\mathbf{x}_1 = (0.4, 0.8)$ and $\mathbf{x}_2 = (0.55, 0.35)$. Assuming that these datapoints are drawn from a distribution with a mean of 0 and a variance of 1 for all dimensions (i.e. $\mathcal{N}(\mu_d = 0, \sigma_d^2 = 1)\ \forall d$), then we can compute the distance these datapoints fall from the mean $\boldsymbol{\mu} = \mathbf{0}$ using the squared Euclidean distance (see Eq. (2)):

$$\|\mathbf{x}_i\|_2^2 = \sum_{d=1}^{D} x_{i,d}^2 \quad (2)$$
Here, we use the subscript $d$ to index the dimension of the multidimensional datapoint $\mathbf{x}_i$. Importantly, note that the squared Euclidean norm closely resembles the expression for sample variance for a certain dimension $d$ (Eq. (3)):

$$\widehat{\mathrm{Var}}(x_d) = \frac{1}{N} \sum_{i=1}^{N} x_{i,d}^2 \quad (3)$$

For the two example vectors in Figure 1, taking the square root of the values derived using Eq. (2), the distances from the mean are $\|\mathbf{x}_1\|_2 = 0.8$ and $\|\mathbf{x}_2\|_2 = 0.3$.
In other words, the variance of a sample is closely related to the distance that
each sample is expected to fall from the mean. Note that, when computing the
variance for a particular variable or dimension, we sum across datapoints
i, rather than dimensions d. Secondly, and more importantly, the variance con-
tains a normalization term 1/N, whereas the expression for the norm does not. By
consequence of this absent normalization term, the expected squared distance
of each datapoint from the mean will grow with increasing dimensionality. In
this example, we know that the variance of our distribution is $\sigma_d^2 = 1$ for both dimensions, and as such, it is trivial to show that each individual dimension d will contribute an average/expected squared length equal to one.
Without the normalization term (i.e. 1/D), this means that the expected squared length of the vectors grows in proportion to the number of dimensions.³ Alternatively, taking a square root, we
can say that the expected distance that samples fall from the mean increases proportional to the square-root of the dimensionality of the distribution. More concretely: $\mathbb{E}[\|\mathbf{x}\|_2] \approx \sqrt{D}$ (Vershynin 2019).

Figure 1: Two samples in two-dimensional space, with their corresponding coordinates.

3 This is simply the result of summing more values together, without accounting for the number of dimensions. This is intentional: to compute the overall distance a sample lies from the mean, it is necessary to sum across all possible dimensions/directions without adjusting for the number of dimensions.
This can of course also be verified in simulation, and Figure 2 shows both the analytical as well as sample estimates for the average length of the vectors as the number of dimensions increases. The intervals are defined by the 1st and 99th percentiles. Each approximation to the expectation is taken over a sample size of 200 datapoints. The dashed red curve depicts the $\sqrt{D}$ relationship, and the black simulated curve is a direct (albeit noisier, owing to the fact that this curve is simulated) overlay. This should start to remind us of Daniels' experience when working for the USAF: he found that out of 4063 people, not a single one of them fell within 30 % of the mean over ten variables. Indeed, if any had done, we should consider labelling them as outliers, in spite of the fact that most existing outlier detection methods are only sensitive to points which lie far from the mean.

Figure 2: The red dashed curve is simply $\sqrt{D}$, whilst the black curve is a simulated estimate of the expected lengths, calculated over 200 datapoints, for increasing dimensionality D. The blue interval represents the 1–99 % percentiles.
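This behavior is straightforward to reproduce. The following minimal Python sketch (an illustration, not the code from the accompanying repository) draws 200 datapoints per dimensionality, as in Figure 2, and compares the average vector length with $\sqrt{D}$:

```python
# Minimal sketch: the average Euclidean length of standard Gaussian vectors
# grows at approximately sqrt(D) as the dimensionality D increases.
import numpy as np

rng = np.random.default_rng(0)

for D in [1, 2, 5, 10, 40, 100]:
    X = rng.standard_normal(size=(200, D))   # 200 datapoints in D dimensions
    lengths = np.linalg.norm(X, axis=1)      # Euclidean length of each datapoint
    print(f"D={D:3d}  mean length={lengths.mean():5.2f}  sqrt(D)={np.sqrt(D):5.2f}")
```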
The implications of this are important to understand. Whilst we know that each variable $x_d$ has an expected value of zero and a variance of one, the expected length of a datapoint across all D dimensions/variables grows in proportion to the square root of the number of variables. Dieleman (2020) summarised this informally when they observed that "if we sample lots of vectors from a 100-dimensional standard Gaussian, and measure their radii, we will find that just over 84 % of them are
between 9 and 11, and more than 99 % are between 8 and 12. Only about 0.2 % have a radius smaller than 8!" In other words, the expected location of a datapoint in D-dimensional space moves further and further away from the mean as the dimensionality increases.
It can also be shown that such high-dimensional Gaussian random variables are distributed uniformly on the (high-dimensional) sphere with a radius of $\sqrt{D}$, and grouped in a thin annulus (Stein 2020; Vershynin 2019).⁴ The uniformity tells us the
direction of these vectors (i.e. their location on the surface of this high-dimensional sphere) is arbitrary, and the squared distances (i.e. the squared radii) are Chi-squared distributed (it is well known that the Chi-squared distribution is the distribution of the sum of squares of D independent and identically distributed standard Gaussian variables). The distribution of distances (vis-à-vis the squared distances) is therefore Chi-distributed.
Figure 3 compares samples from a Chi-squared distribution against the distribution of 10,000 squared vector lengths. Altogether, this means that in high-dimensions, (a) it is unlikely to find datapoints anywhere close to the average (even though the region close to the mean represents the one with the highest likelihood, the probability is nonetheless negligible), (b) randomly sampled vectors are unlikely to be correlated (of course, in expectation the correlation will be zero because the dimensions of the Gaussian from which they were sampled are independent), and (c) randomly sampled vectors have lengths that are close to the expected length, which increases at a rate of $\sqrt{D}$. As such, the datapoints tend to cluster in a subspace which lies at a fixed radius from the mean (we will later refer to this subspace as the typical set). This is summarized graphically in Figure 4.

Figure 3: For D = 40, these histograms show the distributions of 10,000 datapoints sampled from a $\chi^2$ distribution (red) and the sums of squared distances $\|\mathbf{x}\|_2^2$.

4 See also the Gaussian Annulus Theorem (Blum et al. 2020).
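These distributional facts can also be checked numerically. The sketch below (again illustrative rather than the repository code) compares simulated squared lengths against a Chi-squared distribution, and uses the Chi distribution to recover the percentages quoted by Dieleman (2020) for D = 100:

```python
# Sketch: squared lengths of D-dimensional standard Gaussian vectors follow a
# chi-squared distribution with D degrees of freedom; the lengths themselves
# follow a chi distribution.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
D = 40
sq_lengths = (rng.standard_normal((10_000, D)) ** 2).sum(axis=1)
print("empirical mean of squared lengths:", sq_lengths.mean(), "theoretical mean:", D)

# Dieleman's figures for D = 100: fraction of radii between 9 and 11, and between 8 and 12.
chi_100 = stats.chi(df=100)
print("P(9 <= r <= 11) =", chi_100.cdf(11) - chi_100.cdf(9))   # approx. 0.84
print("P(8 <= r <= 12) =", chi_100.cdf(12) - chi_100.cdf(8))   # approx. 0.99
```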
It is important that researchers understand that while the mean of such a high-
dimensional Gaussian represents the value which minimizes the sums of squared
distances (and is therefore the estimate which maximises the likelihood), the ma-
jority of the probability mass is actually not located around this point. As such, even
though a set of values close to the mean represents the most likely in terms of its
probability of occurrence, the magnitude of this probability is negligible, and most
points fall in a space around
D
away from the mean. Figure 5 depicts the lengths of
2000 vectors sampled from a 40-dimensional Gaussian they are nowhere close to
the origin. Another way to visualize this is to plot the locations of the expected lengths
for dierent dimensionalities on top of the curve for N(0,1), and this is shown in
Figure 6. In terms of the implications for psychological data datasets which involve
high numbers of variables are likely to comprise individuals who are similar only
insofar as they appear to be equally abnormal, at least insofar as a univariate
characterization of normality (e.g. the mean across the dimensions) is a poor one
when used across multiple dimensions. Indeed, if an individual does possess char-
acteristics close to the mean or the mode across multiple dimensions, they could
reasonably be considered to be outliers. We will consider outlier detection more
closely in a later section.

Figure 5: A scatter plot showing the lengths of 2000 vectors sampled from a 40-dimensional Gaussian. The red line shows the average vector length, and the green intervals depict the size of the typical set for different values of ϵ. Note that the mean (0,0) is nowhere near the distribution of norms or the typical set.

Figure 6: This plot shows the location of the expected lengths of vectors of different dimensionality in relation to the standard normal in one dimension. It can be seen that even at D = 5, the expected length is over two standard deviations from the mean.
2.2 An Alternative Perspective
Finally, the peculiarities of high-dimensional space are well visualized geometrically. Figure 7 shows the generalization of a circle in 2-D inscribed within a square, to a sphere in 3-D inscribed within a cube. Taking this further still, the cube and the sphere can be generalized to hyper-cubes and hyper-spheres, the volumes for which can be calculated as $C_D = l^D$ and $S_D = \frac{\pi^{D/2}}{\Gamma\left(\frac{D}{2}+1\right)}$, respectively. The latter is a generalization of the well-known expression for the volume of a sphere. The link with the previous discussion lies in the fact that Gaussian data are spherically (or at least elliptically) distributed. As such, an exploration of the characteristics of spheres and ellipses in high-dimensions tells us something about high-dimensional data.

Figure 7: Depicts a circle inscribed within a square (left) and a sphere inscribed within a cube (right). Even though the diameters of the circle and the sphere are the same as the lengths of the sides of the square and the cube, the ratio between the volume of the circle and the volume of the square greatly decreases when moving from two to three dimensions.
Figure 4: The plot illustrates how, in high-dimensions, the probability mass is located in a thin annulus at a distance $\sigma\sqrt{D}$ from the average (in the text, we assume σ = 1), despite the mean representing the location which maximizes the probability density. Adapted from MacKay (1992).
Figure 7 illustrates how the ratio between the volume of a sphere and the volume of a cube changes dramatically even though the number of dimensions has only increased by 1. In the first case, considering the square and the circle, the ratio between the volumes is 4/π ≈ 1.27. In the second case, considering now the cube and the sphere, the ratio between the volumes is 8/4.19 ≈ 1.91. In other words, the volume of the sphere represents a significantly smaller fraction of the total volume of the cube even though only one extra dimension has been added. Figure 8 illustrates how this pattern continues exponentially, such that the ratio between a cube and sphere for D = 20 is over 40 million. In other words, a cube with sides of length two has a volume which is 40 million times greater than that of the sphere inscribed within it. This effect is at least partly explained by the extraordinary way the
volume of a sphere changes as dimensionality increases. Figure 9 shows how the volume of a sphere with a radius of one in D dimensions quickly tends to zero after passing a maximum at around D = 5. In other words, a high-dimensional sphere has negligible volume.

Figure 8: Depicts the ratio between the volume of a hypercube and the volume of a hypersphere as the dimensionality D increases. Note the scale of the y-axis (×10⁷).

Figure 9: Shows how the volume $S_D$ of a hypersphere changes with dimensionality D.
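The volume expressions above can be evaluated directly. The following sketch (illustrative only) reproduces the ratios quoted in this section using the Gamma function:

```python
# Sketch: volume of the unit-radius hypersphere, S_D = pi^(D/2) / Gamma(D/2 + 1),
# and its ratio to the enclosing hypercube of side length 2.
import numpy as np
from scipy.special import gamma

for D in [2, 3, 5, 10, 20]:
    sphere = np.pi ** (D / 2) / gamma(D / 2 + 1)   # volume of the D-dimensional unit ball
    cube = 2.0 ** D                                # volume of the enclosing hypercube
    print(f"D={D:2d}  sphere volume={sphere:9.4f}  cube/sphere ratio={cube / sphere:15.2f}")
```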
Whilst the implications of this are the same as those in the previous section, the demonstration hopefully gives some further intuition about just how quickly the strange effects start to occur as D is increased, at least in the case where our dimensions are independent. In order to gain an intuition for whether these effects translate to more realistic data (including correlated dimensions/variables), see the analysis below. Many problems in social science involve more than just a few dimensions, and problems which utilise 'big data' are even more susceptible to issues relating to what is known as the curse of dimensionality.
3 Typicality: An Information Theoretic Way to
Characterize Normality
In the previous section, we described how randomly sampled vectors in high-dimensional space tend to be located at a radius of length $\sqrt{D}$ away from the mean, and tend to be uncorrelated. This makes points close to the mean across multiple dimensions poor examples of normality. In this section we introduce the concept of typicality from information theory, as a means to categorize whether a particular sample or a particular set of samples is/are 'normal' or 'abnormal' (and therefore also whether the points should be considered to be outliers).
3.1 Entropy
Entropy describes the degree of surprise, uncertainty, or information associated with a distribution and is computed as $-\sum_{i=1}^{N} p(x_i)\log p(x_i)$, where N is the number of datapoints in the sample distribution, $x_i$ is a single datapoint in this distribution, and
$p(x_i)$ is that datapoint's corresponding probability.⁵ If the entropy is low, it means the distribution is more certain and therefore also easier to predict.

5 We temporarily consider the discrete random variable case for this example, but note that the intuition holds for continuous distributions as well.
Taking a fair coin as an example, p(x = heads) = p(x = tails) = 0.5. The entropy of this distribution is $H = -(0.5 \log_2 0.5 + 0.5 \log_2 0.5) = 1$. Recall from above that entropy describes the amount of information content: the units of entropy here are in bits. The fact that our fair coin has 1 bit of information should therefore seem quite reasonable: there are two equally possible outcomes and therefore one bit's worth of information. Furthermore, because the coin is unbiased, we are unable to predict the outcome any better than by randomly guessing. On the other hand, let's say we have a highly biased coin whereby p(x = heads) = 0.99 and p(x = tails) = 0.01. In this case $H = -(0.99 \log_2 0.99 + 0.01 \log_2 0.01) \approx 0.08$. The second example had a much lower entropy because we are likely to observe heads, and this makes samples from the distribution more predictable. As such, there is less new or surprising information associated with samples from this distribution than there was for the case where there was an equal chance of a head or a tail.
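These two calculations can be reproduced in a few lines of Python (a simple illustrative sketch):

```python
# Sketch: Shannon entropy (in bits) of a fair coin versus a heavily biased coin.
import numpy as np

def entropy_bits(p):
    """Entropy of a discrete distribution p, in bits."""
    p = np.asarray(p, dtype=float)
    return -np.sum(p * np.log2(p))

print(entropy_bits([0.5, 0.5]))    # 1.0 bit
print(entropy_bits([0.99, 0.01]))  # approx. 0.08 bits
```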
According to the Asymptotic Equipartition Property, entropy can be approximated as a sum of log probabilities of a sequence of random samples, in a manner equivalent to how an expected value can be estimated as a sum of random samples according to the Law of Large Numbers (Cover and Thomas 2006):

$$-\frac{1}{N}\log p(x_1, x_2, \ldots, x_N) = -\frac{1}{N}\sum_{i=1}^{N}\log p(x_i) \to H(x) \quad \text{for sufficiently large } N \quad (4)$$

In words, the negative log of the joint probability, scaled by 1/N, tends towards the entropy of the distribution. Entropy therefore gives us an alternative way to characterize normality; but now instead of doing so using the arithmetic mean, we do so in terms of entropy. Rather than comparing the value of a new sample against the mean or expected value of a distribution, we can now consider the probability of observing that sample and its relation to the entropy of the distribution.
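As an illustrative sketch of Eq. (4) for a Gaussian, the average negative log probability of a large sample approaches the closed-form entropy given in Eq. (6) below; the example assumes a standard normal distribution:

```python
# Sketch of the Asymptotic Equipartition Property: the mean negative log2
# probability of many i.i.d. samples approaches the differential entropy
# H = 0.5 * log2(2 * pi * e * sigma^2) of the Gaussian they were drawn from.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
mu, sigma = 0.0, 1.0
x = rng.normal(mu, sigma, size=100_000)

aep_estimate = -np.mean(np.log2(stats.norm(mu, sigma).pdf(x)))
analytic = 0.5 * np.log2(2 * np.pi * np.e * sigma ** 2)
print(aep_estimate, analytic)   # both approx. 2.05 bits
```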
3.2 Defining the Typical Set
We are now ready to define the typical set. Rather than comparing datapoints against the mean, we can compare them against the entropy of the distribution H. For a chosen threshold ϵ, datapoints may be considered typical according to (Cover and Thomas 2006; Dieleman 2020; MacKay 2018):
$$T = \{x : 2^{-(H+\epsilon)} \leq p(x) \leq 2^{-(H-\epsilon)}\} \quad (5)$$
In words, the typical set T comprises datapoints x which fall within the bounds defined on either side of the entropy of the distribution. Datapoints which have a probability close (where 'close' is defined according to the magnitude of ϵ) to the entropy of the distribution are thereby defined as typical. Recall the thin annulus containing most of the probability mass, illustrated in Figure 4; this annulus comprises the typical set. Note that, because this annulus contains most of our probability mass, the set quickly incorporates all datapoints as ϵ is increased (Cover and Thomas 2006). Note that this typical set (at least for 'modest' values of ϵ) does not contain the mean because, as an annulus, it cannot contain it by design (the mean falls at the centre of a circle whose radius defines the radius of the annulus). The quantity given in Eq. (5) can be computed for continuous Gaussian (rather than discrete) data using the analytical forms for entropy H for the univariate and multivariate Gaussian provided as supplementary material, and the probability density function for a univariate or multivariate Gaussian for p(x). This is undertaken for the outlier detection simulation below.
3.3 Establishing Typicality in Practice
Even though it is arguable as to whether the Gaussian should be used less ubiqui-
tously for modeling data distributions than it currently is (Micceri 1989), one of the
strong advantages of the Gaussian is its mathematical tractability. This tractability
enables us to calculate (as opposed to estimate) quantities exactly, simply by
substituting parameter values into the equations (assuming these parameters have
themselves not been estimated). Thus, moving from a comparison of dataset values
against the average or expected value to a consideration for typicality does not
necessitate the abandonment of convenient analytic solutions. A derivation of the
(differential) entropy for a Gaussian distribution has been provided in Appendix A,
and is given in Eq. (6).
$$H(f) = \frac{1}{2}\log_2\left(2\pi e \sigma^2\right) \quad (6)$$
Note that the mean does not feature in Eq. (6): this makes it clear that the uncertainty or information content of a distribution is independent of its location (i.e. the mean) in vector space.⁶
As well as being useful in categorising datapoints as typical or
atypical (or, alternatively, inliers and outliers) in practice, Eq. (6) can also be used to
understand the relationship between ϵ and the fraction of the total probability mass that falls inside the typical set. Returning to Figure 5, which shows the lengths of 2000 vectors sampled from a 40-dimensional Gaussian, we can see that as ϵ increases, we gradually expand the interval to cover a greater and greater proportion of the empirical distribution. Note also that the mean, which in this plot has a location (0,0), is a long way from any of the points and is not part of (and, by definition, cannot be part of) the typical set.

6 Note that entropy is closely related to the score function (the derivative of the log likelihood) as well as to Fisher information, which is the variance of the score.
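As a concrete sketch of how Eq. (5) can be applied under a multivariate Gaussian assumption, the test below flags a point as typical when its negative log probability lies within ϵ bits of the entropy. The function and variable names are illustrative and are not taken from the accompanying repository:

```python
# Sketch of the typical-set test in Eq. (5): a point x is typical if
# 2^-(H+eps) <= p(x) <= 2^-(H-eps), i.e. if |-log2 p(x) - H| <= eps.
import numpy as np
from scipy.stats import multivariate_normal

def typicality_flags(X, mean, cov, eps=5.0):
    """Return True for points inside the typical set, False for atypical points."""
    mvn = multivariate_normal(mean=mean, cov=cov)
    H = mvn.entropy() / np.log(2)            # differential entropy, converted to bits
    neg_log2_p = -mvn.logpdf(X) / np.log(2)  # -log2 p(x) for each datapoint
    return np.abs(neg_log2_p - H) <= eps

rng = np.random.default_rng(0)
D = 40
X = rng.standard_normal((2000, D))
X = np.vstack([X, np.zeros((1, D))])         # append a point lying exactly at the mean

flags = typicality_flags(X, mean=np.zeros(D), cov=np.eye(D), eps=5.0)
print("fraction of random samples that are typical:", flags[:-1].mean())
print("is the mean point typical?", bool(flags[-1]))   # False: far too close to the mean
```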
4 An Example with Real-World Data
To demonstrate that these effects do not only apply to idealistic simulations, we use the LISS longitudinal panel data, which is open access (Scherpenzeel and Das 2010). Specifically, we use Likert-style response data from wave 1 of the Politics and Values survey, collected between 2007 and 2008, which includes questions relating to levels of satisfaction and confidence in science, healthcare, the economy, democracy, etc. Given that no inference was required for these data, a simple approach was taken to clean them: all non-Likert style data were removed, leaving 58 variables, and text-based responses which represented the extremes of the scale were replaced with integers (e.g. 'no confidence at all' is replaced with a 0). For the sake of demonstration, all missing values were mean-imputed⁷ (this may not be a wise choice in practice), and the data were standardized so that all variables were mean zero with a standard deviation of one. In total there were 6811 respondents.

7 Across all included variables the amount of mean-imputation, on average, was 7.9 %. Note that such imputation makes the demonstration more conservative, because it forces values to be equal to the mean for the respective dimension.
Figure 10 depicts the bivariate correlations for each pair of variables in the data.
It can be seen that there exist many non-zero correlations, which makes these data
useful in understanding the generality of our expositions above (which were
undertaken with independent and therefore uncorrelated variables). Qualitatively,
some variables were also highly non-Gaussian, which again helps us understand the
generality of the effects in multi-dimensional data. Figure 11 shows how the expected lengths of the vectors in the LISS panel data change as an increasing number of dimensions are used. To generate this plot, we randomly selected D variables 1000 times, where D ranged from three up to the total number of variables (58). For each of the 1000 repetitions, we computed the Euclidean distances of each vector in the dataset across these D variables, and then computed their average. Once the 1000 repetitions were complete, we computed the average across these repetitions to obtain an approximation to the expectation of vector lengths in D dimensions. Finally, we
overlaid a plot of $\sqrt{D}$ to ascertain how close the empirically estimated vector lengths are, compared with the expected lengths for a multivariate Gaussian. We also plot the 1–99 % intervals, which are found to be quite wide, owing to the mix of lowly and highly correlated variables in conjunction with possible non-Gaussianity.

Figure 10: Depicts the bivariate correlations for the LISS panel data (Scherpenzeel and Das 2010).

Figure 11: The lengths for vectors from the LISS panel data (red), for increasing D, as well as the expected lengths for a multi-variate Gaussian (blue). The LISS panel data curve includes 1–99 % percentile intervals (Scherpenzeel and Das 2010).
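The resampling procedure just described can be sketched as follows. Loading and cleaning of the LISS file are omitted, and the array data is a placeholder for the standardized 6811 × 58 response matrix (here replaced by synthetic values of the same shape):

```python
# Sketch of the resampling procedure: for each dimensionality D, repeatedly pick
# D random variables, compute each respondent's Euclidean vector length across
# those variables, and average. Compare the resulting curve against sqrt(D).
import numpy as np

def expected_length_curve(data, dims, n_repeats=1000, seed=0):
    rng = np.random.default_rng(seed)
    curve = []
    for D in dims:
        repeat_means = []
        for _ in range(n_repeats):
            cols = rng.choice(data.shape[1], size=D, replace=False)  # D randomly chosen variables
            repeat_means.append(np.linalg.norm(data[:, cols], axis=1).mean())
        curve.append(np.mean(repeat_means))
    return np.array(curve)

# Synthetic stand-in with the same shape as the standardized LISS data.
data = np.random.default_rng(1).standard_normal((6811, 58))
dims = np.arange(3, 59)
curve = expected_length_curve(data, dims, n_repeats=100)  # 1000 repeats in the text
reference = np.sqrt(dims)                                 # expected lengths for an i.i.d. Gaussian
```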
These results demonstrate that even for correlated, potentially non-Gaussian, real-world data, the peculiar behavior of multi-dimensional data discussed in this paper nonetheless occurs. For the LISS data, the expected lengths were slightly lower than for samples from a 'clean' multivariate Gaussian, and this is likely to be due to
the correlations present in the data.⁸ Indeed, when estimating the entropy of these data (and as can be seen in the supplementary code), a robust covariance matrix estimation approach was used to account for the non-isotropic nature of the joint distribution. More generally, non-parametric methods can be used for the estimation of typicality, but such methods are likely to have lower sample efficiency (i.e. more data are required for accurate estimation of entropy and typicality).

8 For further discussion relating to this point, see Kroc and Astivia (2021).
5 Moving Forward with Multivariate Outlier
Detection
Grubbs defined outliers as samples which "deviate markedly from other members of the sample in which it occurs" (Grubbs 1969). This definition is useful to us here, because it is not expressed in terms of distance from the mean, but in broad/general
terms. Indeed, as we have already discussed, in as few as four dimensions, points
near the mean become increasingly unlikely. This suggests that outlier methods
should not only identify points which are too far from the mean, but also those which
are too close.
Two related definitions of outliers which were noted by Leys et al. (2019) are: "Data values that are unusually large or small compared to the other values of the same construct", and "Data points with large residual values." The first is quite similar to Grubbs' definition, identifying values as unusually large or small (i.e. deviating markedly) with respect to other values of the same construct (i.e. with respect to the other members of the sample in which they occur). The second defines them with respect to the residuals of a statistical model. In other words, they are values which lead to large discrepancies between true and predicted values. Note that both of these definitions bear the same consequences for our work: whether we are comparing datapoints against the rest of the sample, or comparing them against the predictions from a statistical model designed to estimate an expected value (which is by far the most common case in psychology and social science), the relevance of these definitions to our discussion remains the same.
It is also, perhaps, of interest to note that our definition of outliers makes no value judgement about whether outliers are good or bad. Indeed, depending on the application and our research questions, outliers may represent 'golden' samples.
Consider a manufacturer interested in fabricating the perfect mechanical prototype.
Each sample may have its own unique blemishes, and our target may represent the
perfect average across all (high-dimensional) opportunities for such blemishes. In
such a case, the average represents the golden target for our manufacturer, and
identifying it necessitates outlier detection methods which understand that values
across high-dimensions close to the mean should be considered to be (in this case,
desirable) outliers, in much the same way as samples which deviate because they are
too far from the mean may also be outliers for opposite reasons.
Leys et al. (2019) provide a useful summary of options for both univariate and
multivariate outlier detection, as well as a discussion about the consequences of
outlier management decisions. Whilst their work provides an excellent introduction
to multivariate outlier detection and good practice, they do not discuss the strange
behavior of the mean in multiple dimensions, nor the impact of this behavior on
multivariate outlier detection methods which are unable to detect outliers which lie
close to the mean.
We note, as other researchers have (Leys et al. 2019), that the most common
method used for multidimensional/multivariate outlier detection in the domain of
social science is the Mahalanobis distance (Mahalanobis 1930). For a description of
the Mahalanobis distance and its application, readers are directed to work by Li et al.
(2019) and Leys et al. (2018). Briefly, the method assesses the distance of a point from
the centroid (i.e. the mean) of a cloud of points in (possibly correlated) multidi-
mensional space. The researchers note that in order to compute the distance from
putative outliers to the mean, it is first necessary to estimate the mean and covari-
ance whilst including those points in the estimation (Leys et al. 2013; Leys et al. 2018).
This process is somewhat problematic because if outliers are included in the
calculation being used to compute the mean and covariance, the estimates of these quantities will themselves be biased towards these outliers, thereby reducing the chances of correctly identifying the outliers. A solution is proposed which is called the 'robust' Mahalanobis distance (Leys et al. 2018; Li et al. 2019), which leverages
what is known as the Minimum Covariance Determinant (MCD), and estimates the
centroid/mean by selecting an estimate of the mean from a set of estimates derived
from different subsets of the dataset.
Unfortunately, despite the Mahalanobis distance and its robust variant being the most commonly used multidimensional/multivariate outlier detection techniques in social science, they suffer from the same problems as any multidimensional method based on distances from the centroid/mean. By consequence they would certainly not flag someone average in all dimensions as an outlier, even though statistically they would represent an extremely unusual individual (it would not help Daniels with his project, for example). It is therefore important that researchers qualify their definition of the outlying set to explicitly admit points which may fall too close to the mean.
When using the Mahalanobis distance, one can make decisions about the set of outliers O using the following expression:
$$O = \{\mathbf{x} : M(\mathbf{x}) > c\}, \quad (7)$$

where $M(\mathbf{x}) = \sqrt{(\mathbf{x} - \hat{\boldsymbol{\mu}})^T S^{-1} (\mathbf{x} - \hat{\boldsymbol{\mu}})}$ is the estimated Mahalanobis distance (in units of standard deviation) for the multivariate datapoint under consideration $\mathbf{x}$, and $c$ is the threshold for classifying a point as an outlier. In the expression for $M$, $\hat{\boldsymbol{\mu}}$ is the estimate of the mean of the distribution, and $S$ is the estimated covariance matrix.
One of the benefits of the Mahalanobis based methods is that one can use them to threshold the data based on units of standard deviations. Thinking in terms of standard deviations is not unusual, and the process of selecting outliers in these terms thus leads to intuitive selection thresholds. In contrast, we see in Eq. (5) that the threshold for determining whether a datapoint falls within the typical set T depends on ϵ, which is not related to the standard deviation, but rather to a distance away from the entropy.
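For reference, the rule in Eq. (7) with a robust MCD estimate can be written in a few lines using scikit-learn; the following is a sketch rather than the analysis code used for the figures:

```python
# Sketch of the robust Mahalanobis rule in Eq. (7): flag points whose robust
# Mahalanobis distance exceeds a threshold c (in units of standard deviation).
import numpy as np
from sklearn.covariance import MinCovDet

def mahalanobis_outliers(X, c=3.0):
    mcd = MinCovDet(random_state=0).fit(X)    # robust estimates of the mean and covariance
    distances = np.sqrt(mcd.mahalanobis(X))   # mahalanobis() returns squared distances
    return distances > c

rng = np.random.default_rng(0)
X = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=125)
print(mahalanobis_outliers(X, c=3.0).sum(), "points flagged as outliers")
```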
We have already seen how typicality has the added advantage of classifying datapoints which lie too close to the mean. In Figure 12 we show that, in low-dimensional settings, typicality can be used to make approximately the same classification of outliers as the Mahalanobis distance, to the extent that some datapoints which lie far from the mean should still be classified as outliers. Of course, in practice a balance must be struck when choosing the value of ϵ in Eq. (5), in the same way that c in Eq. (7) must be decided.
Specifically, for Figure 12, we generated 125 points from a bivariate Gaussian with a covariance of 0.5, and then added a set of equally spaced outlier points ranging from negative four to positive four on the y-axis (indicated with horizontal dashes). As such, not all these points are expected to be identified as outliers, because some of their values lie well within the tails of the distribution. They do, however, enable us to compare at which point they are identified as outliers by the two detection methods under comparison. Note that the subsequent estimation is done after the creation of the complete dataset (including the outliers) using all the empirical values. Using the robust MCD estimator mentioned above, we computed the Mahalanobis distance (in units of standard deviation), and colored each point according to this distance. For typicality, we followed the estimation of entropy for the multivariate Gaussian, which also takes in an estimate for the covariance (see Appendix A for the relationship between the covariance matrix and the entropy of a Gaussian), for which we again used the MCD method. The use of MCD for typicality arguably makes our typicality estimator 'robust' for the same reason that it is considered to make the Mahalanobis distance estimation robust. The threshold for the Mahalanobis distance was set to three standard deviations, whilst the value for the typicality threshold was set to five. In practice, researchers may, of course, need to suitably select and justify these values. The scatter-plot marker shapes are set according to whether the outliers were classified as such by both methods (circles), just the Mahalanobis method (squares),
or just the typicality method (triangles). If neither method classifies a point as an outlier, the points are set to vertical dashes (i.e. inliers). Note that there are no points which are classified as outliers by the Mahalanobis method which are not also classified as outliers by the typicality method. The inverse is not quite true, with one additional point (indicated with the triangle marker) being classified as an outlier by the typicality method.⁹
Figure 12 therefore indicates that, in low-dimensions,
Mahalanobis distance performs similarly to typicality as an outlier detection method.
Figure 12: Comparison of Mahalanobis distance and typicality for outlier detection. The outliers are generated as a vertical set of equally spaced points (indicated with horizontal dashes) ranging from negative four to positive four on the y-axis, superimposed on a set of 125 points (indicated with vertical dashes) drawn from a bivariate Gaussian with a covariance of 0.5. The points identified to be outliers by both methods are indicated in circles, whilst those indicated to be outliers by the Mahalanobis or typicality methods separately are indicated by squares or triangles, respectively. The color of the points represents the Mahalanobis distance in units of standard deviation. The estimation of the covariance matrices for both methods used the robust Minimum Covariance Determinant (MCD) method. Note that there are no squares because no points were uniquely detected as outliers according to Mahalanobis distance.
9 Although this classification is correct, this point lies on the limit of the cloud of true inliers, and so in practice it would not be clear whether this would represent a useful outlier classification or not.
In Figure 13, we undertake the same task, but this time in a 20-dimensional space. The figure shows the lengths of each of 1400 points (the lengths are used for visualisation purposes) drawn from a 20-dimensional, isotropic Gaussian. Fifteen of these points are manually set to fall very close to the mean/expected value of zero, and these are the simulated outliers we wish to identify, which fall towards the bottom of the plot. Now, in contrast to the example above, we see a large difference between the outliers identified using the two methods. Typicality successfully identifies all 15 true outliers as outliers, whereas MCD fails to identify any of them. Conversely, some points which lie far from the mean (but which have a low probability of occurrence relative to the entropy of the distribution) are identified by both MCD and typicality, although it is possible that by tweaking the thresholds one could achieve greater overlap between the classification of these points by the two methods.

Figure 13: Comparison of Mahalanobis distance and typicality for outlier detection in 20-dimensional space. The outliers are generated as a set of 15 points close to the expected value of 0, superimposed on a set of 1400 points drawn from a 20-dimensional isotropic Gaussian. For visualisation purposes, this plot shows the lengths of each point (the x-axis is simply the index of the point in the dataset). The points identified to be outliers by both methods are indicated in circles, whilst those indicated to be outliers by the Mahalanobis or typicality methods separately are indicated by squares or triangles, respectively. The color of the points represents the Mahalanobis distance in units of standard deviation. The estimation of the covariance matrices for both methods used the robust Minimum Covariance Determinant (MCD) method. Note that there are no squares because no points were uniquely detected as outliers according to Mahalanobis distance.
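A compressed version of this experiment can be sketched as follows, with thresholds mirroring those used in the text (c = 3 standard deviations for Mahalanobis, ϵ = 5 for typicality). It is an illustration under an assumed isotropic Gaussian model rather than the exact simulation code:

```python
# Sketch: points placed very close to the mean of a 20-dimensional Gaussian are
# invisible to the distance-from-centroid rule but are flagged as atypical.
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.covariance import MinCovDet

rng = np.random.default_rng(0)
D = 20
inliers = rng.standard_normal((1400, D))
near_mean = rng.standard_normal((15, D)) * 0.05       # simulated outliers close to the mean
X = np.vstack([inliers, near_mean])

# Robust Mahalanobis rule (Eq. (7)).
mcd = MinCovDet(random_state=0).fit(X)
maha_flags = np.sqrt(mcd.mahalanobis(X)) > 3.0

# Typicality rule (Eq. (5)), assuming an isotropic Gaussian model.
mvn = multivariate_normal(mean=np.zeros(D), cov=np.eye(D))
H_bits = mvn.entropy() / np.log(2)
atypical_flags = np.abs(-mvn.logpdf(X) / np.log(2) - H_bits) > 5.0

print("near-mean points flagged by Mahalanobis:", maha_flags[-15:].sum())     # expected: 0
print("near-mean points flagged by typicality: ", atypical_flags[-15:].sum()) # expected: 15
```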
In summary, typicality does not only have a role in detecting outliers in high-dimensional scenarios (where the outliers may include values close to the expected value), but can also perform similarly to current approaches (such as MCD) in the low-dimensional scenarios where those approaches work well, whilst they otherwise fail in high dimensions. We thus recommend practitioners consider typicality as a viable outlier detection approach under both low- and high-dimensional conditions, and especially in high dimensions. To this extent, researchers are encouraged to consult various commentaries on the usage of outlier detection methods, such as the one by Leys et al. (2019) which provides general recommendations for practice (including pre-registration). It is notable that prior commentary does not include a discussion about the limitations of Mahalanobis based methods for outlier detection once the number of dimensions increases, which serves as a reminder of how important it is that researchers explore typicality. We recommend updating the working conceptualisation of outliers to include those points which, in high dimensions (but as few as 4–10 dimensions), fall too close to the mean.
Finally, we note a quite different approach to identifying outliers known as cellwise-outlier detection (Raymaekers and Rousseeuw 2021) which can be used to identify outliers at a more granular level (identifying not only which cases are outliers, but which variables are responsible for this classification). Note, however, that this approach does not include a discussion about the additional complications that arise as dimensionality increases (specifically, a discussion about how the
relevance of identifying unusually high or low values for certain variables might
change if one considers that values close to the average are atypical). Further work is
required to more broadly evaluate the implications of the atypicality of the mean
across other statistical approaches.
6 Conclusions
The principal goal of this work was to provide researchers with an intuition about
the behavior of data as the number of dimensions increases. Through our explora-
tion of multi-dimensional space, we have shown that the mean, far from repre-
senting normality, actually represents abnormality, in so far as encountering a
datapoint close to the mean in datasets comprising more than a handful of di-
mensions becomes incredibly unlikely, even with a large number of datapoints. In
contrast with the arithmetic average, the information theoretic quantity known as 'typicality' provides a way to establish normality (or rather, whether a datapoint is
typical or atypical), which is particularly useful in high-dimensional regimes. Given
that researchers in psychology and social science frequently deal with multivariate
datasets, and that the peculiarities associated with multi-dimensional spaces start
occurring in relatively low dimensions (as few as four), it is important that re-
searchers have some awareness of the concepts presented in this paper. Indeed, the
implications of this work are particularly important to consider for researchers
concerned with policy. Policies, especially in areas like social welfare, health, and
education, are often built around statistics related to 'average' individuals. If the
representational relevance of the mean is limited, particularly in multi-dimensional
contexts, then policy design, implementation, and evaluation decisions based on this
perspective may be misdirected.
Clearly, the motivations behind the characterizations of points as either normal
or abnormal overlap strongly with those behind outlier detection. The discussion
also provides us with a good justification for updating our working definition of 'outlier' to include points which lie unusually close to the mean. Unlike popular multivariate outlier detection techniques such as the Mahalanobis distance, which characterize outliers as points which lie far from the expected value of the distribution, typicality additionally offers a means to detect those which are close. Whilst such additional benefits of typicality based methods become more evident as the dimensionality of the dataset increases (where traditional methods like Mahalanobis distance fail), we showed that typicality also performs as one would hope/expect in low dimensions. To show this, we finished with an evaluation of typicality for bivariate outlier detection using a 'robust' version of entropy based on the Minimum Covariance Determinant estimation technique, and verified via simulation that in low dimensions it works well as an alternative to the popular Mahalanobis distance.
In addition, it is worth noting that we used a closed-form, parametric expression
for entropy and for the probability of individual datapoints p(x) (parametric in the
sense that we assumed an underlying multivariate Gaussian with a covariance and
mean). Such a parametric approach carries the advantage of high sample efficiency,
that is, relatively few datapoints are required to adequately estimate the relevant
quantities. However, in practice one may (a) have more dimensions than datapoints,
(b) have non-Gaussian data, or (c) both. When the number of datapoints is low
compared with the number of dimensions, parametric estimators may exhibit sub-
stantial instability (especially the estimator for the covariance). In the case where the
data cannot be justifiably parameterised by a Gaussian (or, indeed, any other parametric distribution), semi- or non-parametric approaches can be used to estimate the entropy and p(x). Unfortunately, there are concomitant disadvantages regarding sample efficiency. In other words, the number of samples required for reliable estimation goes up considerably, and this requirement is disproportionately problematic for data with high dimensionality. The extent to which this is a problem depends on the combination of the specific choice of estimators, the sample size, and
the dimensionality of the data. A detailed exploration of the interplay between these
factors is beyond the scope of this work, and it should be mentioned that the point is
important regardless of whether one is intending to detect outliers, or undertaking
statistical modeling of high-dimensional data in general.
Appendix A: Differential Entropy of a Gaussian
Following Cover and Thomas (2006), the differential entropy is defined as:

$$H(f) = -\mathbb{E}[\log(f(x))] = -\int_{-\infty}^{+\infty} f(x)\log_e f(x)\,dx \quad (8)$$

The probability density function of the normal distribution is:

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}} \quad (9)$$

Substituting the expression for $f(x)$ into $H(f)$:

$$H(f) = -\int_{-\infty}^{+\infty} f(x)\log_e\left(\frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}\right)dx \quad (10)$$

Converting to bits by multiplying by $\log_2 e$:

$$H(f) = -\int_{-\infty}^{+\infty} f(x)\log_2 e\left[\log_e\left(\frac{1}{\sqrt{2\pi\sigma^2}}\right) + \log_e e^{-\frac{(x-\mu)^2}{2\sigma^2}}\right]dx \quad (11)$$

$$H(f) = -\int_{-\infty}^{+\infty} f(x)\log_2 e\left[-\log_e\left(\sqrt{2\pi\sigma^2}\right) - \frac{(x-\mu)^2}{2\sigma^2}\right]dx \quad (12)$$

$$H(f) = \log_2 e\,\log_e\left(\sqrt{2\pi\sigma^2}\right)\int_{-\infty}^{+\infty} f(x)\,dx + \log_2 e\int_{-\infty}^{+\infty} \frac{(x-\mu)^2}{2\sigma^2} f(x)\,dx \quad (13)$$

Note that:

$$\int_{-\infty}^{+\infty} f(x)\,dx = 1 \quad (14)$$

and recall that:

$$\int_{-\infty}^{+\infty} (x-\mu)^2 f(x)\,dx = \mathbb{E}[(x-\mu)^2] = \mathrm{Var}(x) = \sigma^2 \quad (15)$$

Therefore:

$$H(f) = \log_2\left(\sqrt{2\pi\sigma^2}\right) + \frac{\log_2 e}{2\sigma^2}\sigma^2 \quad (16)$$

And finally:

$$H(f) = \frac{1}{2}\log_2\left(2\pi e \sigma^2\right) \quad (17)$$
For a D-dimensional Gaussian, the derivation for the entropy is as follows:

$$H(f) = -\mathbb{E}[\log(f(\mathbf{x}))] = -\int_{-\infty}^{+\infty} f(\mathbf{x})\log_e f(\mathbf{x})\,d\mathbf{x}, \quad (18)$$

where the bold font indicates multidimensionality.

$$H(f) = -\mathbb{E}\left[\log\left[(2\pi)^{-D/2}\,|S|^{-0.5}\exp\left(-0.5(\mathbf{x}-\boldsymbol{\mu})^T S^{-1}(\mathbf{x}-\boldsymbol{\mu})\right)\right]\right], \quad (19)$$

where $S$ is the covariance matrix, $T$ indicates the transpose, and $|\cdot|$ indicates the determinant.

$$H(f) = 0.5\,D\log(2\pi) + 0.5\log|S| + 0.5\,\mathbb{E}\left[(\mathbf{x}-\boldsymbol{\mu})^T S^{-1}(\mathbf{x}-\boldsymbol{\mu})\right] \quad (20)$$

$$= 0.5\,D(1 + \log(2\pi)) + 0.5\log|S| \quad (21)$$

This last expression can then be expressed in bits by multiplying by $\log_2 e$. Note that this derivation follows the approach provided by Gundersen (2020) and uses a number of 'tricks' relating to the trace operator.
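As a quick numerical sanity check of Eq. (21) (a sketch, working in nats), the closed-form expression can be compared against the entropy reported by scipy for a multivariate normal with an arbitrary positive-definite covariance matrix:

```python
# Sketch: verify H = 0.5 * D * (1 + log(2*pi)) + 0.5 * log(det(S)) numerically (in nats).
import numpy as np
from scipy.stats import multivariate_normal

D = 3
S = np.array([[1.0, 0.3, 0.0],
              [0.3, 2.0, 0.5],
              [0.0, 0.5, 1.5]])   # an arbitrary positive-definite covariance matrix

closed_form = 0.5 * D * (1 + np.log(2 * np.pi)) + 0.5 * np.log(np.linalg.det(S))
scipy_value = multivariate_normal(mean=np.zeros(D), cov=S).entropy()
print(closed_form, scipy_value)   # the two values agree
```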
References
Blum, A., J. Hopcroft, and R. Kannan. 2020. Foundations of Data Science. Cambridge: Cambridge University
Press.
Bolger, N., and J. P. Laurenceau. 2013. Intensive Longitudinal Methods. New York: The Guilford Press.
Caponi, S. 2013. "Quetelet, the Average Man and Medical Knowledge." Hist Cienc Saude Manguinhos 20 (3): 830–47.
Comte, A. 1976. The Foundation of Sociology, edited by K. Thompson. London: Nelson.
Cover, T. M., and J. A. Thomas. 2006. Elements of Information Theory. New York: John Wiley and Sons Inc.
Daniels, G. 1952. "The Average Man?" Technical Note 53-7, Wright Air Development Center. USAF.
Dieleman, S. 2020. Musings on Typicality. Also available at: https://benanne.github.io/2020/09/01/
typicality.html.
Foucault, M. 1984. Madness and Civilization, edited by P. Rabinow. London: Penguin Books.
Gelman, A., and E. Loken. 2013. "The Garden of Forking Paths: Why Multiple Comparisons Can Be a Problem Even When There Is No 'Fishing Expedition' or 'p-Hacking' and the Research Hypothesis Was Posited Ahead of Time." Also available at: http://www.stat.columbia.edu/~gelman/research/unpublished/p_hacking.pdf.
Gottman, J. M. 1979. "Detecting Cyclicity in Social Interaction." Psychological Bulletin 86 (2): 338–48.
Grubbs, F. 1969. "Procedures for Detecting Outlying Observations in Samples." Technometrics 11 (1): 1–21.
Gundersen, G. 2020. Entropy of the Gaussian. https://gregorygundersen.com/blog/2020/09/01/gaussian-
entropy/ (accessed May 03, 2022).
Hacking, I. 1990. The Taming of Chance. Cambridge: Cambridge University Press.
Joel, S., P. Eastwick, and E. Finkel. 2017. "Is Romantic Desire Predictable? Machine Learning Applied to Initial Romantic Attraction." Psychological Science 28 (10): 1478–89.
Joel, S., P. Eastwick, C. Allison, and X. E. A. Arriaga. 2020. "Machine Learning Uncovers the Most Robust Self-Report Predictors of Relationship Quality across 43 Longitudinal Couples Studies." PNAS 117 (32): 19061–71.
Kroc, E., and O. Astivia. 2021. "The Importance of Thinking Multivariately when Selecting Subscale Cutoff Scores." Educational and Psychological Measurement: 517–38.
Leys, C., C. Ley, O. Klein, P. Bernard, and L. Licata. 2013. "Detecting Outliers: Do Not Use Standard Deviation Around the Mean, Use Absolute Deviation Around the Median." Journal of Experimental Social Psychology 49 (4): 764–6.
Leys, C., O. Klein, Y. Dominicy, and C. Ley. 2018. "Detecting Multivariate Outliers: Use a Robust Variant of the Mahalanobis Distance." Journal of Experimental Social Psychology 74: 150–6.
Leys, C., M. Delacre, D. Lakens, and C. Ley. 2019. "How to Classify, Detect, and Manage Univariate and Multivariate Outliers, with Emphasis on Pre-registration." International Review of Social Psychology 32 (1): 1–10.
Li, X., S. Deng, L. Lifang, and Y. Jiang. 2019. "Outlier Detection Based on Robust Mahalanobis Distance and its Application." Open Journal of Statistics 9 (1): 15–26.
MacKay, D. J. C. 1992. "A Practical Bayesian Framework for Backpropagation Networks." Neural Computation 4 (3): 448–72.
MacKay, D. J. C. 2018. Information Theory, Inference, and Learning Algorithms. Cambridge: Cambridge
University Press.
Mahalanobis, P. 1930. "On Tests and Measures of Group Divergence." Journal and Proceedings of Asiatic Society of Bengal 26: 541–88.
Micceri, T. 1989. "The Unicorn, the Normal Curve, and Other Improbable Creatures." Psychological Bulletin 105: 156–66.
Misztal, B. 2002. "Rethinking the Concept of Normality: The Criticism of Comte's Theory of Normal Existence." Polish Sociological Review 138: 189–202.
Modis, T. 2007. "The Normal, the Natural, and the Harmonic." Technological Forecasting and Social Change 74: 391–494.
Myers, S. 2013. "Normality in Analytic Psychology." Behavioural Sciences (Basel, Switzerland) 3 (4): 647–61.
Orben, A., and A. Przybylski. 2019. "The Association between Adolescent Well-Being and Digital Technology Use." Nature Human Behaviour 3: 173–82.
Quetelet, A. 1835. Sur l'homme et le développement de ses facultés. Paris: Fayard.
Raudenbush, S., and A. Bryk. 2002. Hierarchical Linear Models: Applications and Data Analysis Methods.
Thousand Oaks, CA: SAGE Publications.
Raymaekers, J., and P. Rousseeuw. 2021. "Handling Cellwise Outliers by Sparse Regression and Robust Covariance." Journal of Data Science, Statistics, and Visualisation 1 (3): 1–30.
Scherpenzeel, A., and M. Das. 2010. In Social and Behavioral Research and the Internet: Advances in Applied Methods and Research Strategies, edited by M. Das, P. Ester, and L. Kaczmirek, 77–104. Boca Raton: Taylor & Francis.
Speelman, C., and M. McGann. 2013. "How Mean Is the Mean?" Frontiers in Psychology 4 (451): 1–12.
Stein, S. 2020. Concentration properties of High-Dimensional Normal Distributions. https://stefan-stein.
github.io/posts/2020-03-07-concentration-properties-of-high-dimensional-normal-distributions/
(accessed May 03, 2022).
van der Laan, M. J., and S. Rose. 2011. Targeted Learning: Causal Inference for Observational and Experimental Data. New York: Springer International.
Vershynin, R. 2019. High-Dimensional Probability: An Introduction with Applications in Data Science. Cambridge: Cambridge University Press (Cambridge Series in Statistical and Probabilistic Mathematics).
Vowels, M. J. 2021. "Misspecification and Unreliable Interpretations in Psychology and Social Science." Psychological Methods 28 (3): 507–26.
Vowels, M. J., K. Mark, L. M. Vowels, and N. Wood. 2018. "Using Spectral and Cross-Spectral Analysis to Identify Patterns and Synchrony in Couples' Sexual Desire." PLoS One 13 (10): e0205330.
Vowels, L. M., M. J. Vowels, and K. Mark. 2021. "Uncovering the Most Important Factors for Predicting Sexual Desire Using Interpretable Machine Learning." The Journal of Sexual Medicine.
Wetherall, M. 1996. Identities, Groups and Social Issues. London: SAGE Publications.
Yarkoni, T., and J. Westfall. 2017. "Choosing Prediction over Explanation in Psychology: Lessons from Machine Learning." Perspectives on Psychological Science 12 (6): 1100–22, https://doi.org/10.1177/1745691617693393.