
Vector field statistical analysis of kinematic and force trajectories

Todd C. Pataky1, Mark A. Robinson2, and Jos Vanrenterghem2

1Department of Bioengineering, Shinshu University, Japan

2Research Institute for Sport and Exercise Sciences, Liverpool John Moores University, UK

Abstract

When investigating the dynamics of three-dimensional multi-body biomechanical systems it is often difficult to derive spatiotemporally directed predictions regarding experimentally induced effects. A paradigm of 'non-directed' hypothesis testing has emerged in the literature as a result. Non-directed analyses typically consist of ad hoc scalar extraction, an approach which substantially simplifies the original, highly multivariate datasets (many time points, many vector components). This paper describes a commensurately multivariate method as an alternative to scalar extraction. The method, called 'statistical parametric mapping' (SPM), uses random field theory to objectively identify field regions which co-vary significantly with the experimental design. We compared SPM to scalar extraction by re-analyzing three publicly available datasets: 3D knee kinematics, a ten-muscle force system, and 3D ground reaction forces. Scalar extraction was found to bias the analyses of all three datasets by failing to consider sufficient portions of the dataset, and/or by failing to consider covariance amongst vector components. SPM overcame both problems by conducting hypothesis testing at the (massively multivariate) vector trajectory level, with random field corrections simultaneously accounting for temporal correlation and vector covariance. While SPM has been widely demonstrated to be effective for analyzing 3D scalar fields, the current results are the first to demonstrate its effectiveness for 1D vector field analysis. It was concluded that SPM offers a generalized, statistically comprehensive solution to scalar extraction's oversimplification of vector trajectories, thereby making it useful for objectively guiding analyses of complex biomechanical systems.

Keywords: biomechanics, random field theory, Statistical Parametric Mapping, multivariate statistics


Glossary

Category            Symbol      Other                 Description

Counts (index):     I (i)                             Vector components
                    J (j)                             Responses (i.e. experimental recordings)
                    K (k)                             Predictor variables
                    N                                 Extracted scalars (e.g. maximum force)
                    Q (q)                             Field measurement nodes (e.g. 100 points in time)

Responses
 (mean, variance):  y_i         ȳ, s²                 Scalar response (with st. dev.)
                    y_i(q)      ȳ(q), s²(q)           Scalar field response (with st. dev. field)
                    y(q)        ȳ(q), W(q)            Vector field response (with covariance field)

Test statistic
 fields:            t           SPM{t} ≡ t(q)         Student's t statistic
                    F           SPM{F} ≡ F(q)         Variance ratio (e.g. from ANOVA)
                    T²          SPM{T²} ≡ T²(q)       Hotelling's T² statistic (vector equivalent of t)
                    R                                 Canonical correlation coefficient

Probability:        α                                 Type I error rate
                    p                                 Probability value

Acronyms:           CCA                               Canonical correlation analysis
                    EMG                               Electromyography
                    GRF                               Ground reaction force
                    PFP                               Patellofemoral pain


1 Introduction

Measurements of motion and the forces underlying that motion are fundamental to biomechanical experimentation. These measurements are often manifested as one-dimensional (1D) scalar trajectories y_i(q), where i represents a particular physical body, joint, axis or direction, and where q represents 1D time or space. Experiments typically involve repeated measurements of y_i(q) followed by registration (i.e. homologously optimal temporal or spatial normalization) to a domain of 0–100% (Sadeghi et al., 2003). This paper pertains to analysis of registered data y_i(q).

Given that many potential sources of bias exist in y_i(q) analysis (Rayner, 1985; James and Bates, 1997; Mullineaux et al., 2001; Knudson, 2009), a non-trivial challenge is to employ statistical methods that are consistent with one's null hypothesis. Consider first 'directed' null hypotheses: those which claim response equivalence in particular vector components i, and in particular points q or windows [q0, q1]:

Example 'directed' null hypothesis: Controls and Patients exhibit identical maximum knee flexion during walking between 20% and 30% stance.

To test this hypothesis only maximum knee flexion should be assessed, and only in the specified time window. Testing other time windows, joints, or joint axes in a post hoc sense would constitute bias, because increasing the number of statistical tests increases the risk of incorrectly rejecting the null hypothesis (see Supplementary Material – Appendix A). In other words, it is biased to expand the scope of one's null hypothesis after seeing the data. We refer to this type of bias as 'post hoc regional focus bias'.

Next consider 'non-directed' null hypotheses: hypotheses which broadly claim kinematic or dynamic response equivalence:

Example 'non-directed' null hypothesis: Controls and Patients exhibit identical hip and knee kinematics during stance phase.

To address this hypothesis both hip and knee joint rotations should be assessed, about all three orthogonal spatial axes, and from 0% to 100% stance (i.e. the entire dataset y_i(q)). It would be biased to assess only maximum hip flexion, for example, in a post hoc sense, but for the opposite reason: it is biased to reduce the scope of one's null hypothesis after seeing the data.

Non-directed hypotheses expose a second potential source of bias: covariance among the I vector components. Scalar analyses ignore covariance and are therefore coordinate-system dependent (see Supplementary Material – Appendix B). This is important because a particular coordinate system — even one defined anatomically and local to a moving segment — may not reflect underlying mechanical function (Kutch and Valero-Cuevas, 2011). Joint rotations, for example, may not be independent because muscle lines of action are generally not parallel to externally defined axes (Jensen and Davy, 1975). Joint moments may also not be independent because endpoint force control, for example, requires coordinated joint moment covariance (Wang et al., 2000). Under a non-directed hypothesis this covariance must be analyzed, because separate analysis of the I components is equivalent to an assumption of independence, an assumption which may not be justified (see Supplementary Material – Appendix B). We refer to this source of bias as 'inter-component covariance bias'.

Both post hoc regional focus bias and inter-component covariance bias have been acknowledged previously (Rayner, 1985; James and Bates, 1997; Mullineaux et al., 2001; Knudson, 2009). However, to our knowledge no study has proposed a comprehensive solution.

The purpose of this paper is to show that a method called Statistical Parametric Mapping (SPM) (Friston et al., 2007) greatly mitigates both bias sources. The method begins by regarding the data y_i(q) as a vector field y(q), a multi-component vector y whose values change in time or space q (Fig. 1). When regarding the data in this manner, it is possible to use random field theory (RFT) (Adler and Taylor, 2007) to calculate the probability that observed vector field changes resulted from chance vector field fluctuations.

We use SPM and RFT to conduct formalized hypothesis testing on three separate, publicly available biomechanical vector field datasets. We then contrast these results with the traditional scalar extraction approach. Based on statistical disagreement between the two methods we infer that, by definition, at least one of the methods is biased. We finally use mathematical arguments (Supplementary Material) and logical interpretations of the original data to conclude that scalar extraction constitutes a biased approach to non-directed hypothesis testing, and that SPM overcomes these biases.

2 Methods

2.1 Datasets

We reanalyzed three publicly available datasets (Table 1):

• Dataset A (Neptune et al., 1999) (http://isbweb.org/data/rrn/): stance-phase lower extremity dynamics in ten subjects performing ballistic side-shuffle and v-cut tasks (Fig. 2). Present focus was on within-subject mean three-dimensional knee rotations for the eight subjects whose data were labeled unambiguously in the public dataset.

• Dataset B (Besier et al., 2009) (https://simtk.org/home/muscleforces): stance-phase knee-muscle forces during walking and running in 16 Controls and 27 Patello-Femoral Pain (PFP) patients, as estimated from EMG-driven forward-dynamics simulations. Present focus was on walking and absolute forces (newtons) (Fig. 3).

• Dataset C (Dorn et al., 2012) (https://simtk.org/home/runningspeeds): one subject's full-body kinematics and ground reaction forces (GRF) during running at four different speeds: 3.56, 5.20, 7.00, and 9.49 m/s. Present focus was on three-dimensional left-foot GRF (Fig. 4), for which a total of eight responses were available. We linearly interpolated the GRF data across stance phase to Q=100 time points.
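The stance-phase re-sampling step can be sketched as follows. This is an illustrative NumPy implementation under our own assumptions (function name and array layout are ours, and uniform time spacing of the recorded frames is assumed), not code from the original study:

```python
import numpy as np

def resample_stance(F, n=100):
    """Linearly re-sample a (T, I) trajectory recorded over T stance-phase
    frames onto n equally spaced nodes spanning 0-100% stance."""
    T, I = F.shape
    q_old = np.linspace(0.0, 1.0, T)   # original frames as % stance
    q_new = np.linspace(0.0, 1.0, n)   # target nodes (0-100% stance)
    return np.column_stack([np.interp(q_new, q_old, F[:, i]) for i in range(I)])
```

Applying this to each of the eight GRF responses yields (100, 3) arrays that can be stacked for the vector field analyses below.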

These three datasets were chosen, first, to represent a range of biomechanical data modalities: kinematics, modeled (internal) muscle forces, and external forces. Second, they were chosen to demonstrate how vector field analysis applies to a range of statistical tests: (A) paired t tests, (B) two-sample t tests, and (C) linear regression.

2.2 Traditional scalar extraction analysis

Two, ten, and three scalars were respectively extracted from the three datasets (Table 1). These particular scalars were chosen either because they appeared to be most affected by the experiment (Datasets A and C), or because they were physiologically relevant (Dataset B: maximum force). As indicated above, Dataset A's task effects were assessed using paired t tests, Dataset B's group effects were assessed using two-sample (independent) t tests, and Dataset C's speed effects were assessed using linear regression.

Since we conducted one test for each scalar, we performed N=2, N=10 and N=3 tests on Datasets A, B and C, respectively, where N is the number of extracted scalars. To retain a family-wise Type I error rate of α=0.05 we adopted Šidák thresholds of p=0.0253, p=0.0051, and p=0.0170 respectively, where the Šidák threshold is:

    p_critical = 1 − (1 − α)^(1/N)     (1)
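For concreteness, Eqn. 1 can be evaluated directly. The sketch below (the helper name is ours) reproduces the three thresholds quoted above:

```python
def sidak_threshold(alpha, n_tests):
    """Per-test p-value threshold retaining a family-wise Type I error
    rate of alpha across n_tests tests (Eqn. 1)."""
    return 1.0 - (1.0 - alpha) ** (1.0 / n_tests)

# thresholds for the N=2, N=3 and N=10 scalar tests (Datasets A, C and B)
thresholds = {n: sidak_threshold(0.05, n) for n in (2, 3, 10)}
```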

These scalar analyses superficially appear to be legitimate analysis options. However, through comparison with the equivalent vector field analyses (§2.3), we will show how and why scalar extraction is biased for non-directed null hypothesis testing.

2.3 Statistical Parametric Mapping (SPM)

SPM analyses (Friston et al., 2007) were conducted using vector field analogs to the aforementioned univariate tests (§2.2). Before detailing SPM procedures, we note that they are conceptually identical to univariate procedures: conducting a one-sample t test on ten scalar values, for example, is nearly identical to conducting a one-sample t test on ten vector fields. The only differences are that SPM: (i) considers vector covariance when computing the test statistic, (ii) considers field smoothness and size when computing the critical test statistic threshold, and (iii) considers random field behavior when computing p values (see Appendix A and B – Supplementary Material).

Ultimately each SPM test results in a test statistic field (e.g. the t statistic as a function of time), and RFT is used to assess the significance of this statistical field. §2.3.1–§2.3.3 below detail test statistic field computations for the current datasets, §2.3.4 describes RFT computations of critical test statistic values and p values, and §2.3.5 suggests a post hoc procedure for scrutinizing vector field test results.

2.3.1 Paired Hotelling’s T2test (Dataset A)

SPM’s vector ﬁeld analog to the paired ttest is the paired Hotelling’s T2test, which is given by the

one-sample T2statistic (Cao and Worsley, 1999):

SPM{T2}⌘T2(q)=Jy(q)>W(q)1y(q) (2)

where Jis the number of vector ﬁelds (Table 1) and y(q) is the mean vector ﬁeld or — in the case of a

paired test — the vector ﬁeld di↵erence y(q) (see Supplementary Material – Appendix C). Wis the (I⇥I)

sample covariance matrix:

W(q)= 1

J1

0

@

J

X

j=1

⇣yj(q)y(q)⌘⇣yj(q)y(q)⌘>1

A(3)

representing the variances-within and correlations-between vector components across the Jresponses (Sup-

plementary Material - Appendix D).
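Eqns. 2–3 are straightforward to evaluate numerically. The following is a minimal NumPy sketch (function name and array layout are ours, not the original implementation):

```python
import numpy as np

def hotellings_T2_field(Y):
    """One-sample Hotelling's T^2 statistic field (Eqns. 2-3).

    Y : (J, Q, I) array of J vector-field responses (paired-difference
        fields for a paired test), Q time nodes, I vector components.
    Returns a (Q,) array holding T^2 at each time node q.
    """
    J, Q, I = Y.shape
    ybar = Y.mean(axis=0)                    # mean vector field, shape (Q, I)
    T2 = np.empty(Q)
    for q in range(Q):
        r = Y[:, q, :] - ybar[q]             # residuals at node q, shape (J, I)
        W = (r.T @ r) / (J - 1)              # sample covariance matrix (Eqn. 3)
        T2[q] = J * ybar[q] @ np.linalg.solve(W, ybar[q])
    return T2
```

With I=1 this reduces, node by node, to the squared one-sample t statistic, consistent with the hierarchy discussed in §4.4.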

The notation "SPM{T²}" (Friston et al., 2007) indicates that the test statistic T² varies in continuous time (or space), forming a temporal (or spatial) statistical 'map'. To clarify: "SPM" refers to the methodology, and "SPM{T²}" to a specific variable.

2.3.2 Two-sample Hotelling’s T2test (Dataset B)

SPM’s vector ﬁeld analog to the two-sample ttest is the Hotelling’s T2test (Cao and Worsley, 1999):

SPM{T2}⌘T2(q)= J1J2

J1+J2⇣y1(q)y2(q)⌘>

W(q)1⇣y1(q)y2(q)⌘(4)

where subscripts “1” and “2” index the two groups. Here Wis the pooled covariance matrix:

W=1

J1+J22

0

@

J1

X

j=1

(y1jy1)(y1jy1)>+

J2

X

j=1

(y2jy2)(y2jy2)>1

A(5)

where the domain “(q)” is dropped for compactness.
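Eqns. 4–5 can likewise be sketched in NumPy (again, the function name and array layout are our own illustrative choices):

```python
import numpy as np

def hotellings_T2_two_sample(Y1, Y2):
    """Two-sample Hotelling's T^2 statistic field (Eqns. 4-5).

    Y1 : (J1, Q, I) and Y2 : (J2, Q, I) arrays of vector-field responses.
    Returns a (Q,) array holding T^2 at each time node q.
    """
    J1, J2 = Y1.shape[0], Y2.shape[0]
    Q = Y1.shape[1]
    d = Y1.mean(axis=0) - Y2.mean(axis=0)     # mean-difference field, (Q, I)
    r1 = Y1 - Y1.mean(axis=0)                 # group residuals
    r2 = Y2 - Y2.mean(axis=0)
    T2 = np.empty(Q)
    for q in range(Q):
        # pooled covariance matrix at node q (Eqn. 5)
        W = (r1[:, q].T @ r1[:, q] + r2[:, q].T @ r2[:, q]) / (J1 + J2 - 2)
        T2[q] = (J1 * J2) / (J1 + J2) * (d[q] @ np.linalg.solve(W, d[q]))
    return T2
```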

2.3.3 Canonical correlation analysis (Dataset C)

SPM’s vector ﬁeld analog to linear regression is canonical correlation analysis (CCA) (Hotelling, 1936;

Worsley et al., 2004). The goal of CCA is to determine the strength of linear correlation between a set of

predictor variables xj(K-component vectors) and a set of response variables yj(I-component vectors). We

provide a brief technical summary of CCA. An extended discussion is provided as Supplementary Material

(Appendix E).

Following Worsley et al. (2004), the test statistic of interest was the maximum canonical correlation (R), a single correlation coefficient which varies over q, and which transforms to the F statistic via the identity:

    SPM{F} ≡ F(q) = R(q)(J−1) / (1 − R(q))     (6)

To compute R, one must first assemble three covariance matrices:

• C_XX — the (K×K) predictor covariance matrix
• C_YY — the (I×I) response covariance matrix
• C_XY — the (K×I) predictor-response covariance matrix

The maximum canonical correlation (R) is the maximum eigenvalue of the (K×K) canonical correlation matrix (C) (Worsley et al., 2004):

    C = C_XX⁻¹ C_XY C_YY⁻¹ C_XYᵀ     (7)

An equivalent interpretation is that R is the maximum correlation coefficient obtainable when the predictor and response coordinate systems are permitted to mutually rotate. K=2 predictors (running speed and an intercept) were employed to model the I=3 force vector components of Dataset C.
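Eqn. 7 can be sketched as follows. Note two simplifications that are ours, not the paper's: the intercept is absorbed by centering the predictors and responses, and R is taken as the largest eigenvalue of C, which is a squared canonical correlation and therefore lies in [0, 1]:

```python
import numpy as np

def cca_field(X, Y):
    """Maximum canonical correlation at each field node (Eqn. 7).

    X : (J, K) predictors; the intercept is handled by centering, so pass
        only the non-constant predictors (e.g. running speed).
    Y : (J, Q, I) vector-field responses.
    Returns a (Q,) array of R(q), the largest eigenvalue of the canonical
    correlation matrix C(q).
    """
    J, Q, I = Y.shape
    Xc = X - X.mean(axis=0)                       # centered predictors
    R = np.empty(Q)
    for q in range(Q):
        Yc = Y[:, q] - Y[:, q].mean(axis=0)       # centered responses at node q
        Cxx = (Xc.T @ Xc) / (J - 1)
        Cyy = (Yc.T @ Yc) / (J - 1)
        Cxy = (Xc.T @ Yc) / (J - 1)
        C = np.linalg.solve(Cxx, Cxy) @ np.linalg.solve(Cyy, Cxy.T)
        R[q] = np.linalg.eigvals(C).real.max()
    return R
```

With a single predictor and a single response component, R(q) reduces to the squared Pearson correlation at each node, consistent with the hierarchy discussed in §4.4.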

2.3.4 Statistical inference

To determine the significance of the aforementioned test statistic fields, field smoothness was first estimated from the temporal gradients of the residuals (Friston et al., 2007). Next, given this smoothness, RFT (Adler and Taylor, 2007) was used to determine the critical test statistic threshold that retained a family-wise error rate of α=0.05 (Cao and Worsley, 1999; Worsley et al., 2004). Last, the probability with which suprathreshold clusters could have been produced by chance (i.e. by random fields with the same temporal smoothness) was calculated using analytic expectation (Cao and Worsley, 1999). In other words, rather than controlling the false-positive rate at each point in time, we presently controlled the false-positive rate of the test statistic field's sample-rate invariant topological features (Friston et al., 2007). For additional details refer to Appendix A (Supplementary Material).
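The first step (smoothness estimation) admits a compact sketch. The gradient-based FWHM estimator below is a simplified version of the approach used in 1D RFT implementations, e.g. the open-source package of Pataky (2012); production code handles edge effects and estimator bias more carefully:

```python
import numpy as np

def estimate_fwhm(R):
    """Rough gradient-based estimate of 1D field smoothness (FWHM, in nodes).

    R : (J, Q) array of residual fields. Residuals are scaled to unit
    variance at each node; the mean squared temporal gradient v then yields
    FWHM = sqrt(4 ln 2 / v) under a Gaussian autocorrelation model.
    """
    R = R / R.std(axis=0, ddof=1)      # unit-variance residuals at each node
    grad = np.gradient(R, axis=1)      # temporal gradients
    v = np.mean(grad ** 2)             # mean squared gradient
    return np.sqrt(4.0 * np.log(2.0) / v)
```

Given the FWHM and the field length Q, RFT expectations yield the critical threshold and cluster-level p values; readers are directed to the cited literature for those expressions.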

2.3.5 Post hoc scalar ﬁeld SPM

When testing non-directed hypotheses regarding biomechanical vector fields, we propose that SPM should be implemented in a hierarchical manner, analogous to ANOVA with post hoc t testing. One should first use SPM to analyze the entire vector field y(q); particular vector components (scalar fields y_i(q)) should only be tested, in a post hoc manner, if statistical significance is reached at the vector-field level.

Following vector field analyses, post hoc tests were conducted on each vector component separately (i.e. on scalar fields). For scalar fields the aforementioned tests (§2.3.1–§2.3.3) reduce to: the paired t statistic (Dataset A), the two-sample t statistic (Dataset B), and the linear regression t statistic (Dataset C). Each scalar field test produced one SPM{t}, whose significance was determined as described above (§2.3.4). To maintain a family-wise error rate of α=0.05, Šidák thresholds (Eqn. 1) of p=0.0170, p=0.0051, and p=0.0170 were used to correct for the I=3, I=10, and I=3 vector components of Datasets A, B, and C, respectively (Table 1). All aforementioned analyses were implemented in Python 2.7 using Enthought Canopy 1.0 (Enthought Inc., Austin, USA).


3 Results

3.1 Dataset A: knee kinematics

The knee appeared to be comparatively more flexed (Fig. 2a) and somewhat more externally rotated (Fig. 2c) in the side-shuffle vs. v-cut tasks, with slightly more abduction at 0% stance (Fig. 2b). Statistical tests on the extracted scalars found significant differences between the two tasks for both maximal knee flexion (t=3.093, p=0.018) and abduction at 0% stance (t=3.948, p=0.006).

SPM vector field analysis (Fig. 5) found significant kinematic differences between the two tasks at approximately 1%, 10%, 20%, 30–35% and 95–100% stance. Post hoc t tests revealed that the effects over 30–35% and 95–100% stance resulted primarily from increased flexion (p=0.015) and increased external rotation (p=0.004), respectively, in the side-shuffle vs. v-cut tasks (Fig. 6). Apparent discrepancies amongst vector field SPM, scalar field SPM, and scalar extraction (both here and in the remainder of the Results) are addressed in the Discussion.

3.2 Dataset B: muscle forces

Most muscles appeared to exhibit higher forces in PFP vs. Controls over most of stance (Fig. 3). Nonetheless, none of the statistical tests on the extracted scalars reached significance; the medial gastrocnemius force exhibited the strongest effect (t=2.617, p=0.013), but like the nine other muscles (t<1.91, p>0.063) this failed to reach the Šidák significance threshold of p=0.0051.

In contrast, SPM vector field analysis found significance for the entire stance phase (Fig. 7; p<0.001). Post hoc t tests on individual muscle trajectories found significantly greater forces in PFP only for the medial gastrocnemius, and only over scattered time regions (maximum p=0.002) (Fig. 8).

3.3 Dataset C: ground reaction forces

Forces appeared to increase systematically with running speed, particularly in the vicinity of 30% and 75% stance (Fig. 4). Linear regression found that all three extracted scalars surpassed the Šidák threshold for significance (p=0.0170); analysis of maximum propulsion, vertical and lateral forces yielded r²=0.951, 0.691 and 0.737, and p=0.00004, 0.001, and 0.006, respectively.

SPM vector field analysis (Fig. 9) found that GRF was significantly correlated with running speed in three intervals with approximate windows of 10–18%, 20–43% and 60–88% stance. Post hoc scalar field analysis revealed that GRF_x was primarily responsible for the 10–18% and 60–88% effects (Fig. 10a), and that GRF_y was primarily responsible for the 20–43% effect (Fig. 10b).

4 Discussion

The current vector field SPM and scalar extraction results all agreed qualitatively with the data, yet the two approaches yielded different results and even incompatible statistical conclusions. This, by definition, indicates that at least one of the methods is biased. For non-directed hypothesis testing we contend that scalar extraction is susceptible to two non-trivial bias sources:

1. Post hoc regional focus bias — Type I or Type II error (i.e. false positives or false negatives) resulting from the failure to consider the entire measurement domain.

2. Inter-component covariance bias — Type I or Type II error resulting from the failure to consider the covariance amongst vector components.

We further contend that vector field testing overcomes both bias sources because it uses the entire measurement domain and all vector components to maintain a constant error rate of α. The remainder of the Discussion is devoted to justifying these claims.

4.1 Bias in scalar extraction analyses

Dataset A exhibited Type I error due to post hoc regional focus bias. Scalar extraction analysis of maximum flexion (at ~50% stance) reached significance (§3.1), but neither vector field analysis (Fig. 5) nor post hoc scalar field analysis (Fig. 6a) reached significance in this field region. Similarly, scalar extraction found a significant ab-/adduction effect at 0% stance, but SPM did not. These discrepancies are resolved through multiple comparisons theory (Knudson, 2009); it is highly likely that at least one of Dataset A's 303 vector field points will exceed an (uncorrected) threshold of p=0.05 simply by chance. By extracting only scalars which appeared to exhibit maximum effects (Fig. 2) we effectively conducted 303 tests and then chose to report the results of only two.

The opposite effect (Type II error) was also present in Dataset A. Scalar extraction focused on only two scalars, and thus failed to identify the other effects present in the dataset (Fig. 5), in particular the large late-stance internal/external rotation effect (Fig. 6c). A simple example (Supplementary Material – Appendix A) clarifies how it is possible for scalar extraction and SPM to yield opposite statistical conclusions, and why the scalar extraction results cannot be trusted: they fail to honor the α error rate.

Dataset B exhibited Type II error due to covariance bias: scalar extraction failed to reach significance (§3.2) even though SPM found substantial evidence for muscular differences between Controls and PFP (Fig. 7). This is resolved by correlation amongst muscles like the vasti (Fig. 3). A simple example (Supplementary Material – Appendix B) clarifies how it is possible for vector resultant changes to reach significance when vector component changes do not. Scalar analysis of vector data cannot be trusted because it fails to account for vector component covariance.

Scalar extraction analysis of Dataset C exhibited both Type I and Type II error due to regional focus bias. Scalar extraction analyses of lateral forces exhibited Type I error because there is insufficient field-wide evidence to support its conclusion of significance (Fig. 10c). Scalar extraction also exhibited Type II error by failing to analyze braking forces, and therefore failing to identify the positive correlation between running speed and braking forces at 15% stance (Fig. 10a).

4.2 Bias in scalar ﬁeld SPM analyses

Scalar field SPM solves regional focus bias (because it tests the entire domain q), but it remains susceptible to covariance bias because it separately tests the I vector components. Scalar field analysis of Dataset A exhibited Type II error by failing to identify all field effects, particularly the large early-stance effect (Figs. 5, 6). Appendix B clarifies that this was caused by scalar field analysis' failure to consider inter-component covariance; it regards trajectory variance as a 1D time-varying 'cloud' (Fig. 2) when in fact it is an I-D time-varying hyper-ellipsoid (Fig. 1) representing both within- and between-component (co)variance.

In Dataset B, vector field analysis reached significance (Fig. 7), so scalar field analysis, had it not been conducted in a post hoc manner, would have exhibited Type II error by underestimating the temporal scope of effects (Fig. 8). This is also explained by covariance (Appendix B); the effect was manifested more strongly in the resultant 10-component muscle force vector than in each muscle independently.

In Dataset C vector field effect timing (Fig. 9) agreed with scalar field effect timing (Fig. 10), so the latter would not have been biased had they not been conducted in a post hoc sense. Nevertheless, by failing to consider covariance the scalar field results fail to capture the full temporal extent of the vector effects.

A separate but notable trend was that Dataset C's covariance ellipses all tended to be narrow and to point toward the origin (Fig. 1). This suggests that vector magnitude was far more variable than vector direction. A plausible mechanical explanation is friction: to avoid slipping, normal forces must increase when tangential forces increase. Regardless of the mechanism, this observation reinforces our contention that non-directed hypotheses must consider vector changes.

4.3 SPM’s solution to regional focus and covariance bias

SPM solves both regional focus bias and covariance bias by considering the covariance of all vector components (i) across the entire measurement domain (q), while simultaneously handling the inherent problem of multiple comparisons (Knudson, 2009) in a theoretically robust manner. Specifically, SPM uses an RFT correction (Adler and Taylor, 2007; Worsley et al., 2004) to ensure that no more than α% of the points in the (I×Q) vector field reach significance simply by chance; this RFT correction is embodied in the thresholds depicted in Figs. 5–10.

Non-RFT corrections like the Šidák correction (Eqn. 1) can partially solve the problem of multiple comparisons, but only partially, because they fail to consider the (spatiotemporal) smoothness of the measurement domain q and therefore overestimate the number of independent tests. This ultimately leads to an overly conservative threshold (i.e. an inflated Type II error rate) except for very rough fields (Friston et al., 2007). Non-RFT corrections also fail to solve covariance bias because they assume that vector components vary independently (Supplementary Material – Appendix B). While covariance bias could partially be solved with a principal axis rotation prior to statistical testing (Cole et al., 1994; Knudson, 2009), we would argue that: (i) Hotelling's T² and CCA are simpler solutions because their results are identical for all coordinate system definitions, and (ii) principal axis rotations of only the response vectors do not necessarily maximize the mutual correlation between predictors and responses (§2.3.3).

We acknowledge that many additional important sources of bias exist (James and Bates, 1997; Mullineaux et al., 2001; Knudson, 2009). However, none of these is unique to SPM. Trajectory mis-registration (Sadeghi et al., 2003) and unit normalization (e.g. absolute vs. relative muscle forces), for example, pose common problems to scalar extraction and vector field analyses alike. We contend only that SPM addresses two bias sources.

4.4 SPM generalizability

Although SPM was originally developed to analyze 3D brain function (Friston et al., 2007), it has been shown that SPM is generalizable to a variety of biomechanical scalar datasets including 1D trajectories (Pataky, 2012), 2D pressure fields and 3D strain fields (Pataky, 2010). The current study is the first, in any scientific field, to have shown that SPM is also applicable to a large class of practical 1D vector field problems. SPM theory suggests that generalizations to biomechanical vector/tensor fields in nD spaces are also possible (Xie et al., 2010).

SPM encompasses the entire family of parametric hypothesis testing (Worsley et al., 2004; Friston et al., 2007). It also accommodates all non-parametric variants (Nichols and Holmes, 2002; Lenhoff et al., 1999), which may be useful if one's data do not adhere to the parametric assumption that the residuals are normally distributed (Friston et al., 2007). This hypothesis testing generalization is apparent when one considers the following hierarchy: vector field CCA simplifies to the vector field Hotelling's T² test when the predictors are binary (Worsley et al., 2004), which in turn simplifies to scalar field t tests when there is only one vector component i, which in turn simplifies to the univariate Student's t test when the scalar field reduces to a single point q. Thus SPM, through CCA, generalizes to all statistical tests of I-dimensional vectors on arbitrarily sized fields Q of arbitrary dimensionality (Worsley et al., 2004).

For readers interested in implementing SPM analyses, we note that constructing test statistic trajectories is straightforward; it is trivial to combine mean and standard deviation trajectories to form an SPM{t}, for example. The non-trivial step is statistical inference. As a first approximation it is easy to implement a Šidák correction (Eqn. 1), which will (very) conservatively reduce the Type I error rate, but which will also unfortunately inflate the Type II error rate. For more precise control of both error rates (via RFT) the reader is directed to the literature (Friston et al., 2007) and open source software packages (Pataky, 2012).

4.5 Conclusions

Ad hoc reduction of vector trajectories through scalar extraction can non-trivially bias non-directed biomechanical hypothesis testing, most notably via regional focus and coordinate system bias sources. This paper shows that SPM overcomes both sources of bias by treating the vector field as the fundamental, initially indivisible unit of observation. Grounded in random field theory, SPM appears to be a useful, generalized tool for the analysis of often-complex biomechanical datasets.

Acknowledgments

Financial support for this work was provided in part by JSPS Wakate B Grant #22700465.


Conﬂict of Interest

The authors report no conﬂict of interest, ﬁnancial or otherwise.

References

Adler, R. J. and Taylor, J. E. 2007. Random Fields and Geometry, Springer-Verlag, New York.

Besier, T. F., Fredericson, M., Gold, G. E., Beaupre, G. S., and Delp, S. L. 2009. Knee muscle forces during walking and running

in patellofemoral pain patients and pain-free controls, Journal of Biomechanics 42(7), 898–905, data: https://simtk.org/

home/muscleforces.

Cao, J. and Worsley, K. J. 1999. The detection of local shape changes via the geometry of Hotelling’s T2 ﬁelds, Annals of

Statistics 27(3), 925–942.

Cole, D. A., Maxwell, S. E., Arvey, R., and Salas, E. 1994. How the power of MANOVA can both increase and decrease as a

function of the intercorrelations among the dependent variables., Psychological Bulletin 115(3), 465–474.

Dorn, T. T., Schache, A. G., and Pandy, M. G. 2012. Muscular strategy shift in human running: dependence of running speed

on hip and ankle muscle performance., Journal of Experimental Biology 215, 1944–1956, data: https://simtk.org/home/

runningspeeds.

Friston, K. J., Ashburner, J. T., Kiebel, S. J., Nichols, T. E., and Penny, W. D. 2007. Statistical Parametric Mapping: The

Analysis of Functional Brain Images, Elsevier/Academic Press, Amsterdam.

Hotelling, H. 1936. Relations between two sets of variates, Biometrika 28(3), 321–377.

James, C. R. and Bates, B. T. 1997. Experimental and statistical design issues in human movement research, Measurement in

Physical Education and Exercise Science 1(1), 55–69.

Jensen, R. H. and Davy, D. T. 1975. An investigation of muscle lines of action about the hip: A centroid line approach vs the

straight line approach, Journal of Biomechanics 8(2), 103–110.

Knudson, D. 2009. Signiﬁcant and meaningful e↵ects in sports biomechanics research, Sports Biomechanics 8(1), 96–104.

Kutch, J. J. and Valero-Cuevas, F. J. 2011. Muscle redundancy does not imply robustness to muscle dysfunction, Journal of

Biomechanics 44(7), 1264–1270.

Lenho↵, M. W., Santer, T. J., Otis, J. C., Peterson, M. G., Williams, B. J., and Backus, S. I. 1999. Bootstrap prediction and

conﬁdence bands: a superior statistical method for analysis of gait data, Gait and Posture 9, 10–17.

Mullineaux, D. R., Bartlett, R. M., and Bennett, S. 2001. Research design and statistics in biomechanics and motor control,

Journal of Sports Sciences 19(10), 739–760.

Neptune, R. R., Wright, I. C., and van den Bogert, A. J. 1999. Muscle coordination and function during cutting movements,

Medicine & Science in Sports & Exercise 31(2), 294–302, data: http://isbweb.org/data/rrn/.

Nichols, T. E. and Holmes, A. P. 2002. Nonparametric permutation tests for functional neuroimaging: a primer with examples, Human Brain Mapping 15(1), 1–25.


Pataky, T. C. 2010. Generalized n-dimensional biomechanical ﬁeld analysis using statistical parametric mapping, Journal of

Biomechanics 43(10), 1976–1982.

Pataky, T. C. 2012. One-dimensional statistical parametric mapping in Python, Computer Methods in Biomechanics and Biomedical Engineering 15(3), 295–301.

Rayner, J. M. 1985. Linear relations in biomechanics: the statistics of scaling functions, Journal of Zoology 206(3), 415–439.

Sadeghi, H., Mathieu, P. A., Sadeghi, S., and Labelle, H. 2003. Continuous curve registration as an intertrial gait variability

reduction technique, IEEE Transactions on Neural Systems and Rehabilitation Engineering 11(1), 24–30.

Wang, X., Verriest, J. P., Lebreton-Gadegbeku, B., Tessier, Y., and Trasbot, J. 2000. Experimental investigation and biomechanical analysis of lower limb movements for clutch pedal operation, Ergonomics 43(9), 1405–1429.

Worsley, K. J., Taylor, J. E., Tomaiuolo, F., and Lerch, J. 2004. Unified univariate and multivariate random field theory, NeuroImage 23, S189–S195.

Xie, Y., Vemuri, B. C., and Ho, J. 2010. Statistical analysis of tensor ﬁelds, Medical Image Computing and Computer-Assisted

Intervention 13(1), 682–698.


Table 1: Dataset and scalar extraction overview. I, J, Q and N are the numbers of vector components, responses, time points, and extracted scalars, respectively. For vector field analyses, post hoc scalar field analyses, and extracted scalar analyses we conducted one, I, and N tests, respectively. Šidák thresholds of p=0.0253, p=0.0170 and p=0.0051 maintained a family-wise error rate of α=0.05 across 2, 3, and 10 tests, respectively (see Eqn.1).

            I    J    Q    N    Extracted scalars
Dataset A   3    8    101  2    (1) Max. flexion (at ~50% stance)
                                (2) Ad-abduction at 0% stance
Dataset B   10   43   100  10   Max. force for each muscle (J1=16, J2=27)
Dataset C   3    8    100  3    (1) Max. propulsion force (GRFx, ~75% stance)
                                (2) Max. vertical force (GRFy, ~30–50% stance)
                                (3) Max. lateral force (GRFz, ~15% stance)

FIGURES

Figure 1. Vector field schematic: a two-component vector varying in time. Depicted are mean

ground reaction force (GRF) vectors F = [Fx Fy]T from one subject during running (Dorn et al.,

2012), where +x and +y represent the anterior and vertical directions, respectively. These vectors,

when projected on the (Time, Fx) and (Time, Fy) planes, produce common GRF plots (see Fig.

4a,b); here vertical dotted lines depict standard deviation ‘clouds’. When F is projected on the

(Fx, Fy) plane these standard deviations are revealed to arise from covariance ellipses, where

ellipse orientation indicates the direction of maximum covariance between Fx and Fy (see

Supplementary Material - Appendix B).

Figure 2. Dataset A (Neptune et al., 1999) depicting knee kinematics in side-shuffle vs. v-cut

tasks. Cross-subject mean trajectories with standard deviation clouds (dark: side-shuffle, light: v-

cut) are depicted. Each of the eight subjects has three (scalar) trajectories yi(q) for each task, and

these were combined into a single (I=3, Q=101) vector field y(q) for each subject and each task.

Figure 3. Dataset B (Besier et al., 2009) depicting muscle forces during walking in Control vs.

Patello-Femoral Pain (PFP) subjects; 16 and 27 subjects, respectively. Cross-subject mean

trajectories with standard deviation clouds (dark: Control, light: PFP). These ten scalar

trajectories were combined into a single (I=10, Q=100) vector field y(q) for each subject.

Figure 4. Dataset C (Dorn et al., 2012) depicting ground reaction forces (GRF) during running/

sprinting at various speeds. Single-subject cross-trial means; standard deviation clouds are not

depicted in the interest of visual clarity. These data form one (I=3, Q=100) vector field y(q) for each

trial.

Figure 5. Dataset A, Hotelling’s T2 trajectory (SPM{T2}). The horizontal dotted line indicates the

critical random field theory threshold of T2 = 29.39.

Figure 6. Dataset A, post hoc scalar field t tests (SPM{t}), depicting where side-shuffle angles

were greater (+) and less (-) than v-cut angles. At a Šidák threshold of p=0.017 (Eqn.1), the thin

dotted lines indicate the critical RFT thresholds for significance: |t| > 4.52, 5.24, 5.26 for (a), (b),

and (c) respectively. The thresholds are different because each vector component has different

temporal smoothness (Fig.2); less smooth trajectories have higher thresholds because there are

more ‘processes’ present between 0 and 100% time. Probability (p) values indicate the likelihood

with which each suprathreshold cluster is expected to have been produced by a random field

process with the same temporal smoothness.

Figure 7. Dataset B, Hotelling’s T2 trajectory (SPM{T2}), depicting where muscle forces differed

between Controls and PFP. The horizontal dotted line indicates the critical RFT threshold of T2 =

9.35. The entire trajectory has exceeded the threshold, so the single suprathreshold cluster has a

very low p value.

Figure 8. Dataset B, post hoc scalar trajectory t tests (SPM{t}), depicting where Control forces

were greater than (+) and less than (-) PFP forces. Thin dotted lines indicate the critical RFT

thresholds for significance.

Figure 9. Dataset C, canonical correlation analysis results, with SPM{F} depicting where ground

reaction forces were correlated with running speed. Critical RFT threshold: F = 38.1.

Figure 10. Dataset C, post hoc scalar trajectory linear regression tests (SPM{t}), depicting the

strength of positive (+) and negative (-) correlation between ground reaction forces (GRF) and

running speed.

Appendix A. Scalar extraction vs. scalar ﬁeld statistics

The purpose of this Appendix is to demonstrate how scalar extraction can bias non-directed

hypothesis testing. To this end we developed and analyzed an arbitrary dataset (Fig.S1). We

caution readers that we have constructed these data speciﬁcally to demonstrate particular con-

cepts. The reader is therefore left to judge the relevance of this discussion to real (experimental)

datasets.

The specific goal of this Appendix is to scrutinize the similarities and differences between: (a) a typical univariate two-sample t test, and (b) a scalar field two-sample t test.

Consider the simulated scalar ﬁeld dataset in Fig.S1. In Fig.S1a, arbitrary true mean ﬁelds

are deﬁned for two experimental conditions: “Cond A” and “Cond B”. The Cond B mean was

produced using a half sine cycle. The Cond A mean was produced by adding a small Gaussian

pulse (at time = 85%) to the Cond B mean. This Gaussian pulse is evident in the true mean field difference (Fig.S1b).

Figure S1: Simulated scalar ﬁeld dataset depicting two experimental conditions: “Cond A”

and “Cond B” (arbitrary units).

We next simulate smooth random ﬁelds: ﬁve for each condition (Fig.S1c). These random

ﬁelds were constructed by generating ten ﬁelds, each containing 100 random, uncorrelated

and normally distributed numbers, then smoothing them using a Gaussian kernel. Adding

the random ﬁelds to the true ﬁeld means (Fig.S1a) produced the ﬁnal simulated responses

(Fig.S1d). For interpretive convenience, let us assume that these data represent joint ﬂexion.
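The noise construction described above (uncorrelated Gaussian numbers smoothed with a Gaussian kernel) can be sketched in a few lines of Python. This is an illustration only, not the paper's code; the kernel width (FWHM) and seed are arbitrary choices.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def smooth_random_fields(n_fields, n_points, fwhm, seed=0):
    # White Gaussian noise, one row per field
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal((n_fields, n_points))
    # Convert full-width-at-half-maximum to the Gaussian kernel's st.dev.
    sd = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    # Smooth each field along the time axis
    return gaussian_filter1d(noise, sd, axis=1)

# Ten smooth random fields (five per condition), Q=100 points each
fields = smooth_random_fields(n_fields=10, n_points=100, fwhm=20.0)
```

Adding rows of `fields` to the two true mean fields would yield simulated responses like those in Fig.S1d.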

Imagine next that we wish to test the following (non-directed) null hypothesis: “Cond A

and Cond B yield identical kinematics”. Consider ﬁrst scalar extraction: after observing the

data (Fig.S1d) one might decide to extract and analyze the maximum ﬂexion, which occurs

near time = 50%:

yA = [100.0, 91.2, 92.2, 95.5, 97.1]

yB = [97.2, 101.9, 104.8, 106.3, 111.7]

A two-sample t test on these data yields: t=3.16, p=0.013. We would reject the null hypothesis at α=0.05, and we would conclude that Cond B produces significantly greater maximal flexion than Cond A.
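This comparison is a standard equal-variance two-sample t test; a minimal sketch using scipy follows (small differences from the reported t value are possible because the listed responses are rounded).

```python
import numpy as np
from scipy import stats

# Extracted maximal-flexion scalars for the two conditions
yA = np.array([100.0, 91.2, 92.2, 95.5, 97.1])
yB = np.array([97.2, 101.9, 104.8, 106.3, 111.7])

# Two-sample t test (equal variances assumed, as in the text)
t, p = stats.ttest_ind(yB, yA)
```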

An alternative is to use Statistical Parametric Mapping (SPM) (Fig.S2). The SPM procedures are conceptually identical to univariate procedures (Table S1). The only apparent difference is that SPM uses a different probability distribution (Steps 4 and 5). This distribution is not a fundamental departure: it reduces to the univariate distribution when Q=1 (i.e. when there is only one time point).

The SPM results reveal significant differences between the two conditions near time = 85% (Fig.S2d). We would therefore reject our null hypothesis, with the caveat that significant differences were only found near time = 85%.

Although univariate t testing and SPM t testing are conceptually identical, they have yielded (effectively) opposite results. The univariate test found significantly greater maximal flexion in Cond B, but SPM found significantly greater flexion in Cond A (near time = 85%).

Table S1: Comparison of computational steps for univariate and SPM two-sample t tests ("st.dev." = standard deviation).

Step 1 (Fig.S2b)
(a) Univariate: compute mean values ȳA and ȳB.
(b) SPM: compute mean fields ȳA(q) and ȳB(q).

Step 2 (Fig.S2b)
(a) Univariate: compute st.dev. values sA and sB.
(b) SPM: compute st.dev. fields sA(q) and sB(q).

Step 3 (Fig.S2c)
(a) Univariate: compute the t statistic:
    t = (ȳB - ȳA) / sqrt[ (sA² + sB²) / J ]
(b) SPM: compute the t statistic field:
    SPM{t} ≡ t(q) = (ȳB(q) - ȳA(q)) / sqrt[ (sA²(q) + sB²(q)) / J ]

Step 4 (Fig.S2d)
(a) Univariate: conduct statistical inference. First use α and the univariate t distribution to compute t_critical. If t > t_critical, then reject the null hypothesis.
(b) SPM: conduct statistical inference. First use α and the random field theory t distribution to compute t_critical. If SPM{t} exceeds t_critical, then reject the null hypothesis for the suprathreshold region(s).

Step 5 (Fig.S2d)
(a) Univariate: compute the exact p value using t and the univariate t distribution.
(b) SPM: compute exact p value(s) for each suprathreshold cluster using cluster size and the random field theory distribution(s) for SPM{t} topology.
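Step 3 of Table S1 can be written directly in numpy. The sketch below assumes equal group sizes J and (J × Q) response arrays; the random inputs here are placeholders, not the simulated dataset.

```python
import numpy as np

def spm_t_field(YA, YB):
    """Two-sample t statistic field t(q) from (J x Q) response arrays,
    following Table S1, Step 3 (equal group sizes J assumed)."""
    J = YA.shape[0]
    mA, mB = YA.mean(axis=0), YB.mean(axis=0)
    sA2 = YA.var(axis=0, ddof=1)   # unbiased variance at each time point
    sB2 = YB.var(axis=0, ddof=1)
    return (mB - mA) / np.sqrt((sA2 + sB2) / J)

# Example: J=5 responses, Q=100 time points per group (placeholder data)
rng = np.random.default_rng(1)
YA = rng.standard_normal((5, 100))
YB = rng.standard_normal((5, 100))
t_field = spm_t_field(YA, YB)
```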

Figure S2: Scalar field analysis using Statistical Parametric Mapping (SPM). In panel (d) the thin dotted lines depict the critical random field theory threshold of |t_critical| = 3.533. The (incorrect) Šidák threshold is |t_critical| = 5.595.

This discrepancy can be resolved through standard probability theory regarding multiple comparisons, and in particular through a consideration of 'corrected' and 'uncorrected' thresholds. First consider conducting one statistical test at α=0.05. The choice "α=0.05" means that we are accepting a 5% chance of incorrectly rejecting the null hypothesis, or, equivalently, a 5% chance of a 'false positive'. If we conduct more than one test, there is a greater-than 5% chance of a false positive. Specifically, if we conduct N statistical tests, the probability of at least one false positive is given by the family-wise error rate α′:

α′ = 1 - (1 - α)^N

For N=2 tests, there is an α′ = 9.75% chance that at least one test will produce a false positive. For N=100 tests, α′ = 99.4%.

To protect against false positives, and to maintain a constant family-wise error rate of 0.05, we must adopt a corrected threshold. One option is the Šidák threshold:

p_critical = 1 - (1 - α)^(1/N)

For N=2 and N=100 tests, the Šidák thresholds are p_critical = 0.0253 and p_critical = 0.000513, respectively.

Herein lies one problem: our scalar extraction analysis used an uncorrected threshold of p_critical = 0.05. Even though we formally conducted only one statistical test, the data were extracted from a dataset that is 100 times as large. Since we observed the data before choosing which scalar to extract, we effectively conducted N=100 tests, albeit visually, then chose to focus on only one test. By failing to adopt a corrected threshold, we have biased our analyses.

Although the Šidák correction helps to avoid false positives, it is not generally a good choice because it assumes that there are 100 independent tests (i.e. one for each time point in our dataset). The points in this dataset are clearly not independent because the curves are smooth, changing only gradually over time. Thus the Šidák correction is too severe, lowering α well below 0.05. An overly severe threshold produces the opposite bias: an increased chance of false negatives.

SPM employs a random field theory (RFT) correction to more accurately maintain α=0.05. The precise threshold is based not only on field size (Q=100), but also on field smoothness, which is estimated from temporal derivatives. Computational details for RFT corrections are provided in the SPM literature.

Unfortunately, even if our scalar analysis had employed a corrected threshold, it still would have been biased, but for a separate reason. By focussing only on maximal flexion (which did not appear in our null hypothesis), we have neglected to consider the signal at time = 85%, and have therefore not detected the true field difference (Fig.S1a). In contrast, SPM was able to uncover the true signal because it both adopted a corrected threshold and considered the entire field simultaneously (Fig.S2d).

The aforementioned sources of bias — (1) failing to adopt a corrected threshold, and (2)

failing to consider the entire ﬁeld — are referred to collectively in the main manuscript as

‘regional focus bias’.

Last, we reiterate that this Appendix is relevant only to non-directed hypotheses. If we

had formulated a (directed) hypothesis regarding only maximal ﬂexion — prior to observing

the data — then our scalar extraction analyses would not have been biased because our null

hypothesis would not have pertained to the entire time domain 0–100%.

In summary, regional focus bias can be avoided by:

1. Specifying a directed null hypothesis — before observing the data — and then extracting

only those scalars which are speciﬁed in the null hypothesis.

2. Analyzing the data using SPM or another ﬁeld technique which both considers the entire

temporal domain and which adopts a corrected threshold.

Appendix B. Univariate vs. vector analysis

The purpose of this Appendix is to demonstrate how univariate testing of vector data can

bias non-directed hypothesis testing. To this end we developed and analyzed an arbitrary

dataset (Table S2). As in Appendix A, we caution readers that we have constructed these

data speciﬁcally to demonstrate particular concepts. The reader is therefore left to judge the

relevance of this discussion to real (experimental) datasets.

The specific goal of this Appendix is to compare and contrast the (univariate) t test and its (multivariate) vector equivalent: the Hotelling's T² test.

Table S2: A simulated dataset exhibiting biased univariate testing. (a) Two-component force vector responses F = [Fx, Fy]ᵀ. (b)–(d) Scalar (univariate) testing. (e)–(g) Vector (multivariate) testing. Sources of bias and further details are discussed in the text. Technical overviews of covariance matrices (W) and the Hotelling's T² statistic are provided in Appendix D and §2.3 (main manuscript), respectively.

                 Group A                  Group B                  Inter-Group
(a) Responses    FA1 = [159, 719]ᵀ        FB1 = [143, 759]ᵀ
                 FA2 = [115, 762]ᵀ        FB2 = [172, 734]ᵀ
                 FA3 = [177, 681]ᵀ        FB3 = [161, 735]ᵀ
                 FA4 = [138, 694]ᵀ        FB4 = [195, 733]ᵀ
                 FA5 = [98, 697]ᵀ         FB5 = [168, 706]ᵀ

Univariate
(b) Means        (F̄x)A = 137.4            (F̄x)B = 167.8            ΔF̄x = 30.4
                 (F̄y)A = 710.6            (F̄y)B = 733.4            ΔF̄y = 22.8
(c) St.dev.      (sx)A = 28.6             (sx)B = 16.8             sx = 23.5
                 (sy)A = 28.5             (sy)B = 16.8             sy = 23.4
(d) t tests      tx = 1.832; px = 0.104
                 ty = 1.380; py = 0.205

Vector
(e) Means        F̄A = [137.4, 710.6]ᵀ     F̄B = [167.8, 733.4]ᵀ     ΔF̄ = [30.4, 22.8]ᵀ
(f) Covariance   WA = [  817.8  -323.2    WB = [  283.8  -131.9    W = [  550.8  -227.6
                       -323.2   809.8 ]        -131.9   281.8 ]        -227.6   545.8 ]
(g) T² test      T² = 7.113; p = 0.028

In Table S2(a) above there are five force vector responses (F = [Fx, Fy]ᵀ) for each of two groups: "A" and "B". Their means and standard deviations are shown in Table S2(b)–(c). In Table S2(d) we see that t tests pertaining to both Fx and Fy fail to reach significance; p values are greater than (even an uncorrected) threshold of p=0.05. An adequate interpretation is that the mean force component changes (ΔF̄x and ΔF̄y) are not unexpectedly large given their respective variances (i.e. standard deviations: sx and sy).

We next jump ahead to the final results of the vector procedure in Table S2(g): here we see that the Hotelling's T² test reached significance (p=0.028). An adequate interpretation is that the mean force vector change (ΔF̄) was unexpectedly large given its (co)variance (W).

Let us now backtrack and consider why the univariate and vector procedures yield different results.

The first step of the vector procedure is to compute mean vectors; in Table S2(e) we can see that the vector means have the same component values as the univariate means from Table S2(b). However, there is already one critical discrepancy to note: the vector procedure assesses ΔF̄, which is the resultant vector connecting the Group A and Group B means (Fig.S3).

From Pythagoras' theorem:

|ΔF̄|² = ΔF̄x² + ΔF̄y²    (B.1)

it is clear that the magnitude of the resultant will always be greater than the magnitude of its components, except in the experimentally unlikely cases of ΔF̄x = 0 and/or ΔF̄y = 0. This is non-trivial for two reasons. First, since the vector procedure assesses the maximum difference between the two groups, it is more robust to Type II error than univariate procedures (note: the univariate tests in Table S2 exhibit Type II error by failing to reach significance). Second, the vector technique's assessment of differences is independent of the xy coordinate system definition; whereas the component effects (ΔF̄x and ΔF̄y) can change when the xy coordinate system definition changes, both the resultant and the variance along the resultant direction will always have the same magnitude. This may have non-trivial implications for biomechanical datasets that employ difficult-to-define coordinate systems (e.g. joint rotation axes).

Figure S3: Graphical depiction of the data from Table S2. Small circles depict individual responses. Thick colored arrows depict the mean force vectors for the two groups. The thick black arrow depicts the (vector) difference between the two groups, and thin black lines indicate its x and y components. The ellipses depict within-group (co)variance; their principal axes (thin dotted lines) are the eigenvectors of the covariance matrices in Table S2(f). Here covariance ellipse radii are scaled to two principal-axis standard deviations (to encompass all responses).

The next step of the vector procedure is to compute covariance matrices W (Appendix D). The diagonal elements of WA and WB in Table S2(f) are simply the variances (i.e. squared standard deviations) sx² and sy² from Table S2(c). The off-diagonal terms are equal and represent the covariance (i.e. correlation) between Fx and Fy. If Fx tended to increase when Fy increased, the off-diagonal terms would be positive, but in this case they are negative, indicating that Fx tends to decrease when Fy increases. This tendency can be seen in the raw data (small circles) in Fig.S3.

The presence of non-zero off-diagonal terms thus has a critical implication: changes in Fx and Fy are not independent. This is critical because univariate tests implicitly assume that Fx and Fy are independent.

To appreciate this point it is useful to recognize that covariance matrices may be interpreted geometrically as ellipses: the eigenvectors of W represent the ellipse's principal axes, and its eigenvalues represent the variance along each principal direction. This is perfectly analogous to inertia matrices: the eigenvectors of an inertia matrix define a body's principal axes of inertia, and its eigenvalues specify the principal moments of inertia.

The importance of this geometric interpretation becomes clear when visualizing covariance ellipses. In Fig.S3 we can see that the principal axes of the covariance matrices are not aligned with the xy coordinate system, implying that changes in Fx and Fy are not independent. Critically, we can also see that the direction of minimum variance is very similar to the direction of ΔF̄. Thus the standard deviations sx and sy (used in the univariate analyses) are larger than the standard deviation in the direction of ΔF̄.

In summary, vector statistical testing more objectively detects vector changes because: (a) it is coordinate-system independent, and (b) it considers both the maximum difference between groups (i.e. the resultant difference) and the variation along this direction. This Appendix has demonstrated how univariate testing of vector data can lead to Type II error. With a different dataset it would also be possible to demonstrate Type I error, but in the interest of space we end here. The most important point, the main paper contends, is that non-directed hypothesis testing must not assume vector component independence.
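For reference, a two-sample Hotelling's T² computation on the Table S2(a) responses might look as follows. This is a sketch only: scaling and covariance-divisor conventions vary between texts, so the resulting value need not equal the T² reported in Table S2(g).

```python
import numpy as np

# Force vector responses from Table S2(a)
FA = np.array([[159, 719], [115, 762], [177, 681], [138, 694], [98, 697]], float)
FB = np.array([[143, 759], [172, 734], [161, 735], [195, 733], [168, 706]], float)

def hotelling_T2_two_sample(A, B):
    # Standard two-sample Hotelling's T2 with a pooled (unbiased) covariance.
    nA, nB = len(A), len(B)
    d = B.mean(axis=0) - A.mean(axis=0)          # resultant mean difference
    Sp = ((nA - 1) * np.cov(A, rowvar=False)
          + (nB - 1) * np.cov(B, rowvar=False)) / (nA + nB - 2)
    return (nA * nB / (nA + nB)) * d @ np.linalg.solve(Sp, d)

T2 = hotelling_T2_two_sample(FA, FB)
```

Note that T² exceeds both squared univariate t values from Table S2(d), reflecting the resultant-direction assessment described above.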

Appendix C. Mean vector ﬁeld calculation

An I-component vector y which varies over Q points in space or time may be regarded as an (I × Q) vector field response y(q). Given J responses, the mean vector field is:

ȳ(q) = (1/J) Σ_{j=1}^{J} yj(q)    (C.1)

For the paired Hotelling's T² test (Dataset A: §2.3.1, main manuscript), one must first compute pairwise differences:

Δyj(q) = yBj(q) - yAj(q)    (C.2)

where "A" and "B" represent the two tasks (v-cut and side-shuffle) and j indexes the subjects. A paired Hotelling's T² test is thus equivalent to a one-sample Hotelling's T² test conducted on the pairwise differences Δy(q). The same is true in the univariate case: a paired t test is equivalent to a one-sample t test on pairwise differences.
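The univariate equivalence noted above is easy to verify numerically; the paired data below are synthetic placeholders (any values would do).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
yA = rng.standard_normal(8)                       # task A, one value per subject
yB = yA + 0.5 + 0.3 * rng.standard_normal(8)      # task B, paired with A

# A paired t test equals a one-sample t test on the pairwise differences
t_paired = stats.ttest_rel(yB, yA).statistic
t_onesample = stats.ttest_1samp(yB - yA, 0.0).statistic
```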

Appendix D. Covariance matrices

Although the concepts presented below apply identically to vector fields, for brevity the present discussion is limited to simple vectors.

Consider a two-component force vector response F:

Fj = [Fxj, Fyj]ᵀ    (D.1)

where j indexes the responses, and there are a total of J responses. After computing the mean force vector F̄ as:

F̄ = [F̄x, F̄y]ᵀ = (1/J) Σ_{j=1}^{J} Fj    (D.2)

the covariance matrix W can be assembled as follows:

W = [ Wxx  Wxy
      Wyx  Wyy ]    (D.3)

where the elements of W are:

Wxx = 1/(J-1) Σ_{j=1}^{J} (Fxj - F̄x)²    (D.4)

Wyy = 1/(J-1) Σ_{j=1}^{J} (Fyj - F̄y)²    (D.5)

Wxy = Wyx = 1/(J-1) Σ_{j=1}^{J} (Fxj - F̄x)(Fyj - F̄y)    (D.6)

Thus the diagonal elements Wxx and Wyy are the intra-component variances (i.e. squared standard deviations), and the off-diagonal elements Wxy and Wyx are the inter-component covariances between Fx and Fy over multiple responses. Importantly, changes in Fx and Fy are completely uncorrelated if and only if Wxy = 0.
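Eqns D.4–D.6 define the standard sample covariance matrix; for example, for the Group A responses of Table S2(a) (note that numpy's `cov` uses the unbiased 1/(J-1) divisor of Eqns D.4–D.6):

```python
import numpy as np

# Group A force vector responses from Table S2(a); rows are responses j
F = np.array([[159, 719], [115, 762], [177, 681], [138, 694], [98, 697]], float)

# Covariance matrix per Eqns D.4-D.6 (columns are the Fx, Fy components)
W = np.cov(F, rowvar=False)
```

The off-diagonal element is negative, matching the text's observation that Fx tends to decrease when Fy increases.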

One contention of this paper is that separate (univariate) analysis of Fx and Fy is biased when testing non-directed hypotheses. The main reason is that Fx analysis considers only Wxx and Fy analysis considers only Wyy. This is equivalent to assuming Wxy = 0, an assumption which may not be valid (Appendix B).

A geometric interpretation of W is useful both for visualizing vector variance (Fig.S3) and for appreciating canonical correlation analysis (Appendix E). Consider that W represents an ellipse whose geometry is defined by the solutions to the eigenvalue problem:

Wv = λv    (D.7)

Here v and λ are the eigenvectors and eigenvalues, respectively, and there are two unique eigensolutions unless both (Wxx = Wyy) and (Wxy = 0), in which case there is only one eigensolution and W represents a circle. When there are two solutions the eigenvectors represent the ellipse axes (or equivalently: principal axes), and the eigenvalues represent the axes' lengths (or the variance in the direction of the principal axes). An equivalent interpretation is that one eigenvector of W represents the direction of maximum variance within the dataset. This means that we can rotate our original coordinate system xy to a new coordinate system x′y′ so that the variance along the new x′ axis is the maximum possible variance obtainable for all possible x′.
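This geometric interpretation can be checked numerically: the leading eigenvector of W maximizes the directional variance uᵀWu over unit directions u. The sketch below uses the pooled covariance values from Table S2(f).

```python
import numpy as np

# Pooled covariance matrix from Table S2(f)
W = np.array([[550.8, -227.6], [-227.6, 545.8]])

# Eigenvectors give the ellipse's principal axes; eigenvalues give the
# variance along each axis (Eqn D.7)
evals, evecs = np.linalg.eigh(W)

# Variance along the leading eigenvector equals the largest eigenvalue
u_max = evecs[:, np.argmax(evals)]
var_max = u_max @ W @ u_max
```

The largest eigenvalue exceeds both diagonal elements of W, i.e. the principal-axis variance is larger than either component variance.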

Appendix E. Canonical correlation analysis (CCA)

CCA aims to quantify the amount of variance that a multivariate predictor (i.e. vector) X can explain in a multivariate response Y. One type of CCA useful for hypothesis testing is to find the maximum possible correlation coefficient that can be obtained when the coordinate systems defining X and Y are permitted to mutually rotate.

Consider a response variable Y that describes three orthogonal force components F:

Yj = [F1j, F2j, F3j]ᵀ    (E.1)

where "1", "2" and "3" represent orthogonal axes and where j indexes a total of J responses. Next consider a predictor variable X that describes the rotations θ about two orthogonal axes at a given joint:

Xj = [θ1j, θ2j]ᵀ    (E.2)

where "1" and "2" indicate the two joint axes. The relevant null hypothesis is: X and Y are not linearly related.

To test this hypothesis one needs to assemble three covariance matrices. The first is a (3 × 3) response covariance matrix WYY which describes the variance within and the co-variation between the three force components (see Appendix D). The second is a (2 × 2) predictor covariance matrix WXX which describes the variance and covariance of the two joint angles. The third is a (2 × 3) predictor-response covariance matrix WXY which describes how each of the predictor variables co-varies with each of the response variables.

The predictor-response covariance matrix WXY is relevant to the null hypothesis because it embodies the strength of linear correlation between X and Y. For completeness, in the example above WXY has six elements, corresponding to:

1. The linear correlation between θ1 and F1

2. The linear correlation between θ1 and F2

3. The linear correlation between θ1 and F3

4. The linear correlation between θ2 and F1

5. The linear correlation between θ2 and F2

6. The linear correlation between θ2 and F3

Initially these correlations refer only to X’s and Y’s original coordinate systems. Since

arbitrary coordinate systems can bias non-directed hypothesis testing (Appendix B), we must

allow the coordinate systems to rotate in order to most objectively test our null hypothesis.

One CCA solution is to choose the X and Y coordinate systems that mutually maximize a single correlation coefficient. The logic is that all other coordinate systems underestimate correlation strength. In other words, as the coordinate systems rotate the elements of WXY change, and one (not necessarily unique) coordinate-system combination maximizes an element of WXY. CCA solves this problem efficiently using the maximum eigenvalue of the canonical correlation matrix (Eqn.7, main manuscript).

As an aside, we note that the K=2 model in the main manuscript is equivalent to a K=1

model (i.e. only a running speed regressor) because only one (diagonal) element of WXX is

non-zero. For generalizability the main manuscript treats CCA in its K>1 form.
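A minimal numpy sketch of this maximum-correlation solution follows (an illustration, not the paper's implementation). In the degenerate one-predictor, one-response case it reduces to the ordinary (absolute) Pearson correlation.

```python
import numpy as np

def max_canonical_correlation(X, Y):
    """Largest correlation obtainable between linear combinations of the
    columns of (J x K) X and (J x I) Y: the square root of the largest
    eigenvalue of Wxx^-1 Wxy Wyy^-1 Wyx."""
    Xc = X - X.mean(axis=0)          # center predictors
    Yc = Y - Y.mean(axis=0)          # center responses
    Wxx, Wyy, Wxy = Xc.T @ Xc, Yc.T @ Yc, Xc.T @ Yc
    M = np.linalg.solve(Wxx, Wxy) @ np.linalg.solve(Wyy, Wxy.T)
    return float(np.sqrt(np.max(np.linalg.eigvals(M).real)))

# One predictor, one response: reduces to |Pearson r| (toy data)
x = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([[2.0], [4.0], [5.0], [4.0], [5.0]])
r = max_canonical_correlation(x, y)
```

With a (J × 1) running-speed predictor and (J × 3) GRF responses, the same function would give the maximum correlation underlying the SPM{F} analysis of Dataset C.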