Identifying direct and indirect associations among traits by merging phylogenetic comparative methods and structural equation models

Wiley
Methods in Ecology and Evolution
Abstract

Traits underlie organismal responses to their environment and are essential to predict community responses to environmental conditions under global change. Species differ in life‐history traits, morphometrics, diet type, reproductive characteristics and habitat utilization. Trait associations are widely analysed using phylogenetic comparative methods (PCM) to account for correlations among related species. Similarly, traits are measured for some but not all species, and missing continuous traits (e.g. growth rate) can be imputed using ‘phylogenetic trait imputation’ (PTI), based on evolutionary relatedness and trait covariance. However, PTI has not been available for categorical traits, and estimating covariance among traits without ecological constraints risks inferring implausible evolutionary mechanisms. Here, we extend previous PCM and PTI methods by (1) specifying covariance among traits as a structural equation model (SEM), and (2) incorporating associations among both continuous and categorical traits. Fitting a SEM replaces the covariance among traits with a set of linear path coefficients specifying potential evolutionary mechanisms. Estimated parameters then represent regression slopes (i.e. the average change in trait Y given an exogenous change in trait X) that can be used to calculate both direct effects (X impacts Y) and indirect effects (X impacts Z and Z impacts Y). We demonstrate phylogenetic structural‐equation mixed‐trait imputation using 33 variables representing life history, reproductive, morphological, and behavioural traits for all >32,000 described fishes worldwide. SEM coefficients suggest that one degree Celsius increase in habitat is associated with an average 3.5% increase in natural mortality (including a 1.4% indirect impact that acts via temperature effects on the growth coefficient), and an average 3.0% decrease in fecundity (via indirect impacts on maximum age and length). 
Cross‐validation indicates that the model explains 54%–89% of variance for withheld measurements of continuous traits and has an area under the receiver‐operator‐characteristics curve of 0.86–0.99 for categorical traits. We use imputed traits to classify all fishes into life‐history types, and confirm a phylogenetic signal in three dominant life‐history strategies in fishes. PTI using phylogenetic SEMs ensures that estimated parameters are interpretable as regression slopes, such that the inferred evolutionary relationships can be compared with long‐term evolutionary and rearing experiments.
Methods Ecol Evol. 2023;14:1259–1275. | wileyonlinelibrary.com/journal/mee3
Received: 14 June 2022 | Accepted: 30 January 2023
DOI: 10.1111/2041-210X.14076
RESEARCH ARTICLE
James T. Thorson1 | Aurore A. Maureaud2,3,4 | Romain Frelat5 | Bastien Mérigot6 | Jennifer S. Bigman7 | Sarah T. Friedman8,9 | Maria Lourdes D. Palomares10 | Malin L. Pinsky4 | Samantha A. Price11 | Peter Wainwright8
1Habitat and Ecological Processes Research, Alaska Fisheries Science Center, Seattle, Washington, USA; 2Department of Ecology & Evolutionary Biology, Yale University, New Haven, Connecticut, USA; 3Center for Biodiversity & Global Change, Yale University, New Haven, Connecticut, USA; 4Department of Ecology, Evolution, and Natural Resources, Rutgers University, New Brunswick, New Jersey, USA; 5Aquaculture and Fisheries Group, Wageningen University & Research (WUR), Wageningen, The Netherlands; 6MARBEC, Université de Montpellier, CNRS, IFREMER, IRD, Sète, France; 7Recruitment Processes Program, Alaska Fisheries Science Center, NOAA Fisheries, Seattle, Washington, USA; 8Department of Evolution and Ecology, University of California Davis, Davis, California, USA; 9Current address: Groundfish Assessment Program, Alaska Fisheries Science Center, Seattle, Washington, USA; 10Sea Around Us, Institute for the Oceans and Fisheries, University of British Columbia, Vancouver, British Columbia, Canada and 11Department of Biological Sciences, Clemson University, Clemson, South Carolina, USA
This is an open access article under the terms of the Creative Commons Attribution-NonCommercial License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes.
© 2023 The Authors. Methods in Ecology and Evolution published by John Wiley & Sons Ltd on behalf of British Ecological Society. This article has been contributed to by U.S. Government employees and their work is in the public domain in the USA.
Correspondence
James Thorson
Email: james.thorson@noaa.gov
Funding information
National Science Foundation, Grant/Award Number: 2109411 and DEB-1556953
Handling Editor: Arthur Porto

1 | INTRODUCTION
Trait-based approaches are essential for improving our understanding of ecological and evolutionary processes. For example, they are used to identify population and community responses to global change (Pacifici et al., 2017) and community assembly rules (Gross et al., 2021; Legras et al., 2019), and to predict how changes in community diversity affect ecosystem functioning (Díaz et al., 2013) and ecosystem services (Hevia et al., 2017). They can also be used to test theory regarding evolutionary mechanisms (Baker et al., 2020) and support biodiversity conservation (Cardillo et al., 2008). Traits of floristic and faunal species can be quantitative (discrete or continuous) and/or qualitative (binary, nominal, or ordinal variables). For instance, continuous traits include growth rates, body or leaf size, and age at maturity, while categorical traits encompass behaviours (e.g. solitary or gregarious species), diet (autotroph, heterotroph, mixotroph) or reproduction (dispersal modes, guarding vs. nonguarding young) (Hadj-Hammou et al., 2021; Violle et al., 2007).
Trait values are not available for every species of interest, both due to limited scientific resources and ongoing difficulties in collecting and/or sharing trait information across taxa and systems (although see Gallagher et al., 2020). Consequently, there are many potential methods available to impute these missing trait values (Azur et al., 2011; Goolsby et al., 2017; Schrodt et al., 2015). Comparisons of phylogenetic trait imputation (PTI) methods generally show that performance is improved by including phylogenetic information (Debastiani et al., 2021; Penone et al., 2014; Taugourdeau et al., 2014), or even using taxonomy as a proxy for phylogeny (Johnson et al., 2021) wherein related taxa are more likely to share similar traits than unrelated taxa.
PTI generally involves specifying a statistical process for how trait values change along a phylogenetic (Goolsby et al., 2017) or taxonomic tree (Schrodt et al., 2015; Thorson, 2020; Thorson et al., 2017). This involves estimating parameters to represent correlations $\mathbf{R}$ among $n_g$ taxa for a given trait, as well as covariance $\boldsymbol{\Sigma}$ among $n_j$ traits. For example, the function phylopars in R-package Rphylopars is a common implementation for PTI but it cannot be implemented for categorical traits (Johnson et al., 2021; Penone et al., 2014), while such traits are generally easier to assess and collect than continuous ones. Additionally, estimating $\boldsymbol{\Sigma}$ without constraints (beyond the requirement that it is symmetric and positive definite) has three main limitations (Grace, 2006): (1) results cannot be compared easily with slopes estimated in conventional regression models, such that results are difficult to interpret or validate using experimental data; (2) existing methods cannot use evolutionary theory and experiments to specify the structure of covariance among traits; and (3) the number of parameters in $\boldsymbol{\Sigma}$ without other constraints is $\frac{n_j(n_j+1)}{2}$, which becomes computationally challenging to fit when interpolating a large number of traits.
As an alternative to estimating the covariance among traits directly, we propose to use structural equation modelling (SEM) to specify a parsimonious structure for this trait-covariance $\boldsymbol{\Sigma}$. Given a set of $n_j$ traits $\{Y_1, Y_2, \ldots, Y_{n_j}\}$ with measurements $y_1, y_2, \ldots, y_{n_j}$, SEM allows the user to specify a set of dependencies linking these, where each dependency is represented by a path coefficient. These links can be interpreted as a graph wherein each trait is a node and each linkage is a directed edge, such that e.g., $Y_1 \rightarrow Y_2$ indicates that a change in trait $Y_1$ will cause a subsequent change in $Y_2$. The value of path coefficients can then be estimated as fixed effects by identifying their values that maximize the likelihood of data. This use of SEM then allows a user to replace the $\frac{n_j(n_j+1)}{2}$ parameters in a covariance matrix with any set of parameters (from 1, up to the maximum of $\frac{n_j(n_j+1)}{2}$ when not using Bayesian priors). For example, a trait-imputation model with $n_j = 30$ traits would require estimating 465 parameters for covariance $\boldsymbol{\Sigma}$ without other constraints, but could be restricted to fewer important parameters using SEM. Furthermore, SEM can be used to estimate the correlation between two traits that are connected by a directed edge (‘direct pathways’) or mediated by a third trait (called ‘indirect pathways’). In this way, SEM decomposes the correlation between two traits into the contribution from both direct and indirect trait effects.

KEYWORDS
evolutionary mechanisms, life history strategies, phylogenetic trait imputation, population and community ecology, structural equation model, trait-based approach, phylogenetic comparative methods

FIGURE 1 Conceptual diagram to illustrate trait correlations using two hypothetical examples involving fish or avian responses to temperature, assuming that temperature affects body size, which in turn affects one continuous and one categorical trait in each example. Analyses start by assembling trait measurements, where values are available for some but not all of six taxa. These conceptual models are then formalized by specifying a text file listing associations, and this in turn generates the matrix $\boldsymbol{\Gamma}$ (for illustration we assume $\gamma = 0.5$ for all associations), which is then used to compute the covariance among traits $\boldsymbol{\Sigma} = \mathbf{L}\mathbf{L}^t$, where $\mathbf{L} = (\mathbf{I} - \boldsymbol{\Gamma})^{-1}\mathbf{V}^{0.5}$ and $\mathbf{V}$ represents exogenous covariance (evolutionary drift). For illustration we specify $\mathrm{diag}(\mathbf{V}) = 1$ and convert the covariance to a correlation matrix, shown for each taxon. In practice, associations $\boldsymbol{\gamma}$ (used to form $\boldsymbol{\Gamma}$) and exogenous variances $\mathbf{V}$ are estimated from the fit to data (rather than specified as shown here). The covariance $\boldsymbol{\Sigma}$ is then used to generate a probabilistic prediction of missing trait values for each taxon.
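The illustrative setting described for the conceptual diagram (temperature affecting body size, which in turn affects two further traits, with $\gamma = 0.5$ for every association and unit exogenous variance) can be sketched numerically. The following Python sketch is not the FishLife implementation: the trait names, the plain-Python matrix helpers, and the indexing convention (entry $(i, j)$ of the path matrix holds the effect of trait $j$ on trait $i$) are assumptions made for illustration, and the categorical trait is represented only by a single latent variable.

```python
# Traits ordered along the assumed causal chain:
# temperature -> body size -> {trait A, trait B}.
names = ["temperature", "body_size", "trait_a", "trait_b"]
n = len(names)
gamma = 0.5  # illustrative path coefficient, as in the conceptual diagram

# Assumed convention: G[i][j] is the direct effect of trait j on trait i,
# i.e. the column-vector form x = G x + e. Transpose G for the opposite indexing.
G = [[0.0] * n for _ in range(n)]
G[1][0] = gamma  # temperature -> body size
G[2][1] = gamma  # body size -> trait A
G[3][1] = gamma  # body size -> trait B


def matmul(A, B):
    """Plain-Python matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def total_effects(G):
    """(I - G)^-1 as the finite series I + G + G^2 + ...; valid only for acyclic G."""
    size = len(G)
    power = [[float(i == j) for j in range(size)] for i in range(size)]
    total = [row[:] for row in power]
    for _ in range(size - 1):  # a path in an acyclic graph has at most size-1 edges
        power = matmul(power, G)
        total = [[total[i][j] + power[i][j] for j in range(size)]
                 for i in range(size)]
    return total


T = total_effects(G)  # total (direct + indirect) effects
L = T                 # L = (I - G)^-1 V^0.5 with V equal to the identity
Sigma = matmul(L, [list(col) for col in zip(*L)])  # Sigma = L L^t
corr = [[Sigma[i][j] / (Sigma[i][i] * Sigma[j][j]) ** 0.5 for j in range(n)]
        for i in range(n)]

direct = G[2][0]             # temperature -> trait A: no direct edge
indirect = T[2][0] - direct  # effect mediated by body size (0.5 * 0.5)
print(f"direct={direct}, indirect={indirect}, corr(A, B)={corr[2][3]:.3f}")
```

Under these assumptions, temperature has no direct edge to either downstream trait, so its entire effect on them is indirect and equals the product of the two path coefficients along the chain. The two downstream traits are nevertheless correlated because they share body size as a common cause.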
In this study, we extend PTI to (1) incorporate both continuous and categorical traits, and (2) represent the trait covariance matrix using SEM, while using a Brownian motion model for simplicity of presentation. The approach can be implemented for any traits of floristic or faunistic species, and either using phylogeny for evolutionary distance or using taxonomy as a proxy for relatedness. To demonstrate the benefits of our extensions of PTI, we applied the approach to fishes, which have evolutionary trade-offs that are highly structured by temperature and individual length, while also having extensive information about a variety of behavioural, reproductive, and life-history traits (Barnett et al., 2019). We specifically use data for 34 traits for >32,000 described fishes, obtained by combining existing in situ trait data (FishBase; Froese, 1990) and morphometric trait data from National Museum of Natural History fish specimens (Price et al., 2019, 2022). We interpret results by computing the direct and indirect impacts of temperature and maximum body length on other traits, and using traits to classify fishes into life-history strategies. Finally, we discuss how phylogenetic imputation of mixed traits using SEMs can help to unify experimental (micro-evolutionary) and comparative (macro-evolutionary) studies of life-history trade-offs.
| 
We extend existing PTI methods in the following two ways:
1. Structural equation modelling: We model the covariance $\boldsymbol{\Sigma}$ among multiple traits using methods derived from SEM. This allows us to specify a small set of path coefficients, despite conducting multivariate trait imputation on many traits.
2. Including categorical traits: We fit our phylogenetic model to a mixture of continuous and categorical traits. Fitting to a categorical trait with $M$ levels involves estimating $M - 1$ latent variables, and we transform these to the probability of each level using a multivariate logistic transformation given the constraint that these probabilities sum to one. We then model the association between these $M - 1$ latent variables and other continuous traits in a way that permits efficient statistical inference.
We provide further details below (see Supporting Information A for summary of all notation), and implement the approach in the package FishLife release 3.0.1 (Thorson, 2023) in the R statistical environment (R Core Team, 2021).
 | 
equation modelling
We seek to estimate a vector of traits $\boldsymbol{\beta}_g$ for each taxon $g$ in a rooted and additive tree (i.e., including ultrametric phylogenies), including trait-values for both tips (species) and ancestral nodes as well as the trait-vector $\boldsymbol{\beta}_0$ for the root of the tree. We assume that evolution follows a standard model (e.g., Brownian motion, Pagel's lambda, etc.) that can be expressed using a multivariate normal distribution (Paradis, 2012). This model allows calculating a correlation matrix $\mathbf{R}$ with dimension $n_g \times n_g$, where $n_g$ is the total number of taxa (tips and ancestral nodes), representing the correlation for a single trait along the phylogeny. We similarly construct the covariance $\boldsymbol{\Sigma}$ among $n_j$ traits using methods drawn from structural equation modelling.
This then results in a separable covariance for $\mathbf{B}$, containing latent trait $\beta_{g,j}$ for all taxa $g$ and traits $j$ (Equation 1), where $\mathbf{R} \otimes \boldsymbol{\Sigma}$ is the Kronecker (‘outer’) product of the correlation among taxa and covariance among traits, $\mathbf{1}$ is a vector of 1s with length $n_i$ such that $\mathbf{1}\boldsymbol{\beta}_0$ forms the intercept for every taxon and trait, and $\mathrm{MVN}$ is a multivariate normal distribution with these moments. This separable covariance $\mathbf{R} \otimes \boldsymbol{\Sigma}$ can often be implemented more efficiently in software as a conditional or simultaneous autoregressive model (Ver Hoef et al., 2018), although we present the separable covariance here to agree with standard notation in phylogenetic comparative methods (e.g., Paradis, 2012). In the following we only explore a Brownian motion (a.k.a. random-walk) process for $\mathbf{R}$, although future software developments could easily generalize this to Ornstein-Uhlenbeck, Pagel's delta, or other evolutionary models (see Supporting Information B).
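To make the separable structure concrete, the following Python sketch builds a Brownian-motion covariance among taxa from a small tree and combines it with a trait covariance using a Kronecker product. It is an illustration only, under several assumptions: the four-node tree, its branch lengths, and the two-trait covariance are hypothetical; the among-taxa matrix is left as the unnormalized shared-branch-length covariance rather than being rescaled to a correlation matrix; and the latent traits are assumed to be ordered taxon by taxon.

```python
# Hypothetical tree: child -> (parent, branch length); "root" has no entry.
tree = {
    "genusA": ("root", 1.0),
    "sp1": ("genusA", 1.0),
    "sp2": ("genusA", 1.0),
    "sp3": ("root", 2.0),
}


def edges_to_root(node):
    """Map each edge (named by its child node) on the path to the root to its length."""
    edges = {}
    while node in tree:
        parent, length = tree[node]
        edges[node] = length
        node = parent
    return edges


def shared_length(a, b):
    """Brownian-motion covariance: total length of edges shared by both root paths."""
    edges_b = edges_to_root(b)
    return sum(length for edge, length in edges_to_root(a).items() if edge in edges_b)


def kronecker(A, B):
    """Kronecker product: block (i, j) of the result equals A[i][j] * B."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


taxa = sorted(tree)  # tips and ancestral nodes, excluding the root
C = [[shared_length(a, b) for b in taxa] for a in taxa]
Sigma = [[1.0, 0.5], [0.5, 2.0]]  # illustrative covariance between two traits
K = kronecker(C, Sigma)           # covariance of all latent traits, taxon by taxon

print(len(K), "x", len(K[0]))
```

In this sketch the two congeneric species share one unit of branch length and therefore covary, whereas the third species shares no branch with them and is independent of both.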
We next introduce how to construct trait covariance $\boldsymbol{\Sigma}$ using methods drawn from structural equation modelling. We assume that the user specifies:
1. the structure of a path matrix $\boldsymbol{\Gamma}$ with dimension $n_j \times n_j$. The user specifies a priori which elements of this matrix are fixed at zero or are instead freely estimated as fixed effects (including cases when multiple path coefficients are constrained to the same estimated value). For example, specifying that $\gamma_{j,j'} = 0$ involves assuming that trait $j$ has no direct impact on trait $j'$.
2. a Cholesky matrix $\mathbf{S}$ where $\mathbf{S}\mathbf{S}^t$ represents the covariance in exogenous variation with dimension $n_j \times n_j$. At a minimum, this covariance $\mathbf{S}\mathbf{S}^t$ involves estimating diagonal entries, $\mathrm{diag}(\mathbf{S}) = (\sigma_1, \sigma_2, \ldots, \sigma_{n_j})$, resulting in an independent exogenous variance $\sigma_j^2$ for each trait $j$ (where these can again be constrained to the same estimated value). However, traits can also have exogenous covariance by estimating lower-triangle elements of $\mathbf{S}$, which then results in off-diagonal elements for exogenous covariance $\mathbf{V} = \mathbf{S}\mathbf{S}^t$. Nonzero elements of $\mathbf{S}$ are then freely estimated as fixed effects.
This path matrix (and resulting path diagram) is central to structural equation modelling, which has been reviewed elsewhere for describing interaction networks and physiological performance (Frauendorf et al., 2021; Garrido et al., 2022). However, structural equation models have not to our knowledge been fitted simultaneously with phylogenetic covariance. Previous studies have either adjusted data or estimated residual covariance based on phylogeny and then fitted a SEM to those residuals (Mason et al., 2016; Santos, 2012) or have fitted a series of phylogenetic linear models to represent dependencies in a path diagram (van der Bijl, 2018). Specifying a path diagram requires enumerating the set of variables (graph vertices) and dependencies (directed edges), where these dependencies can be interpreted as mechanisms for causal inference (Pearl, 2009). The reliability of causal inference requires correct specification of the path diagram (Grace & Irvine, 2020), and we recommend further simulation and case-study evaluation of causal inference within phylogenetic comparative methods.

$$\mathrm{vec}(\mathbf{B}) \sim \mathrm{MVN}\left(\mathrm{vec}(\mathbf{1}\boldsymbol{\beta}_0),\ \mathbf{R} \otimes \boldsymbol{\Sigma}\right), \tag{1}$$
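These two matrices are then used to solve a simultaneous equation (Equation 2) for $\mathbf{x} \sim \mathrm{MVN}(\mathbf{0}, \boldsymbol{\Sigma})$, i.e., a hypothetical draw from covariance among traits $\boldsymbol{\Sigma}$ (Kaplan, 2001), where $\boldsymbol{\Gamma}$ represents endogenous mechanisms linking variables and $\boldsymbol{\varepsilon}$ represents exogenous variation with variance $\mathrm{Var}(\boldsymbol{\varepsilon}) = \mathbf{S}\mathbf{S}^t$. We then solve for the Cholesky of trait covariance (Equation 3), where trait covariance $\boldsymbol{\Sigma} = \mathbf{L}\mathbf{L}^t = \mathrm{Var}(\mathbf{x})$.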
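The step from Equation 2 to Equation 3 can be written out explicitly. The following is a short derivation sketch, assuming that $\mathbf{I} - \boldsymbol{\Gamma}$ is invertible (an assumption added here for the derivation):

$$
\begin{aligned}
\mathbf{x} = \boldsymbol{\Gamma}\mathbf{x} + \boldsymbol{\varepsilon}
\;&\Longrightarrow\; (\mathbf{I} - \boldsymbol{\Gamma})\,\mathbf{x} = \boldsymbol{\varepsilon}
\;\Longrightarrow\; \mathbf{x} = (\mathbf{I} - \boldsymbol{\Gamma})^{-1}\boldsymbol{\varepsilon}, \\
\mathrm{Var}(\mathbf{x}) &= (\mathbf{I} - \boldsymbol{\Gamma})^{-1}\,\mathrm{Var}(\boldsymbol{\varepsilon})\,\left((\mathbf{I} - \boldsymbol{\Gamma})^{-1}\right)^{t}
= (\mathbf{I} - \boldsymbol{\Gamma})^{-1}\,\mathbf{S}\mathbf{S}^{t}\,(\mathbf{I} - \boldsymbol{\Gamma}^{t})^{-1}
= \mathbf{L}\mathbf{L}^{t},
\qquad \mathbf{L} = (\mathbf{I} - \boldsymbol{\Gamma})^{-1}\mathbf{S}.
\end{aligned}
$$

The last equality uses the fact that the transpose of an inverse equals the inverse of the transpose, so that $\mathbf{L}^t = \mathbf{S}^t(\mathbf{I} - \boldsymbol{\Gamma}^t)^{-1}$.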
Constructing trait covariance $\boldsymbol{\Sigma} = (\mathbf{I} - \boldsymbol{\Gamma})^{-1}\mathbf{S}\mathbf{S}^t(\mathbf{I} - \boldsymbol{\Gamma}^t)^{-1}$ in this way generalizes several existing models:
1. Brownian motion: The analyst might specify $\boldsymbol{\Gamma} = \mathbf{0}$ and $\mathbf{S}$ as a diagonal matrix, and this then reduces to a standard Brownian motion model.
2. Phylogenetic path analysis: In some cases, variables can be reordered such that $\boldsymbol{\Gamma}$ is lower-triangular. In these cases, the model can be estimated using phylogenetic path analysis, for example fitted using piecewise SEM or d-separation methods (van der Bijl, 2018; von Hardenberg & Gonzalez-Voyer, 2013). However, $\boldsymbol{\Gamma}$ might also include loops, where for example, trait $j_1$ affects $j_2$, $j_2$ affects $j_3$, and $j_3$ affects $j_1$. This cannot be represented using standard phylogenetic path analysis but can be using SEM (Equations 2 and 3).
3. Phylogenetic factor analysis: In other cases, the analyst might specify $\boldsymbol{\Gamma} = \mathbf{0}$ and $\mathbf{S}$ having lower-diagonal entries that are nonzero for only a few columns. In this case, $\boldsymbol{\Sigma} = \mathbf{S}\mathbf{S}^t$ where the nonzero columns of $\mathbf{S}$ represent ‘factor loadings’ in a phylogenetic factor analysis (Hassler et al., 2022; Thorson et al., 2017).
In general, covariance $\boldsymbol{\Sigma}$ among $n_j$ traits involves $\frac{n_j(n_j+1)}{2}$ moments, and the analyst can specify anywhere from one to $\frac{n_j(n_j+1)}{2}$ parameters within the two matrices $\boldsymbol{\Gamma}$ and $\mathbf{S}$. To simplify the user-interface, we require the user to specify linkages as a text file following the format of R-package sem (Fox et al., 2020), and then parse this text file to construct $\boldsymbol{\Gamma}$ and $\mathbf{S}$ from a vector of estimated parameters.
 | 
categorical variables
We next outline how this model is fitted to a set of $n_c$ continuous and $n_d$ categorical traits, for a total of $n_t = n_c + n_d$ traits. This has been done previously using a ‘threshold model’ to combine categorical and continuous traits (e.g., Cybis et al., 2015; Felsenstein, 2012; Tolkoff et al., 2018), although we instead fit categorical traits using a Categorical distribution based on estimated probabilities for each categorical level (similar to Hadfield & Nakagawa, 2010). These traits are assembled in a matrix $\mathbf{Y}$ with dimension $n_i \times n_t$, where missing values are recorded as NAs and are excluded when computing the likelihood across available data. We also record the number of levels $m_t$ for each trait $t$, where categorical traits have $m_t \geq 2$ by definition and we adopt the convention that $m_t = 1$ for continuous traits. Categorical traits are modelled via a probability vector that is constrained to sum to 1, so it requires $m_t - 1$ variables to describe a categorical trait with $m_t$ levels. For trait-matrix $\mathbf{Y}$ with $m_t$ levels for each trait $t$, we therefore must estimate latent trait matrix $\mathbf{B}$ with $n_j = n_c + \sum_{t=1}^{n_t}(m_t - 1)$ columns and $n_g$ rows (where $n_g$ is the total number of taxa in the tree). We also define a vector $\mathbf{h}$ with length $n_j$, where $h_j \in \{1, 2, \ldots, n_t\}$; this vector associates each column of $\mathbf{B}$ with a corresponding column of $\mathbf{Y}$. If trait $t$ is continuous then only one value of $h_j = t$. Alternatively, if trait $t$ is categorical then $h_j = t$ for $m_t - 1$ elements. Finally, we associate $n_i$ rows of $\mathbf{Y}$ with $n_g$ rows of $\mathbf{B}$ by defining a vector $\mathbf{g}$ with length $n_i$ where $g_i$ provides the taxon associated with sample $i$. The process of fitting latent traits $\mathbf{B}$ to trait measurements $\mathbf{Y}$ differs somewhat between continuous and categorical traits, as we explain next.
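The bookkeeping that links measured traits to latent columns can be illustrated with a short Python sketch. The trait names and level counts below are hypothetical, and indices are 1-based to match the notation above.

```python
# Hypothetical trait list: (name, number of levels m_t).
# By the convention above, m_t = 1 marks a continuous trait.
traits = [
    ("length_max", 1),
    ("temperature", 1),
    ("feeding_mode", 4),  # hypothetical categorical trait with four levels
    ("guarding", 2),      # hypothetical binary trait
]

# h[j] gives the (1-based) column t of Y that latent column j of B belongs to.
h = []
for t, (name, m_t) in enumerate(traits, start=1):
    n_columns = 1 if m_t == 1 else m_t - 1  # categorical traits need m_t - 1 columns
    h.extend([t] * n_columns)

n_c = sum(1 for _, m_t in traits if m_t == 1)  # number of continuous traits
n_j = len(h)                                   # number of latent columns in B

print("h =", h, "n_j =", n_j)
```

In this example the four-level trait occupies three latent columns and the binary trait occupies one, so the latent matrix has six columns in total, which agrees with $n_j = n_c + \sum_{t=1}^{n_t}(m_t - 1)$.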
For a continuous trait $t$, we extract column $\mathbf{y}_t$ from $\mathbf{Y}$. We also extract the column from $\mathbf{B}$ for which $h_j = t$ and call this submatrix $\mathbf{B}^{(t)}$. We then specify a normal distribution for residual (measurement) variation (Equation 4), where $\sigma_j^2$ is the magnitude of measurement errors and is estimated as a fixed effect, although we fix $\sigma_j = 0.01$ (i.e., forcing $\beta_{g_i,j}$ to approach $y_{i,j}$) for any trait $j$ that does not have replicated measurements and for which $\sigma_j^2$ therefore cannot be estimated.
For a categorical trait $t$, we again extract column $\mathbf{y}_t$ from $\mathbf{Y}$. However, we then expand $\mathbf{y}_t$ to an indicator matrix $\mathbf{Z}^{(t)}$ with dimension $n_i \times m_t$, such that a trait with $m_t$ possible levels is converted to a matrix with $m_t$ columns where each row $i$ contains a 1 in the column corresponding to level $y_{i,t}$ and zeros otherwise. We also extract the $m_t - 1$ columns from $\mathbf{B}$ for which $h_j = t$ and again call this submatrix $\mathbf{B}^{(t)}$. We calculate the probability $\boldsymbol{\pi}^{(t)}_g$ for each level $k \in \{1, 2, \ldots, m_t\}$ of categorical trait $t$ via a multivariate logistic transformation of each row $\boldsymbol{\beta}^{(t)}_g$ of $\mathbf{B}^{(t)}$ (Equation 5).
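$$\mathbf{x} = \boldsymbol{\Gamma}\mathbf{x} + \boldsymbol{\varepsilon}, \qquad \boldsymbol{\varepsilon} \sim \mathrm{MVN}\left(\mathbf{0}, \mathbf{S}\mathbf{S}^t\right). \tag{2}$$

$$\mathbf{L} = (\mathbf{I} - \boldsymbol{\Gamma})^{-1}\mathbf{S}, \tag{3}$$

$$y_{i,t} \sim \mathrm{Normal}\left(\beta^{(t)}_{g_i,1},\ \sigma_j^2\right), \tag{4}$$

$$\pi^{(t)}_{g,k} = \begin{cases} \dfrac{e^{\beta^{(t)}_{g,k}}}{1 + \sum_{k'=1}^{m_t-1} e^{\beta^{(t)}_{g,k'}}} & \text{if } k \leq m_t - 1 \\[2ex] \dfrac{1}{1 + \sum_{k'=1}^{m_t-1} e^{\beta^{(t)}_{g,k'}}} & \text{if } k = m_t \end{cases}. \tag{5}$$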
This multivariate logistic transformation converts $m_t - 1$ unbounded values in $\boldsymbol{\beta}^{(t)}_g$ to $m_t$ probabilities $0 < \boldsymbol{\pi}_g < 1$ where $\sum_{k=1}^{m_t} \pi_{g,k} = 1$ by construction. We then fit a categorical distribution (Equation 6). This differs from previous specifications of a ‘threshold model’ to predict categorical traits, which have typically predicted a response of $z_{i,k} = 1$ whenever a ‘liability’ variable $\beta_{g(i),j}$ exceeds an estimated threshold and zero otherwise (e.g., Felsenstein, 2012). Such a threshold model must integrate across values of $\beta_{g(i),j}$ that fall on the right side of a given threshold for a measurement $y_{i,j}$, typically accomplished using Bayesian hierarchical models and MCMC sampling. By contrast, we specify that latent traits $\boldsymbol{\beta}^{(t)}_g$ for each taxon $g$ are transformed to the probability $\boldsymbol{\pi}^{(t)}_g$ for each level of a categorical variable.
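The multivariate logistic transformation and the resulting categorical likelihood contribution can be sketched in a few lines of Python. This is an illustrative sketch rather than the package's code: the function names are hypothetical, the last level is treated as the reference level (as in Equation 5), and a missing measurement is represented by `None` and contributes nothing to the log-likelihood.

```python
import math


def multivariate_logistic(beta):
    """Map m-1 unbounded latent values to m probabilities; the last level is the reference."""
    exps = [math.exp(b) for b in beta]
    denom = 1.0 + sum(exps)
    return [e / denom for e in exps] + [1.0 / denom]


def categorical_loglik(level, beta):
    """Log-probability of an observed level (1-based); a missing value (None) adds zero."""
    if level is None:
        return 0.0
    return math.log(multivariate_logistic(beta)[level - 1])


pi = multivariate_logistic([0.0, 0.0])  # three levels, all latent values equal to zero
print(pi, sum(pi))
print(categorical_loglik(3, [0.0, 0.0]), categorical_loglik(None, [0.0, 0.0]))
```

When all latent values are zero, every level (including the reference level) receives the same probability. For very large latent values a production implementation would typically subtract the maximum before exponentiating to avoid overflow; that refinement is omitted here for clarity.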
Parameters of this model remain identifiable given missing data (i.e. entries of $y_{i,t} = \mathrm{NA}$). In these cases, the model continues to integrate across latent variables $\mathbf{B}$, and simply does not include these missing values of $\mathbf{Y}$ in the likelihood. We note that we assume trait measurements $\mathbf{Y}$ are missing at random. If the probability of having an available trait measurement (termed ‘sampling intensity’) is correlated with latent traits $\boldsymbol{\beta}_j$, then this assumption will result in ‘preferential sampling’ bias (Diggle et al., 2010). We recommend further research regarding model-based mitigation of this bias (e.g., Conn et al., 2017), but do not explore the topic further here.
 | 
We identify maximum likelihood estimates for all model parameters (see Supporting Information B for estimation details). This requires calculating an objective function as the product of the likelihood (Equations 4 and 6) and the probability of random effects (Equation 1). We obtain the marginal likelihood by integrating the objective function across random effects $\mathbf{B}$, composed of random effects $\beta_{g,j}$ for all taxa $g$ (including tips and ancestors) and traits $j$. This multivariate integral is approximated using the Laplace approximation and implemented using R-package TMB, and this is computationally efficient because the inverse-covariance $(\mathbf{R} \otimes \boldsymbol{\Sigma})^{-1}$ has a value of 0 for any two taxa that are not adjacent in the specified tree (Kristensen et al., 2016). We then maximize the marginal likelihood with respect to remaining fixed effects ($\boldsymbol{\Gamma}$, $\mathbf{S}$, $\boldsymbol{\beta}_0$, and $\boldsymbol{\sigma}^2$), export the estimate of SEM-coefficients $\boldsymbol{\Gamma}$ and $\mathbf{S}$, extract empirical Bayes’ predictions for latent traits $\mathbf{B}$ (which includes imputed values for missing trait values), and use R-package sem to visualize the estimated path diagram.
Path coefficients $\boldsymbol{\Gamma}$ can be interpreted as a regression slope, but the precise interpretation depends upon the transformation that was chosen by the analyst for connected variables $Y_1 \rightarrow Y_2$. For example, if $Y_1$ is untransformed (e.g. temperature in Celsius) and $Y_2$ is log-transformed (e.g., log-maximum body length), then e.g., $\gamma_{1,2} = 0.1$ indicates that a 1 Celsius increase in $Y_1$ is associated on average with an approximately 10% increase in the untransformed value of $Y_2$. By contrast, if $Y_1$ is log-transformed (e.g. log-maximum body length), and $Y_2$ and $Y_3$ are two levels of a categorical variable, then $\gamma_{1,2} = 0.1$ and $\gamma_{1,3} = -0.1$ indicates that a 10% increase in maximum body length is associated on average with a $e^{0.1(0.1)} - e^{-0.1(0.1)} = 2\%$ increase in the odds of level $Y_2$ relative to level $Y_3$. We also note that the covariance among traits $\boldsymbol{\Sigma}$ is estimated as being constant across the entire phylogenetic tree (i.e., that $\mathrm{Var}(\mathbf{B}) = \mathbf{R} \otimes \boldsymbol{\Sigma}$). In reality, slope and variance parameters may be nonstationary, representing different evolutionary trade-offs and rates resulting from environmental context and ecological traits that are not being modelled. We recommend further research extending the approach to include nonstationarity, and interpret parameters in this study as representing a sample-weighted average across the tree being analysed.
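The two interpretations of a path coefficient described above can be checked with a few lines of arithmetic. The Python sketch below uses the illustrative coefficient values from the text; the variable names are hypothetical, and the 10% increase in body length is approximated as a change of 0.1 on the natural-log scale.

```python
import math

# Case 1: Y1 is untransformed (degrees Celsius) and Y2 is on the natural-log scale.
gamma_12 = 0.1
pct_change_y2 = math.exp(gamma_12 * 1.0) - 1.0  # proportional change in Y2 per +1 degree

# Case 2: Y1 is on the log scale; Y2 and Y3 are latent variables for two categorical levels.
gamma_12, gamma_13 = 0.1, -0.1
delta_log_y1 = 0.1  # a 10% increase in Y1, approximated on the log scale
log_odds_shift = (gamma_12 - gamma_13) * delta_log_y1
pct_change_odds = math.exp(log_odds_shift) - 1.0  # change in odds of level 2 versus level 3

print(f"{100 * pct_change_y2:.1f}% change in Y2 per degree")
print(f"{100 * pct_change_odds:.1f}% change in the odds of level 2 versus level 3")
```

The first quantity is slightly larger than 10% because the slope acts on the log scale, so "10%" is the usual small-coefficient approximation. The second quantity is close to 2% because the latent variables for the two levels move in opposite directions, and their difference determines the log-odds.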
 | 
traits of fishes
To test and apply these methodological advances, we seek to estimate life-history traits for all described fishes (Chondrichthyes and Osteichthyes) included in FishBase in November 2019, where previous research has validated that these data are likely unbiased (Thorson et al., 2014). There is no phylogeny available for all fishes, despite ultrametric phylogenies existing separately for a subset of bony (Rabosky et al., 2018) and cartilaginous fishes (Stein et al., 2018). We therefore follow past research (Johnson et al., 2021; Thorson et al., 2017) in approximating phylogeny via taxonomy, that is, where all taxonomic classes are assumed to have a single common ancestor, and then including ancestral levels for order, family and genus. Package FishLife then converts taxonomy to a tree using R-package ape (Paradis & Schliep, 2019), and when using taxonomy we specify phylogenetic distance $d_g = 1$ for each level of the taxonomic tree (i.e., for family to genus, genus to species, etc.). We later provide a sensitivity analysis with a novel merged phylogeny.
We analyse 17 continuous-valued traits and four categorical traits, where the latter include 16 levels in total. These trait data include at least one measurement for 26,622 fish species. However, life-history data in particular are missing for many species (Figure B1), where 2%–27% of species have at least one measurement of a given trait related to growth, mortality, or body size. These ‘inclusion rates’ are higher for genera (7%–24%), and family levels (26%–76%), suggesting that phylogenetic information is necessary to infer trait-values for many species based on their genus or family.
$$z_i^{(t)} \sim \mathrm{Categorical}\left(\boldsymbol{\pi}_{g(i)}^{(t)}\right). \tag{6}$$

We classify these 33 variables into six trait categories, expanding upon the list from Hadj-Hammou et al. (2021) where traits are broadly classified into five categories: (1) behaviour, (2) life history, (3) morphology, (4) diet and (5) physiology. The list includes at least one variable in each category (see Table 1 for details). The morphometric traits are continuous measures that describe overall body shape for 5940 extant species of actinopterygian fishes spanning 392 families, taken on specimens at the Smithsonian Museum of Natural History and averaged by species (Price et al., 2019, 2022). These data include eight linear measurements in three dimensions: standard body and jaw length; head, body, and caudal peduncle depth; and body, jaw, and caudal peduncle width. We standardized specimen morphometrics to account for variation in individual development for museum specimens, by dividing each measurement by the geometric mean of specimen length, width, and height.
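The size standardization described above can be sketched as follows. This is an illustrative NumPy example rather than the authors' code: the function name `standardize_by_size` is hypothetical, the two specimens are made up, and we assume that the first three columns hold the length, width and height used in the geometric mean.

```python
import numpy as np

def standardize_by_size(measurements, length, width, height):
    """Divide each specimen's measurements by the geometric mean of its three size axes."""
    measurements = np.asarray(measurements, dtype=float)
    size = np.cbrt(np.asarray(length, dtype=float)
                   * np.asarray(width, dtype=float)
                   * np.asarray(height, dtype=float))
    return measurements / size[:, None]

# Two made-up specimens with identical shape; the second is twice as large.
raw = np.array([[100.0, 20.0, 30.0, 12.0],
                [200.0, 40.0, 60.0, 24.0]])
scaled = standardize_by_size(raw, length=raw[:, 0], width=raw[:, 1], height=raw[:, 2])
print(scaled)
```

Because both specimens have the same proportions, their standardized rows coincide, which is the sense in which the standardization removes overall size.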
We use several design principles to assemble the SEM for fishes, and this in turn defines the structure of SEM coefficients $\boldsymbol{\Gamma}$ and exogenous covariance $\mathbf{V}$. Specifically, we specify that:
1. temperature (in Celsius) is the exogenous ‘root’ of the path diagram. This recognizes that life-history studies typically use temperature as a covariate to predict size and mortality (Gislason et al., 2010; Palomares et al., 2022; Pauly, 1980), and hence our estimates are comparable to widely reported slopes.
2. von Bertalanffy length ($L_\infty$) in units of mm has the greatest number of impacts on other traits, in recognition of the central role of body size in size-structured evolutionary theory (Andersen, 2019). Von Bertalanffy length is the asymptotic body size of a fish. We include linkages to other measurements of size (in mm or g), growth (in $\mathrm{year}^{-1}$), and mortality parameters (in units $\mathrm{year}^{-1}$), as well as to categorical traits representing reproductive behaviour, feeding mode, and habitat (Denéchère et al., 2022; Palomares et al., 2022).
| Variable name | Trait category | Continuous (C) or categorical (F) | Transformation (if continuous) | Levels (if categorical) |
| --- | --- | --- | --- | --- |
| age_max | Life-history | C | Natural log | |
| trophic_level | Diet | C | Identity | |
| aspect_ratio | Morphology | C | Natural log | |
| fecundity | Reproduction | C | Natural log | |
| growth_coefficient | Physiology | C | Natural log | |
| temperature | Physiology | C | Identity | |
| length_max | Physiology | C | Natural log | |
| length_infinity | Physiology | C | Natural log | |
| length_maturity | Physiology | C | Natural log | |
| age_maturity | Physiology | C | Natural log | |
| natural_mortality | Physiology | C | Natural log | |
| weight_infinity | Physiology | C | Natural log | |
| max_body_depth | Morphology | C | Natural log | |
| max_body_width | Morphology | C | Natural log | |
| lower_jaw_length | Morphology | C | Natural log | |
| min_caudal_peduncule_depth | Morphology | C | Natural log | |
| offspring_size | Reproduction | C | Natural log | |
| spawning_type | Reproduction | F | | nonguarders; guarders; bearers |
| habitat | Behaviour | F | | demersal; benthopelagic; reef-associated; bathymetric; pelagic |
| feeding_mode | Diet | F | | macrofauna; planktivorous_or_other; generalist |
| body_shape | Morphology | F | | elongated; fusiform_normal; short_and_or_deep; eel-like; other |

TABLE 1  Life-history traits included in the analysis, listing the variable name, trait category (using five defined by Hadj-Hammou et al. (2021) while also adding ‘Reproductive’ as a sixth category), whether the trait is continuous or categorical, the transformation applied to continuous variables to achieve a close-to-normally distributed process for evolutionary changes, and the levels for factor-valued traits.
3. both growth and mortality rates affect age and length at maturity, in recognition that their ratio affects the optimal maturation timing (Holt, 1958).
4. for each categorical trait $t$ (i.e., all columns $j$ of $\mathbf{B}$ where $h_j = t$), the exogenous covariance $\mathbf{V}$ is symmetric and positive definite but otherwise unconstrained (i.e., the body-shape trait has five measured levels and involves estimating $\frac{4 \times 5}{2} = 10$ covariance parameters in $\mathbf{S}$), while continuous traits have independent exogenous variance (i.e., $\mathbf{S}$ is diagonal for these rows and columns).

Future research could compare fit with alternative assumptions about life-history trade-offs (e.g., Mason et al., 2016).
2.4.1  |  Sensitivity, validation and performance
We assess the performance of the model, validate results, and explore sensitivity to alternative assumptions using several auxiliary analyses.
First, we compare phylogenetic structural equation modelling with the R package phylolm (Tung Ho & Ané, 2014) as a widely used example of standard phylogenetic comparative methods (Supporting Information D). We specifically compare model structure, and also use a short simulation experiment with 500 replicates to confirm that the phylogenetic SEM can generate identical estimates of regression coefficients to an existing phylogenetic linear model package. For each replicate, we simulate an additive tree with 100 ‘tips’ and randomized branch lengths and structure. We then simulate two variables under a Brownian motion model from this tree, exploring scenarios either with complete data for each taxon, or 60% of taxa missing measurements for each trait. We record the estimated slope parameter for these two models.
We also assess sensitivity of results to using taxonomic information as a proxy for evolutionary relatedness. To do so, we first merge publicly available chondrichthyan (Stein et al., 2018) and actinopterygian (Rabosky et al., 2018) ultrametric trees, using branch lengths to infer the location of their common ancestor. We then subset our data to the 11,070 species that can be matched between trait data and the merged phylogeny, and repeat the analysis on this subset. Subsetting to these matched species reduces the number of available trait measurements from 246,736 to 152,596, so we present these estimates using phylogenetic information as a sensitivity analysis.
Next, we validate the predictive performance of the model by conducting a 4-fold cross-validation experiment. To do so, we randomly partition each row of original data matrix $\mathbf{Y}$ into one of four bins (labelled $\{A, B, C, D\}$). For the first experiment, we then fit the model to all data in bins $\{B, C, D\}$ and use the estimated parameters to predict $\boldsymbol{\beta}_t$ for continuous traits and level probabilities $\boldsymbol{\pi}^{(t)}_g$ for categorical traits corresponding to data in bin $A$. We record these and then repeat this process for the other three bins, comparing predictions with the withheld data. This experiment evaluates performance when predicting new data that are collected via the same process as the original data set (Roberts et al., 2017), and we recommend future research use a blocked cross-validation design to explore performance when predicting traits for taxa that are systematically under-represented in available data.
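The random partition into four bins can be sketched as below. This is a schematic illustration only: the row count is made up, the bins are forced to be equally sized (an assumption; a purely random assignment need not be balanced), and the model fitting step is left as a comment because it depends on the fitted model.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows = 20  # made-up number of rows in the data matrix

# Assign every row to one of four bins, as evenly as possible.
bins = rng.permutation(np.arange(n_rows) % 4)
labels = np.array(["A", "B", "C", "D"])[bins]

folds = []
for held_out in "ABCD":
    train_rows = np.flatnonzero(labels != held_out)
    test_rows = np.flatnonzero(labels == held_out)
    # A real analysis would refit the model to train_rows and predict test_rows here.
    folds.append((held_out, train_rows, test_rows))

for held_out, train_rows, test_rows in folds:
    print(held_out, len(train_rows), len(test_rows))
```

Each row is withheld exactly once, so the out-of-bag predictions from the four fits together cover the whole data set.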
We then evaluate performance separately for continuous and categorical traits:
Continuous traits: for continuous trait $t$, we plot unfitted observations $\mathbf{y}_t$ against the out-of-bag predictions $\boldsymbol{\beta}^{(t)}_j$ (where $\boldsymbol{\beta}^{(t)}_j$ is the column of $\mathbf{B}$ for which $h_j = t$), and also calculate the percent variance explained relative to a null model that predicts $\mathbf{y}_t$ based on its mean value $\bar{y}_t$ (Equation 7). $\mathrm{PVE}_t$ predicts the proportion of variance that would be explained for a hypothetical ‘new’ sample, where a value of 0 indicates no out-of-bag explanatory power (i.e. no improvement relative to predicting new samples as the mean of all data) and a value of 1 implies perfect explanatory power.
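As a small numerical illustration of the percent variance explained, the sketch below computes one minus the ratio of the out-of-bag residual sum of squares to the sum of squares around the mean. The function name `percent_variance_explained` and the toy values are assumptions made for the example.

```python
import numpy as np

def percent_variance_explained(observed, predicted):
    """One minus residual sum of squares over the sum of squares around the mean."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    residual_ss = np.sum((observed - predicted) ** 2)
    null_ss = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - residual_ss / null_ss

y = np.array([1.0, 2.0, 3.0, 4.0])
print(percent_variance_explained(y, y))                     # perfect predictions
print(percent_variance_explained(y, np.full(4, y.mean())))  # null model
print(percent_variance_explained(y, [1.5, 1.5, 3.5, 3.5]))  # intermediate case
```

Note that the value can be negative when out-of-bag predictions are worse than the mean of the data.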
Categorical traits: for latent trait $\boldsymbol{\beta}^{(t)}_j$ representing a level of a categorical trait, remember that we expand original data $\mathbf{y}_t$ to an indicator matrix $\mathbf{Z}^{(t)}$ where $\mathbf{z}^{(t)}_k$ is the column corresponding to level $k$ of latent trait $t$. This indicator column has value 0 when a taxon does not have that level and 1 when it does, while the model estimates the probability $\pi^{(t)}_k$ for that level of the categorical trait, and these probabilities sum to one across levels $k \in \{1, 2, \ldots, m_t\}$. To evaluate model performance, we plot the receiver operator characteristics (ROC) curve for each level, which involves calculating the rate of false-positives and false-negatives when converting the predicted probability to a predicted indicator using different potential threshold values. We then calculate the area under the ROC (AUC) using R package pROC (Robin et al., 2011), where an AUC of 0.5 indicates no out-of-bag ability to discriminate between 0 and 1 values for an indicator, and an AUC of 1 implies perfect ability to discriminate between these.
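For intuition about the AUC, the sketch below uses the equivalent rank-based definition: the probability that a randomly chosen taxon with the level receives a higher predicted probability than a randomly chosen taxon without it, with ties counted as one half. This is a stand-in written in NumPy, not the pROC implementation, and the function name `roc_auc` and the toy values are hypothetical.

```python
import numpy as np

def roc_auc(indicator, probability):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    indicator = np.asarray(indicator, dtype=bool)
    probability = np.asarray(probability, dtype=float)
    pos = probability[indicator]
    neg = probability[~indicator]
    # Compare every positive with every negative; ties count one half.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)

print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # one of four pairs is mis-ordered
print(roc_auc([0, 1, 0, 1], [0.5, 0.5, 0.5, 0.5]))   # uninformative predictions
```

The pairwise comparison is quadratic in the number of taxa, which is fine for illustration but would be replaced by a sort-based calculation for large data sets.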
2.4.2  |  Identifying life-history strategies
We illustrate results by identifying a small number of life-history strategies for fishes, defined as an extreme combination of trait values that frequently occur together, such that all fishes can be characterized as some mixture of strategies (i.e., following the usage in Winemiller & Rose, 1992). Previous studies have applied clustering methods to a smaller subset of species than we have available, e.g., for North American fishes (Winemiller & Rose, 1992), selected North Pacific marine fishes (King & McFarlane, 2003), freshwater fishes (Mims et al., 2010), or European marine fishes (Pecuchet et al., 2017). However, our study is the first to predict the life-history strategies for all described fishes worldwide, representing more than 34,000 species.
$$\mathrm{PVE}_t = 1 - \frac{\sum_{i=1}^{n_i} \left(y_{i,t} - \beta^{(t)}_{g(i),j}\right)^2}{\sum_{i=1}^{n_i} \left(y_{i,t} - \bar{y}_t\right)^2}. \tag{7}$$
We specifically follow Winemiller and Rose (1992) in estimating ‘archetypes’ that represent an extreme combination of life-history characteristics. All fish taxa are then described as a finite mixture of these archetypes, and we refer to these archetypes as ‘life-history strategies’. This contrasts with other studies that have clustered taxa continuously within the space of life-history traits (King & McFarlane, 2003). To do so, we extract predictions $\boldsymbol{\beta}^{(t)}$ for continuous traits and level probabilities $\boldsymbol{\pi}^{(t)}_g$ for categorical traits for all taxa that have at least one observation (i.e., taxa whose predictions are not drawn purely from the predictive distribution based on their taxonomy). We then apply ‘archetypal analysis’ (Cutler & Breiman, 1994) following methods from Pecuchet et al. (2017), using R package archetypes (Eugster & Leisch, 2009). Archetypal analysis involves estimating $n_b$ ‘archetypes’ $\boldsymbol{\alpha}_b$ composed of values $\alpha_{b,j}$, representing the value of variable $j$ in archetype $b$. Each taxon $\boldsymbol{\beta}_g$ is then predicted as a finite mixture of these archetypes, with mixture coefficients $p_{g,b}$ defined such that $\sum_{b=1}^{n_b} p_{g,b} = 1$ and $p_{g,b} > 0$. Archetypal analysis then estimates the values of $\alpha_{b,j}$ and $p_{g,b}$ to minimize the sum of squared distance (SSD) between predicted and inputted $\mathbf{B}$. We use a scree-plot to visualize how the SSD decreases when using 1–6 archetypes and we select the number by visually identifying when further increases generate little improvement in SSD. We then explore the results in two ways:
Archetype trait values: We extract trait values for estimated archetypes, $\boldsymbol{\alpha}_b$, to interpret which traits are associated with each. We specifically convert $\alpha_{b,j}$ to a percent score by calculating the proportion of fishes having a predicted trait $\beta_j < \alpha_{b,j}$.
Simplex by taxonomy: Similarly, we extract mixture coefficients $\mathbf{p}_g$ for each taxon. We then use package archetypes to apply a skew-orthogonal transformation to visualize $p_{g,b}$ in a two-dimensional simplex (Seth & Eugster, 2014). We specifically compare $p_{g,b}$ for major taxa and compare resulting assignments with previous studies (Winemiller & Rose, 1992).
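The two quantities used in this section, the sum of squared distance for a given set of archetypes and mixture coefficients, and the percent score for each archetype, can be illustrated with a toy example. This NumPy sketch does not fit archetypes (that is left to the archetypes package); the function names and the small matrices are hypothetical.

```python
import numpy as np

def mixture_ssd(traits, archetypes, weights):
    """Sum of squared distances between trait values and their archetype mixtures."""
    predicted = np.asarray(weights, dtype=float) @ np.asarray(archetypes, dtype=float)
    return float(np.sum((np.asarray(traits, dtype=float) - predicted) ** 2))

def archetype_percent_scores(traits, archetypes):
    """Proportion of taxa whose trait value lies below each archetype's value.

    traits: (n_taxa, n_traits); archetypes: (n_archetypes, n_traits).
    """
    traits = np.asarray(traits, dtype=float)
    archetypes = np.asarray(archetypes, dtype=float)
    below = traits[None, :, :] < archetypes[:, None, :]  # (n_archetypes, n_taxa, n_traits)
    return below.mean(axis=1)

# Four made-up taxa lying on a line between two made-up archetypes.
traits = np.array([[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
archetypes = np.array([[0.0, 0.0], [3.0, 6.0]])
weights = np.array([[1.0, 0.0], [2 / 3, 1 / 3], [1 / 3, 2 / 3], [0.0, 1.0]])  # rows sum to one

print(mixture_ssd(traits, archetypes, weights))
print(archetype_percent_scores(traits, archetypes))
```

In this toy case the taxa are exact mixtures of the two archetypes, so the SSD is zero up to rounding, and the second archetype has a percent score of 0.75 for both traits because three of the four taxa have lower values.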
| 
The phylogenetic structural equation model quantified the direct impact of temperature on size and growth. Specifically, a one degree Celsius increase was associated with a 4% increase in growth coefficient (with standard error SE = 0.3%), 2% increase (SE = 0.2%) in mortality rate, and 2% decrease (SE = 0.3%) in asymptotic body length (Figure 2; Table E1), where these represent average associations across the wide range of fishes being analysed. In turn, a 10% increase in asymptotic body length was associated with an 8.2% (SE = 0.3%) decrease in natural mortality and a 6.6% decrease (SE = 0.2%) in growth coefficient. When both direct and indirect effects are included (Table E2), temperature had a slightly larger impact on the growth coefficient (0.051) than on the mortality rate (0.035). Temperature was estimated to have a minimal effect on reproductive behaviour, feeding mode, or spawning type, while asymptotic length had a larger effect on these traits (Table E2). For example, a 10% increase in asymptotic length was estimated to decrease the odds of guarding behaviour relative to non-guarding behaviour by 34% (Table E2). Finally, the model also captured previously documented life-history trade-offs, including the association between earlier maturation and higher relative mortality (Figure E1).
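The difference between direct and total effects reported above can be illustrated with a small path-matrix calculation. The sketch below is an approximation under stated assumptions: it includes only the five rounded direct coefficients quoted in this paragraph (treating the percent changes as slopes on the log scale), whereas the fitted model contains additional paths, so the totals will not match Table E2 exactly. The matrix orientation (`direct[i, j]` is the effect of variable `i` on variable `j`) and the variable names are choices made for this example.

```python
import numpy as np

names = ["temperature", "log_length_infinity", "log_growth_coefficient", "log_natural_mortality"]
index = {name: i for i, name in enumerate(names)}

# direct[i, j] = direct effect of variable i on variable j (orientation assumed here).
direct = np.zeros((4, 4))
direct[index["temperature"], index["log_length_infinity"]] = -0.02
direct[index["temperature"], index["log_growth_coefficient"]] = 0.04
direct[index["temperature"], index["log_natural_mortality"]] = 0.02
direct[index["log_length_infinity"], index["log_growth_coefficient"]] = -0.66
direct[index["log_length_infinity"], index["log_natural_mortality"]] = -0.82

# For an acyclic graph, summing one-step, two-step, ... paths gives (I - direct)^{-1} - I.
total = np.linalg.inv(np.eye(4) - direct) - np.eye(4)
indirect = total - direct

t = index["temperature"]
print(total[t, index["log_growth_coefficient"]])
print(total[t, index["log_natural_mortality"]])
```

With this subset of paths, the total effect of temperature is 0.04 + (−0.02)(−0.66) ≈ 0.053 on the growth coefficient and 0.02 + (−0.02)(−0.82) ≈ 0.036 on natural mortality, which is close to, but not identical with, the reported values of 0.051 and 0.035.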
The simulation experiment confirmed that FishLife and the widely used R package phylolm give essentially identical estimates when fitting continuous traits and data are available for all species (Figure D1, left panel), and that FishLife shows a small improvement in estimation performance when data are missing at random (Figure D1, right panel). Four-fold cross-validation confirmed that the model fitted to real-world data had good performance when predicting records that were randomly dropped from the model fitting (Figure 3). Continuous-valued traits had a percent-variance explained (PVE) ranging from 51% to 89%. Among these variables, performance was particularly high (>80% PVE) for traits measuring length, weight, and fecundity, but lower for traits measuring age, growth, maturity, and trophic level. Similarly, levels of categorical traits had an area under the receiver-operator-characteristics curve (AUC) ranging from 0.86 to 0.99, with lower (but still high) power to discriminate levels for the feeding-mode trait. Comparing the model fitted using taxonomy with one using a subset of data and phylogeny to represent evolutionary distance (Figure E2) shows similar estimates of linkages for life-history parameters (i.e., for mortality, growth, size, and maturity parameters) between analyses. However, the estimated impact of body size on body shape was substantially larger when using phylogeny.
Our approach is further demonstrated by the archetype analysis, which identified three life-history strategies (Figure E2), in agreement with Winemiller and Rose (1992). The first archetype (purple in Figure 4 and top panel in Figure 5) was associated with higher maximum age, trophic level, slow growth, and low temperatures. This suite of traits corresponded to the ‘equilibrium’ strategy from Winemiller and Rose (1992). The third archetype (yellow in Figure 4 and bottom panel in Figure 5) corresponded to the ‘opportunistic’ strategy from Winemiller and Rose (1992). It had the lowest maximum age and fecundity, while having high natural mortality and probability of guarding their young. Finally, the second archetype (green in Figure 4), corresponding to the ‘periodic’ strategy, was somewhat intermediate in terms of growth and size, while typically having highest fecundity, being mainly pelagic and having the highest probability of a non-guarding reproductive strategy. As expected, there was strong phylogenetic signal in these life-history strategies, with Elasmobranchii (sharks and rays) representing the equilibrium strategy, Clupeidae (herrings and sardines) largely representing the periodic strategy, and Gobiidae (gobies) largely representing the opportunistic strategy (Figure 6).
| 
We extended phylogenetic trait-imputation methods to include two additional features: (1) representing the covariance among traits via a structural equation model, and (2) incorporating both continuous and categorical traits. We fit categorical traits using latent variables that are then transformed to calculate the probability for each categorical level. Unlike past analyses (e.g. Felsenstein, 2012), however, we use a computational method (the Laplace approximation) that allows rapid inference on large trees. These two developments have wide relevance for applications across life-history databases for any taxonomic group within various ecosystems worldwide such as plants, mammals, fishes, birds, insects, as well as comparing across these taxa (Capdevila et al., 2020). For example, Kattge et al. (2011) documented 52 traits for 69,000 plant species in the TRY global plant database, of which 15 are categorical including Mycorrhiza type, nitrogen fixation capacity, and pollination mode. In addition, GRooT (Guerrero-Ramírez et al., 2021) includes 38 root traits, from 38,276 species-by-site mean values based on 114,222 trait records, for more than 1000 species, such as root mass fraction, root carbon and nitrogen concentration, lateral spread, root mycorrhizal colonization intensity, mean root diameter, root tissue density, specific root length and maximum rooting depth. Similarly, the bird trait database AVONET (Tobias et al., 2022) includes continuous morphological traits but also categorical traits like trophic level (three levels), foraging niche (nine levels) and foraging locomotory behaviour (five levels) for 11,009 species. Likewise, the foraging database EltonTraits (Wilman et al., 2014) includes foraging time as a categorical trait for 9993 bird and 5400 mammal species. Clearly there is potential for both phylogenetic signal within these
categorical traits, and ecologically meaningful relationships between qualitative and quantitative traits. This is clearly illustrated in our case study, which estimates that fishes with smaller adult body sizes (a continuous trait) are more likely to guard their young (a categorical trait) in agreement with recent theoretical predictions (Denéchère et al., 2022). Similarly, categorical traits (e.g. reproductive behaviour in fishes, or nitrogen fixation in plants) can be highly relevant when measuring functional diversity or predicting responses to new competitors or climates. Overall, this underlines the importance of including categorical traits when imputing traits for both regional and macroecological studies.

FIGURE 2  Path diagram representing specified causal linkages and estimated $\boldsymbol{\Gamma}$ coefficients (see Figure 1 for description) linking fish traits when using taxonomy to represent evolutionary distance, using package sem to generate the plot (Fox et al., 2020), where levels of the categorical variables are abbreviated (H: habitat; FM: feeding mode; BS: body shape; ST: spawning type) and coefficients for categorical variables represent the log-odds relative to a specified base level (H: demersal; FM: generalist; BS: fusiform/normal; ST: nonguarders). Note that evolutionary variance and covariance parameters $\boldsymbol{\Sigma}$ are not shown here for clarity of presentation.
[Path diagram not reproduced. Node labels and edge coefficients recovered from the figure, in extraction order: log(length_infinity); log(growth_coefficient) -0.66; log(natural_mortality) -0.82; log(weight_infinity) 2.96; log(length_max) 1; log(aspect_ratio) 0.06; ST: guarders -3.48; ST: bearers -3.62; H: pelagic 0.49; H: benthopelagic 0.08; H: reefassociated -0.1; H: bathymetric 0.58; FM: planktivorous_or_other -1.76; FM: macrofauna 0.84; BS: elongated -1.07; BS: other -3.09; BS: short_and_or_deep -0.4; BS: eellike 0.37; log(length_maturity) 0.28; log(age_maturity) -0.5, -1.28, -0.3; log(age_max) -0.85; trophic_level 0.1; log(fecundity) 0.71; log(offspring_size) 0.06; log(max_body_width) -0.13; log(max_body_depth) 0.22; log(lower_jaw_length) -0.11; log(min_caudal_pedoncule_depth) 0.17; temperature -0.02, 0.04, 0.02, 0.01.]

FIGURE 3  Evaluating predictive performance for all variables based on a four-fold cross-validation experiment. For continuous-valued traits, plots show the held-out value (x-axis) against the predicted value (y-axis), along with the one-to-one line (black line) and list the percent-variance-explained (PVE). A well-performing model will have predictions near the one-to-one line and a PVE approaching 100%. For discrete-valued traits, we used the held-out factor-level indicator (0 or 1) and the predicted class probability to calculate the receiver-operator characteristics curve (ROC). A well-performing model will have ROC in the upper-left corner and an AUC approaching 1.0.

FIGURE 4  Frequency distribution (y-axis) for estimated values (x-axis) for each life-history trait (panels) with the trait-value for each of three life-history strategies identified using the ‘archetype’ analysis (vertical lines; purple: Equilibrium; green: Periodic; yellow: Opportunistic).
[Histogram panels not reproduced: one panel for each of the 17 continuous traits in Table 1 and one for each level of the four categorical traits (ST: spawning type; H: habitat; FM: feeding mode; BS: body shape).]
FIGURE 5  Illustration of traits associated with each estimated life-history strategy (equilibrium, periodic and opportunistic), specifically showing the proportion of species with a trait-value lower than that of a given archetype (y-axis) for each trait (x-axis) and archetype (panel).
[Figure not reproduced: panels Equil., Per. and Oppor.; the x-axis lists the 33 trait variables; the y-axis is ‘Proportion of fishes with lower trait-value than this archetype’.]

We also argue that SEM will be increasingly attractive for phylogenetic trait imputation as the number of traits increases. This utility arises because phylogenetic trait imputation with $n_j$ traits typically requires on the order of $n_j^2$ parameters for the covariance among traits (Bruggeman et al., 2009), or $n_j n_f$ parameters when specifying $n_f$ factors that represent major axes of covariance among traits (Hassler et al., 2022; Thorson et al., 2017). The number of parameters in these approaches grows rapidly with the number of traits, which becomes prohibitive when there are many traits to consider, such as in the TRY database version 5 containing 2100 traits. By contrast, SEM allows customized specification of the number of parameters, ranging from $1$ (i.e., identical evolution rate for each trait) to $\frac{n_j (n_j + 1)}{2}$. Furthermore, path parameters in $\boldsymbol{\Gamma}$ are interpretable as regression slopes, such that individual parameters can be compared with pre-existing theory about trait linkages, whether from field observations or laboratory experiments. In our study for example, we estimate a nearly isometric (2.96) scaling of asymptotic body length and body mass and a linear scaling of asymptotic and maximum length (0.99), and these parameters are easily corroborated when evaluating model plausibility. Indeed, future SEMs could consider fixing these and other parameters a priori to improve parsimony and the resulting precision for difficult-to-estimate trait linkages. Alternatively, we estimate the total (direct and indirect) impact of log(length) on log(natural mortality) of −0.82, and this differs somewhat from the inverse relationship claimed by Lorenzen et al. (2022), such that in some cases it is helpful to test for differences relative to existing theory. Finally, SEM starts by specifying a graph (where nodes represent variables, and edges represent dependencies), which can be readily derived from existing conceptual or theoretical models for a given taxonomic group. For example, length-structured models for fish evolution have already derived boldness as a function of exogenous changes in mortality rate (Andersen et al., 2018) or