Bayesian predictive modeling of multi-source
multi-way data
Jonathan Kim1, Brian J. Sandri2,3, Raghavendra B. Rao2,3, Eric F. Lock1
1Division of Biostatistics, School of Public Health
2Division of Neonatology, Department of Pediatrics
3Masonic Institute for the Developing Brain
University of Minnesota
August 9, 2022
Abstract
We develop a Bayesian approach to predict a continuous or binary outcome from data that are collected from multiple sources with a multi-way (i.e., multidimensional tensor) structure. As a motivating example we consider molecular data from multiple 'omics sources, each measured over multiple developmental time points, as predictors of early-life iron deficiency (ID) in a rhesus monkey model. We use a linear model with a low-rank structure on the coefficients to capture multi-way dependence and model the variance of the coefficients separately across each source to infer their relative contributions. Conjugate priors facilitate an efficient Gibbs sampling algorithm for posterior inference, assuming a continuous outcome with normal errors or a binary outcome with a probit link. Simulations demonstrate that our model performs as expected in terms of misclassification rates and correlation of estimated coefficients with true coefficients, with large gains in performance by incorporating multi-way structure and modest gains when accounting for differing signal sizes across the different sources. Moreover, it provides robust classification of ID monkeys for our motivating application. Software in the form of R code is available at https://github.com/BiostatsKim/BayesMSMW.
1 Introduction
Technological advancements in biomedical research are producing datasets that are very
large and have complex structures. Some data are represented as a multi-way array, also
called a tensor, which extends the two-way data matrix to higher dimensions. Some data
are multi-source, which involves features from different sources of data matched by samples
(this is also known as multi-view data). A growing number of datasets are simultaneously
multi-source and multi-way (MSMW). As a motivating example of MSMW data, we con-
sider predictors of early-life iron deficiency (ID) in infant monkeys using data described in
Sandri and others (2022). In this naturalistic ID model in infant rhesus monkeys, 20-30%
of infants develop ID and anemia between 4 and 6 months of age due to a combination of
lower iron stores at birth and rapid postnatal growth rate (Lubach and Coe, 2006; Coe and
others, 2013). Prior studies in this model have shown that the ID infants have metabolomic
and proteomic abnormalities in the serum and cerebrospinal fluid in the preanemic and ane-
mic periods with residual changes persisting even after the resolution of anemia with iron
treatment (Geguchadze and others, 2008; Coe and others , 2009; Patton and others, 2012;
Rao and others, 2013, 2018; Sandri and others, 2020, 2021, 2022). Data were available from
two sources, serum proteomics and serum metabolomics, collected at two time points, 4 and
6 months after birth. The data therefore form two 3-way arrays: [monkeys × proteomics × time] and [monkeys × metabolomics × time]. These motivating data are therefore MSMW, and we are interested in identifying signals in the biomarkers that can predict ID status.
To understand the significance of incorporating MSMW structure into analysis, consider
a naive approach in which each source’s multi-dimensional data array is transformed into
a vector and features from all sources are concatenated into a single vector. While this
approach would produce data that could be analyzed using one of the many methods
available for vector-valued data, it would also have a number of shortcomings. Ignoring the
multi-way structure would not allow for consideration of dependence across dimensions.
Ignoring the multi-source structure would mean that any signal present in features from
smaller sources could be overrun by noise from larger sources with comparatively less signal.
A common aspect of MSMW data is the presence of far more features than samples,
often referred to as high-dimension low-sample size (HDLSS) data. While MSMW data
need not necessarily be HDLSS, it is sufficiently common that methods for handling MSMW
data will ideally allow for HDLSS structure. A Bayesian framework provides more flexibility
for model-based supervised analysis of high-dimensional data, as appropriate regularization
can be induced through the specified prior distribution.
In what follows we briefly review existing methods for predictive modeling of data that
are multi-source (Section 1.1) and multi-way (Section 1.2); our methodological contribu-
tions are summarized in Section 1.3.
1.1 Multi-source data
The issue of integrating data from multiple sources has been addressed in a variety of ways
for different tasks. For predicting an outcome from multi-source data, several approaches
extend unsupervised methods that were originally designed to integrate multi-source data
without prediction. Examples include various supervised extensions of canonical correlation
analysis (CCA). Rodosthenous and others (2020) extend CCA to an arbitrary number of
sources and an outcome by means of a generalized sparsity parameter. Joint association
and classification (JACA) (Zhang and Gaynanova, 2021) is a combination of CCA and
linear discriminant analysis for a binary outcome. Data integration analysis for biomarker
discovery using latent components (DIABLO) is an extension of both sparse projection
to latent structure discriminant analysis to multi-omics analyses and sparse generalized
CCA to a supervised analysis framework by Singh and others (2019), and a combination
of multivariate ANOVA with Bayesian CCA was developed by Huopaniemi and others
(2010). Supervised JIVE (sJIVE) (Palzer and others, 2022) was developed as a supervised
extension for prediction using the JIVE method, which decomposes data into latent factors
that are shared or specific to each source.
Other methods can be used directly in a supervised context (i.e., classification and re-
gression), without incorporating aspects of unsupervised analysis. One approach developed
by Van De Wiel and others (2016) handles “co-data”, which they defined as “all information
on the measured variables other than their numerical values for the given study”. In partic-
ular, their method involved partitioning variables into groups and imposing group-specific
penalties for ridge regression. This approach has some parallels to the multi-source problem in that it is able to perform prediction on a binary or continuous outcome using data
from multiple groups; though such “groups” of data included p-values from previous studies
or genomic annotations, in the multi-source context they may be defined by which source
the variable belongs to (e.g., proteins or metabolites). A Bayesian approach to multi-source
data that makes use of the prior distribution to accommodate different sources has been
used recently by White and others (2021). However, their Bayesian Multi-Source Regres-
sion (BMSR) assumed double-matched multi-source data, i.e., the same features present
across all sources, and involved predicting different outcomes for each source instead of a
single outcome affected by multiple sources.
One limitation of all of these multi-source methods is that they do not have the ability to accommodate data that exist in multiple ways, which limits the structural information available to them and may obscure important findings.
1.2 Multi-way data
Methods developed for analyzing data with multi-way structure can be divided into unsu-
pervised and supervised categories, with the latter further divided between classification
and regression methods. Unsupervised approaches to handling multi-way data predomi-
nantly involve dimension reduction techniques, which reduce the number of features in the
data to a more manageable size while preserving the overall integrity of the data. Many
such methods of tensor decomposition are outlined in Kolda and Bader (2009). Gloaguen
and others (2022) developed a multi-way extension of regularized generalized canonical
correlation analysis that can accommodate data with a tensor structure from an arbitrary
number of sources by incorporating Kronecker constraints into the optimization problem.
For supervised methods involving classification, there is growing literature on extend-
ing classifiers of vectors to multi-way arrays using factorization and dimension reduction
techniques. Tao and others (2005) proposed a supervised tensor learning framework that
generalized classifiers by performing a rank-1 decomposition on the coefficients to reduce
their dimension to a single set of weights for each dimension. Lyu and others (2017) pro-
posed a multi-way version of the classification method distance weighted discrimination
(DWD) under the assumption that the coefficient array is low-rank. Their implementation
of multi-way DWD was shown to dramatically improve performance over two-way classi-
fiers when the data have multi-way structure. However, their method is restricted to use
for three-way data. Guo and others (2022) proposed an extension of multi-way DWD that
also imposed a low-rank structure on the coefficient array, but allowed for data with an
arbitrary number of ways and accounted for sparsity.
Supervised methods of regression also build on dimension reduction techniques by ex-
tending them to the regression context. Both Zhou and others (2013) and Li and others
(2018) propose maximum likelihood estimation algorithms that perform regression
with array-valued covariates through dimension reduction, with the former using CAN-
DECOMP/PARAFAC (CP) decomposition and the latter using Tucker decomposition. A
Bayesian formulation of tensor regression was developed by Miranda and others (2018),
which involves a multi-step process of partitioning tensor data into smaller sub-tensors,
reducing these sub-tensors via CP decomposition, and performing regression with sparsity-
inducing priors to identify informative sub-tensors. Another Bayesian approach to tensor
regression with a scalar response was developed by Guhaniyogi and others (2017) by means
of a novel multi-way shrinkage prior, which allows for simultaneous shrinkage of parameters
across all ways of data.
In the same way that existing multi-source methods have yet to be extended to ac-
commodate multi-way data, existing supervised multi-way methods are generally unable
to incorporate data from multiple sources.
1.3 Contributions: method for multi-source multi-way data
In this paper, we develop a Bayesian linear model that can perform regression or classi-
fication on MSMW data for either a continuous outcome with normal errors or a binary
outcome with a probit link, respectively. The central assumption for our multi-way ap-
proach is that the signal discriminating the ways can be efficiently represented by mean-
ingful patterns in each dimension, which we identify by imposing a low-rank structure on
the coefficient array. The central assumption for our multi-source approach is that the
signal discriminating the sources can be efficiently represented by modeling the variances
of the coefficients separately across each source to infer their relative contributions. We
incorporate both of these approaches into a single model under a Bayesian framework. We
also apply our method to a real-world MSMW dataset by predicting iron deficiency status
in infant monkeys based on multi-omic tissue samples.
2 Methods
2.1 Notation and framework
Throughout this article, bold lowercase characters (a) denote vectors, bold uppercase characters (A) denote matrices, and blackboard bold uppercase characters (A) denote multi-way arrays of the specified dimension (e.g., A : P_1 × P_2 × ··· × P_K). Square brackets index entries within an array, e.g., A[p_1, p_2, ..., p_K]. Superscripts in square brackets denote individual sources within a multi-source set of data, e.g., A^[1], A^[2], ..., A^[M]. Define the generalized inner product of two arrays A and B of the same dimension as

A · B = Σ_{p_1=1}^{P_1} ··· Σ_{p_K=1}^{P_K} A[p_1, ..., p_K] B[p_1, ..., p_K].

Define ||·||_F as the Frobenius norm and vec(A) as the vectorization of the entries in A.

For our context, X : N × [P^[1], ..., P^[M]] × D gives data in the form of a 3-way array for N subjects, where P^[m] is the size of the mth source for m = 1, ..., M with P = Σ_{m=1}^{M} P^[m], and D is an additional way for which we have data available for all subjects and sources. Each subject has a response variable y_i, which may be binary or continuous; let y = [y_1, ..., y_N]. Our goal is to predict the outcome y based on the multi-way covariates X.
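To make this layout concrete, a minimal R sketch is given below; it represents an MSMW dataset as a list of 3-way arrays (one per source) and computes the generalized inner product. The dimensions and object names are illustrative assumptions, not the internal format of the BayesMSMW package.

```r
# Illustrative MSMW layout: one N x P^[m] x D array per source, matched on
# subjects (first way) and on the shared additional way D.
set.seed(1)
N <- 20; P <- c(100, 100); D <- 2
X <- lapply(P, function(p) array(rnorm(N * p * D), dim = c(N, p, D)))

# Generalized inner product: sum of elementwise products over all indices.
inner_prod <- function(A, B) sum(A * B)

# Example: inner product of subject 1's source-1 slice with itself.
inner_prod(X[[1]][1, , ], X[[1]][1, , ])
```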
2.2 Model
2.2.1 Bayesian linear model
We first briefly consider the special case in which we have only one way of data from only one source, M = 1 and D = 1, and y is continuous. This is the classical setting with P covariates available for N subjects, X : N × P. The basic linear model is y = Xb + e, where b = [b_1, ..., b_P] is the vector of covariate coefficients and e = [ε_1, ..., ε_N] is the vector of error terms, which are assumed to be independent and identically distributed with ε_i ∼ Normal(0, σ²).

Under a Bayesian framework, we place a prior distribution on b and we assume all b_j, j = 1, ..., P, are independent and identically distributed under this prior. If we let that prior be a normal distribution with mean zero and variance τ, i.e., b_j ∼ Normal(0, τ), we can also place a hyperprior on the variance τ. This has the advantage of empirically controlling the level of shrinkage of the coefficients toward 0, via the posterior for τ. In subsequent sections we extend this model to the multi-source and multi-way scenarios.
2.2.2 Multi-source model
Now, suppose our covariates come from M > 1 different sources, which can be conceived as M datasets X_1, ..., X_M, with each X_m being an N × P^[m] matrix such that Σ_{m=1}^{M} P^[m] = P. The vector of coefficients, b, can be represented as a concatenation of M vectors each of length P^[m], i.e., b = [b^[1], ..., b^[M]] with each b^[m] = [b^[m]_1, ..., b^[m]_{P^[m]}] for m = 1, ..., M.

To infer the relative contribution of each source, we propose modeling the variances of each source's coefficients separately, such that each source has its own independent prior placed on its coefficients. That is, let b^[m] ∼ D_m, where D_m is an arbitrary distribution. If we let the priors all be mean-zero normal distributions as in the previous section, this gives

b^[m] ∼ MVN(0, τ_m I).    (1)

This allows us to distinguish the level of the contribution for coefficients from different sources by allowing for different source variances τ_m.
2.2.3 Multi-way model
Rank 1 model
We now consider the case in which the data are multi-way (D > 1) but single-source (M = 1), and thus X is N × P × D, a 3-way tensor. We propose the following bilinear model, which is analogous to the one proposed by Lyu and others (2017) in the context of modeling the coefficients for multi-way DWD.

We assume the coefficient matrix B : P × D has the rank-1 decomposition

B = wv^T    (2)

where w = [w_1, ..., w_P]^T and v = [v_1, ..., v_D]^T. Thus our model for each i = 1, ..., N is

y_i = X_i · B = vX_i^T w    (3)

where v is 1 × D, X_i is P × D, and w is P × 1.

To interpret this model in the context of our motivating example, we may consider w to represent a pattern in the metabolites that is predictive of y, while v gives the relative contribution at each time point.
Rank R model
In the previous model (2), we assumed the coefficient matrix B had a rank-1 decomposition; that is, the outcome is determined by combining a single pattern in each dimension of the coefficient matrix. However, it is possible that multiple patterns contribute to the outcome. For example, it may be that some metabolites are predictive of the outcome at an early time point but others are only predictive of the outcome at a later time point. Consider a new structure in which we assume the coefficient matrix B has the rank-R decomposition

B = WV^T    (4)

where W : P × R with columns w_r = [w_{r1}, ..., w_{rP}]^T and V : D × R with columns v_r = [v_{r1}, ..., v_{rD}]^T, for r = 1, ..., R, R < min(P, D). Observe that the coefficient matrix B in the rank-1 multi-way model (2) is a special case of the rank-R multi-way model (4) when R = 1.

This use of low-rank structure on the coefficients allows us to capture multi-way dependence and identify relevant patterns in each dimension of the coefficient matrix.
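As a hedged illustration of this low-rank structure (not code from the paper), the R sketch below builds B = WV^T for small, arbitrary P, D, and R, and checks that the linear predictor X_i · B equals the sum of the per-pattern terms w_r^T X_i v_r.

```r
# Rank-R coefficient matrix from factor matrices W (feature patterns) and
# V (patterns in the additional way, e.g., time); dimensions are illustrative.
set.seed(2)
P <- 6; D <- 5; R <- 2
W <- matrix(rnorm(P * R), P, R)
V <- matrix(rnorm(D * R), D, R)
B <- W %*% t(V)                      # P x D coefficient matrix of rank R

# The predictor X_i . B can be computed from B or from the factors directly.
Xi <- matrix(rnorm(P * D), P, D)
sum(Xi * B)                                                  # X_i . B
sum(sapply(1:R, function(r) t(W[, r]) %*% Xi %*% V[, r]))    # sum_r w_r' X_i v_r
```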
2.2.4 Multi-source and multi-way model
We now combine aspects of the multi-source and multi-way models into a single Bayesian linear model to address the general MSMW framework introduced in Section 2.1. We again assume the coefficient matrix B has a rank-R decomposition as in (4), where W = [W^[1], ..., W^[M]] and each W^[m] has columns w^[m]_r = [w^[m]_{r1}, ..., w^[m]_{rP^[m]}]^T for m = 1, ..., M, and V has columns v_r = [v_{r1}, ..., v_{rD}]^T for r = 1, ..., R, R < min(P, D).
2.2.5 Binary outcome
We now consider our model in the case where y is binary, i.e., y_i ∈ {0, 1} for i = 1, ..., N. We can accommodate such data by modifying our approach to use a latent variable probit model, similar to that described in Albert and Chib (1993). Suppose there exists an auxiliary random variable z_i such that z_i = X_i · B + ε_i, where ε_i ∼ Normal(0, 1). We can model our outcome variable y_i as an indicator for whether or not this latent variable is positive, that is, y_i = 1 if z_i > 0 and y_i = 0 otherwise. This is equivalent to using a probit link function:

Pr(y_i = 1 | X_i) = Pr(z_i > 0 | X_i) = Pr(X_i · B + ε_i > 0) = Pr(ε_i < X_i · B) = Φ(X_i · B),

where Φ(x) = (1/√(2π)) ∫_{−∞}^{x} e^{−t²/2} dt.
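This construction can be checked numerically; the short R sketch below compares the probit probability Φ(X_i · B) to a Monte Carlo estimate based on the latent variable, using an arbitrary illustrative value for X_i · B.

```r
# Probit link via the latent variable z_i = X_i . B + e_i with e_i ~ N(0, 1).
set.seed(3)
eta <- 0.8                    # illustrative value of X_i . B
pnorm(eta)                    # Pr(y_i = 1 | X_i) under the probit link
mean(eta + rnorm(1e5) > 0)    # Monte Carlo check via the latent z_i
```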
2.3 Model estimation
2.3.1 Priors
As referenced in Section 2.2.1, we can model the coefficients in a single-source, non-multi-way model as coming from a normal distribution with mean zero and variance τ. We extend this approach to our MSMW model by using mean-zero normal priors to estimate the components of our coefficient matrix B.

In order to accommodate our multi-way model, we do not estimate B directly, but instead estimate the components of our coefficient matrix: either w and v, as outlined in (2) for the rank-1 model, or W and V, as outlined in (4) for the rank-R model. We then place mean-zero normal priors on W and V; that is, the entries of each W^[m] are independent with a Normal(0, τ_m) distribution, and the entries of V are independent with a Normal(0, 1) distribution. We fix the variance of V because B is the product of V and W, and thus their respective scales are not identifiable and only the variance of W needs to be modeled. This further allows us to model the variance of the contribution for each source separately by considering each τ_m | W^[m] for m = 1, ..., M. In order to facilitate an efficient sampling algorithm for the posterior distribution, we place conjugate inverse-gamma priors on the variance parameters, τ_m ∼ IG(α_0, β_0).

We assume that the error terms e are independent and normally distributed, e ∼ MVN(0, Iσ²). For a continuous y, σ² may either be fixed or given a prior. If σ² is unknown, by default we use an inverse-gamma prior distribution with arbitrarily small hyperparameters as a non-informative prior, e.g., σ² ∼ IG(0.001, 0.001). If y is binary, then the error variance for the latent continuous variables z in Section 2.2.5 is fixed at σ² = 1.
2.3.2 Full conditional distributions
Given the conjugate hyperpriors we’ve placed on τ, and fixing the variance of our error
terms at 1, we will have the following conditional distribution for each τm:
τm|WIG α0+P[m]
2, β0+1
2||W[m]||2
F.(5)
For our coefficient factor parameters, Wand V, standard linear model results with
conjugate normal priors (Lindley and Smith, 1972) produce:
vec(W)|y,τ,V, σ2MV N ((T1σ2+XT
vXv)1(XT
vy), σ2(T1σ2+XT
vXv)1) (6)
7
where T:RP ×RP is the diagonal prior covariance matrix with diagonal entries
[τ1. . . τ1
| {z }
P[1]
τ2. . . τ2
| {z }
P[2]
. . . τM. . . τM
| {z }
P[M]
]
repeated Rtimes, and Xv:N×RP is the matrix with row igiven by vec(XiV). Similarly,
vec(V)|y,W, σ2MV N ((Iσ2+XT
wXw)1(XT
wy), σ2(Iσ2+XT
vXv)1) (7)
where Xw:N×RD is the matrix with row igiven by vec(WXi).
Use of the non-informative conjugate prior (σ2IG(0.001,0.001)) yields the following
full conditional distribution for σ2:
σ2|y,BIG N
2+ 0.001,(yXB)T(yXB)
2+ 0.001.(8)
2.3.3 Data augmentation of binary case
Under the latent variable formulation for binary data described in Section 2.2.5, the full conditional distributions for W and V are analogous to those in (6) and (7), respectively, but with z replacing y and σ² = 1.

As a consequence of our latent variable modeling, the conditional distribution of z will be a truncated normal distribution, denoted N_trunc, as follows:

z_i | y, B ∼ N_trunc(X_i · B, 1),    (9)

where the distribution is truncated on the right by 0 if y_i = 0 and truncated on the left by 0 if y_i = 1.
2.3.4 Gibbs sampling algorithm for continuous case
We approximate our posterior using a Gibbs sampling algorithm. Here we provide the algorithm used for the continuous version of the MSMW model to draw samples from the joint posterior distribution p(W, V, τ, σ² | X, y); a code sketch of these updates is given after the algorithm. The algorithm is given below for iterations t = 1, ..., T:

1. Initialize W^(1), σ^(1), τ_1^(1), ..., τ_M^(1).

2. For t = 2, ..., T, make the following draws:

- Draw V^(t) | y, σ^(t−1), W^(t−1) as in (7).
- Draw W^(t) | y, σ^(t−1), τ_1^(t−1), ..., τ_M^(t−1), V^(t) as in (6).
- Draw τ_1^(t), ..., τ_M^(t) | W^[1](t), ..., W^[M](t) as in (5).
- Calculate B^(t) = W^(t) (V^(t))^T.
- Draw σ^(t) | B^(t), y as in (8) (if σ² is not fixed).
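The sketch below gives a minimal R implementation of these updates for the rank-1 case (R = 1). It assumes X is an N × P × D array with the sources concatenated along the feature way, src is a length-P vector of source labels, and y is a continuous outcome; it is a simplified illustration of (5)-(8), not the BayesMSMW implementation.

```r
gibbs_msmw_rank1 <- function(X, y, src, alpha0 = 1, beta0 = 1,
                             n_iter = 2000, fix_sigma2 = NULL) {
  N <- dim(X)[1]; P <- dim(X)[2]; D <- dim(X)[3]; M <- length(unique(src))
  rinvgamma <- function(shape, rate) 1 / rgamma(1, shape = shape, rate = rate)
  rmvn <- function(mu, Sigma) drop(mu + t(chol(Sigma)) %*% rnorm(length(mu)))
  # initialize
  w <- rnorm(P, sd = 0.1); v <- rnorm(D, sd = 0.1)
  tau <- rep(1, M); sigma2 <- if (is.null(fix_sigma2)) 1 else fix_sigma2
  draws <- list(B = array(NA, c(n_iter, P, D)), tau = matrix(NA, n_iter, M),
                sigma2 = rep(NA, n_iter))
  for (it in seq_len(n_iter)) {
    # draw v | y, w, sigma2 as in (7): design X_w has row i = X_i' w
    Xw <- t(apply(X, 1, function(Xi) as.vector(crossprod(Xi, w))))   # N x D
    Qv <- diag(D) * sigma2 + crossprod(Xw)
    v  <- rmvn(solve(Qv, crossprod(Xw, y)), sigma2 * solve(Qv))
    # draw w | y, v, tau, sigma2 as in (6): design X_v has row i = X_i v
    Xv <- t(apply(X, 1, function(Xi) as.vector(Xi %*% v)))           # N x P
    Qw <- diag(sigma2 / tau[src]) + crossprod(Xv)                    # sigma2 T^{-1} + X_v'X_v
    w  <- rmvn(solve(Qw, crossprod(Xv, y)), sigma2 * solve(Qw))
    # draw each tau_m | w as in (5); for R = 1, ||W^[m]||_F^2 is a sum of squares
    for (m in seq_len(M)) {
      wm <- w[src == m]
      tau[m] <- rinvgamma(alpha0 + length(wm) / 2, beta0 + sum(wm^2) / 2)
    }
    # draw sigma2 | y, B as in (8), unless it is fixed
    resid <- y - Xv %*% w
    if (is.null(fix_sigma2))
      sigma2 <- rinvgamma(N / 2 + 0.001, sum(resid^2) / 2 + 0.001)
    draws$B[it, , ] <- tcrossprod(w, v)   # B^(t) = w^(t) v^(t)'
    draws$tau[it, ] <- tau; draws$sigma2[it] <- sigma2
  }
  draws
}
```

A point estimate of B can then be formed by averaging the stored B draws after discarding an initial burn-in, as described in Section 2.3.6.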
2.3.5 Gibbs sampling algorithm for binary case
In order to accommodate binary data in our Gibbs sampler, we must introduce a data augmentation step in which we draw the latent continuous variables z_i (a sketch of this draw is given after the algorithm):

1. Initialize V^(1), W^(1), z^(1), τ_1^(1), ..., τ_M^(1).

2. For t = 2, ..., T, make the following draws:

- Draw V^(t) | z^(t−1), W^(t−1) as in (7), with z replacing y and σ² = 1.
- Draw W^(t) | z^(t−1), τ_1^(t−1), ..., τ_M^(t−1), V^(t) as in (6), with z replacing y and σ² = 1.
- Draw τ_1^(t), ..., τ_M^(t) | W^[1](t), ..., W^[M](t) as in (5).
- Calculate B^(t) = W^(t) (V^(t))^T.
- Draw z^(t) | y, B^(t) as in (9).
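Relative to the continuous-case sketch above, only the data-augmentation draw (9) is new. A hedged R sketch of that step is below, using inverse-CDF sampling from the truncated normal; eta denotes the vector of current values X_i · B^(t), and y the observed binary outcomes.

```r
# Draw z_i ~ N(eta_i, 1) truncated to (0, Inf) when y_i = 1 and to (-Inf, 0)
# when y_i = 0, as in (9), via the inverse-CDF method.
draw_z <- function(eta, y) {
  u  <- runif(length(y))
  lo <- ifelse(y == 1, pnorm(0, mean = eta), 0)   # lower CDF bound of truncation region
  hi <- ifelse(y == 1, 1, pnorm(0, mean = eta))   # upper CDF bound of truncation region
  qnorm(lo + u * (hi - lo), mean = eta)
}
```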
2.3.6 Model Prediction
After running our Gibbs sampler to simulate draws from our posterior, we take the average over sampling iterations B^(1), ..., B^(T) to obtain estimated coefficients B̂. Given new data X* for N* samples, we can then obtain a point estimate of the outcomes y* via ŷ*_l = X*_l · B̂. For a binary outcome, Φ(X*_l · B̂) gives an estimate of the predicted probability of having an outcome value of 1, and to translate this probability into a class prediction, we can simply round the value to the nearest integer:

ŷ*_l = 1 if Φ(X*_l · B̂) ≥ 0.5, and ŷ*_l = 0 if Φ(X*_l · B̂) < 0.5,    (10)

for l = 1, ..., N*.

Alternatively, the Bayesian approach allows one to model the full posterior predictive distribution with uncertainty. For the continuous case, draws y*_l^(t) from the posterior predictive distribution can be obtained from the Gibbs draws via y*_l^(t) ∼ Normal(X*_l · B^(t), σ^{2(t)}) for l = 1, ..., N*. In the binary case, draws from the posterior predictive can be generated via y*_l^(t) ∼ Bernoulli(Φ(X*_l · B^(t))) for l = 1, ..., N*.
3 Results
3.1 Simulations
3.1.1 Data generation
We generated data under multiple scenarios to illustrate the relative benefits of incorporating multi-source or multi-way structure under different conditions. For all scenarios, we simulated data sets X^[1] : N × P_1 × D and X^[2] : N × P_2 × D, representing data from two sources with N observations, P_1 and P_2 covariates from each source with P_1 = P_2 = P/2, and D time points. We consider a low-dimensional scenario with N = 100, P = 6, and D = 5, and a high-dimensional scenario with N = 20, P = 200, and D = 2 (closely matching the application in Section 3.3). We generated the true coefficient array B under one setting for which the sources contribute equally (τ_1 = τ_2 = 1) and one setting for which only one source contributes (τ_1 = 0, τ_2 = 1). We also consider settings under which B has a rank 1 or rank 2 decomposition, or where the coefficient matrix has no multi-way structure (i.e., B has independent entries and is of full rank). In the non-multi-way case, the entries of the coefficients for each source B^[m] : P^[m] × D are generated independently from a Normal(0, τ_m) distribution. In the multi-way case for rank R = 1 or R = 2, we generate W^[m] : P^[m] × R and V : D × R by simulating the entries of each W^[m] independently from Normal(0, τ_m) for m = 1, 2 and the entries of V independently from Normal(0, 1). A rank 2 model was not considered for the high-dimensional case, because the full rank scenario is already of rank 2 (D = 2).
Continuous outcome
For the continuous case, the entries of X^[1] and X^[2] were each generated independently from a Normal(0, 1) distribution. Then, after generating B, the response variable y was generated via y_i ∼ Normal(X_i · B, 1).
Binary outcome
For our first binary data generating procedure, similarly to the continuous case, the entries of X^[1] and X^[2] were each generated independently from a Normal(0, 1) distribution. Then, after generating B, the response variable y was generated using the probit link function: y_i ∼ Bernoulli(Φ(X_i · B)).
Separate normal distributions
We considered a third case for which the outcome is binary and the distribution of X_i depends on the outcome. Here, the outcome was generated deterministically, with half of the N observations having value 0 and half having value 1: y_i = 0 for i = 1, ..., N/2 and y_i = 1 for i = N/2 + 1, ..., N. The coefficients B are generated under the same conditions above, and then X is generated via X_i = −B + E_i if y_i = 0 and X_i = B + E_i if y_i = 1, with the entries of E_i generated independently from a Normal(0, 1) distribution for i = 1, ..., N. Note that this scenario does not explicitly match the assumptions of our probit model; however, it approximates a realistic scenario for which the data have different means depending on their class, which is detectable in the high-dimensional case. The estimated coefficients for the optimal linear classifier will be proportional to B.
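The R sketch below illustrates one of these generation schemes (the low-dimensional, rank-1, probit setting with equal source contributions); it is a hedged reconstruction of the description above, not the code used for the reported simulations.

```r
# Simulated MSMW data: two sources stacked along the feature way, rank-1 B.
set.seed(4)
N <- 100; P1 <- 3; P2 <- 3; D <- 5; P <- P1 + P2
src <- rep(1:2, c(P1, P2))
tau <- c(1, 1)                                     # use c(0, 1) for the one-contributing-source setting
w <- rnorm(P, sd = sqrt(tau[src])); v <- rnorm(D)  # rank-1 factors
B <- tcrossprod(w, v)                              # true P x D coefficient matrix
X <- array(rnorm(N * P * D), dim = c(N, P, D))     # covariate entries ~ N(0, 1)
eta    <- apply(X, 1, function(Xi) sum(Xi * B))    # X_i . B
y_cont <- rnorm(N, mean = eta, sd = 1)             # continuous outcome
y_bin  <- rbinom(N, 1, pnorm(eta))                 # binary outcome via the probit link
```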
3.1.2 Measures of performance considered
We assess predictive performance by applying our model to test data that were generated from the same distributions as the training data with a larger sample size (N = 500). For our simulations with a binary outcome, we used the prediction method outlined in (10) and compared the predicted classification to the true classification to obtain a misclassification rate. For our simulations with a continuous outcome, we compare the predicted outcome with the true outcome by calculating the relative mean squared error ||y − ŷ||_F² / ||y||_F². For all simulations, we also assess recovery of the underlying parameters by considering the posterior coverage rates of the true parameters and the correlation between the estimated and the true coefficients.
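For concreteness, these measures can be computed as in the short R sketch below (helper names are hypothetical; vectors of true and predicted outcomes and coefficient arrays are assumed).

```r
# Performance measures used in the simulations.
misclass_rate <- function(y, y_hat) mean(y != y_hat)                  # binary outcomes
rel_sq_error  <- function(y, y_hat) sum((y - y_hat)^2) / sum(y^2)     # continuous outcomes
coef_cor      <- function(B_true, B_hat) cor(c(B_true), c(B_hat))     # coefficient recovery
```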
3.1.3 Models used for estimation
For each simulation condition, we ran a total of six different models that each made different
assumptions about the underlying structure of B.
For non-multi-way models, the data were assumed not to follow a multi-way structure, and the data arrays were reorganized into a matrix of dimension N × PD where the ith row gives vec(X_i). We also ran two models that did assume a multi-way structure: the first imposed the assumption of a rank 1 coefficient matrix structure (as in Equation 2) and the second a rank 2 coefficient matrix structure (as in Equation 4 with R = 2).

For the multi-source models, the covariates were assumed to come from two sources, with half of the covariates from one source and half from the other. This means that two independent priors on the covariate coefficients were fit as in (1) for m = 1, 2. For the single-source models, the data were assumed to come from one source (i.e., the distinction between X^[1] and X^[2] was ignored) and only one prior was placed on the covariate coefficients.

Taking all combinations of these models produces the following six that were fit in our simulations:

1. Rank 2, Multi-source model (Rank2,MS), with τ_m ∼ IG(1, P^[m]R) for m = 1, 2
2. Rank 2, Single-source model (Rank2,SS), with τ ∼ IG(1, PR)
3. Rank 1, Multi-source model (Rank1,MS), with τ_m ∼ IG(1, P^[m]R) for m = 1, 2
4. Rank 1, Single-source model (Rank1,SS), with τ ∼ IG(1, P)
5. Non-multi-way, Multi-source model (FullRank,MS), with τ_m ∼ IG(1, P^[m]D) for m = 1, 2
6. Non-multi-way, Single-source model (FullRank,SS), with τ ∼ IG(1, PD)
3.2 Simulation results
The following tables show the results of the metrics outlined in Section 3.1.2 for all of our simulations, averaged over 100 replications for each condition. For these tables, "MS" is an abbreviation for "multi-source" and is used to indicate either a multi-source model or a simulation condition in which the signal was equal across both sources (as opposed to being entirely confined to a single source). "Rank" in these tables refers to the rank of the true coefficient matrix, with "FullRank" referring to the simulation condition in which the true coefficient matrix is generated without any multi-way structure. For all tables, bolded values indicate the best performing model based on a pairwise t-test. If multiple values are bolded, then model performances were not significantly different at the 0.05 level.
Binary outcome, low dimensions (N=100, P1=3, P2=3, d=5)
Probit generated data
Tables 1 and 2 give the misclassification rate and correlation with the true discriminating
signal, respectively, for the low dimensional simulation with probit generated data. In
all cases, the model that best matched the data generation scenario performed the best
for both measures. The benefits of using the correct multiway structure (Rank 2, 1 or
full rank) tended to be more dramatic than that for matching the multi-source structure;
the relative differences are particularly large for the correlations with the true coefficients
shown in Table 2. There were a few cases in which a model did not match the true data
generation but performance was not statistically different from the model that did match
the true data generation; all such cases involved a multi-source model performing on-par
with a single-source model when the true data were single-source.
Separate normal data
Tables 3 and 4 give the misclassification rate and correlation with the true discriminating
signal, respectively, for the low dimensional simulation with data generated from separate
normal distributions. In all cases, the model that best matched the data generation scenario
performed the best for both measures. The benefits of using the correct multiway structure
tended to be more dramatic than that for matching the multi-source structure; the relative
differences are particularly large for the correlations shown in Table 4.
Continuous outcome, low dimension (N=100, P1=3, P2=3, d=5)
Tables 5 and 6 give the relative squared error and correlation with the true discriminating
signal, respectively, for the low dimensional simulation for data generated with a continuous
outcome. In general, the model that best matched the data generation scenario performed
the best for both measures, though some cases saw models that matched the true data
structure failing to outperform models that did not. Interestingly, the rank 2 models
closely match the performance of the rank 1 model (even if it is misspecified) but the
full rank model performs much worse under low rank structure. The benefits of using the
correct multiway structure (Rank 2, 1, or full rank) tended to be more dramatic than that for matching the multi-source structure; the relative differences are particularly large for the correlations shown in Table 6.
Binary outcome, high dimension (N=20, P1=100, P2=100, d=2)
Probit generated data
Tables 7 and 8 give the misclassification rate and correlation with the true discriminating
signal, respectively, for the high dimensional simulation with probit generated data. In all
cases, the model that best matched the data generation scenario performed the best for both measures, though its advantage over the other models was not always statistically significant. In particular, the misclassification rates observed for all models were
close to 0.5, indicating performance only marginally better than random guessing. This
demonstrates the challenge in fitting predictive models to HDLSS data when the distribu-
tion of the data does not depend on the outcome. The correlation results were also not
very strong, though they do more clearly indicate better performance from the models that
match the true data generation.
Normal generated data
Tables 9 and 10 give the misclassification rate and correlation with the true discriminating
signal, respectively, for the high-dimensional simulation with data generated from separate
normal distributions. In general the misclassification rates are much better here than they
are in the high-dimensional probit scenario. In all cases, the model that best matched the
data generation scenario performed the best for both measures. The benefits of using the
correct multiway structure (Rank 1 or full rank) tended to be more dramatic than that for
matching the multi-source structure; the relative differences are particularly large for the
correlations shown in Table 10.
Continuous outcome, high dimension (N=20, P1=100, P2=100, d=2)
Tables 11 and 12 give the relative squared error and correlation with the true discriminating signal, respectively, for the high-dimensional simulation with a continuous outcome. In all cases, the model that best matched the data generation scenario performed the best for both measures, though its advantage over the other models was not statistically significant. In particular, the relative squared errors observed for all models were rather close to 1, indicating predictions only marginally better than a null prediction. The correlation results were also not very strong, though they more clearly indicate better performance from the models that match the true data generation.
3.3 Application to multi-omic iron deficiency
We applied our method of multi-source, multi-way Bayesian probit regression to our moti-
vating data on iron deficiency in an infant rhesus monkey model and assessed our ability to
discriminate between ID and iron sufficient (IS) infants based on the serum proteomic and
metabolomic profiles measured at two time points (4 and 6 months after birth). In this
model, infants destined to develop ID show evidence of ID (changes in serum iron indices
and lower reticulocyte hemoglobin content) at 4 months, with iron deficiency anemia (lower
hemoglobin and mean corpuscular volume) seen at 6 months (Lubach and Coe, 2006; Coe
and others, 2013; Rao and others, 2018; Sandri and others, 2020, 2021, 2022). Proteomic
and metabolomic changes in serum are seen in the preanemic and anemic periods (Sandri
and others, 2022). After routine pre-processing, data were available for 227 metabolites and 205 proteins for 6 ID and 6 IS monkeys. We used a relatively non-informative prior to infer the variances of the proteomic and metabolomic coefficients, τ_1 and τ_2: an inverse-gamma distribution with parameters α = 1, β = 0.1.
We assessed the estimated probabilities of class membership under leave-one-out cross-validation (LOOCV) using the rank-1 model, for which the posterior predictive probability for a held-out infant is inferred given the remaining N − 1 infants. The plot of these probabilities demonstrates our model's ability to achieve perfect separation between the ID and IS samples in the estimated class probabilities (Figure 1). We also examined the
loadings for the individual proteins, individual metabolites, and each time point (Figure 2).
The proteomic and metabolomic loadings both show several biomarkers that are positively
and negatively associated with ID; moreover, the loadings have similar scales between the
two data sources, with τ_1 = 0.200 (proteomics) and τ_2 = 0.238 (metabolomics), indicating
that the signal discriminating ID from IS infants is of similar size.
To assess potential benefits of our approach, we compared the t-statistic for the differ-
ence in probit scores between the IS and ID groups under LOOCV to analogous approaches
that do not account for multi-source or multi-way structure. Table 13 shows the resulting
values for a multi-way (i.e., rank 1) or non multi-way (i.e., full rank) model using (1) only
the metabolite data, (2) only the proteomic data, or (3) both data sources. In all cases
the multi-way approach performs better, suggesting that the metabolomic and proteomic profiles discriminating ID from IS infants are similar at 4 months and 6 months, and we can improve power by accounting for this structure. Moreover, the chosen multi-way model with both sources outperforms the others with a t-statistic of 7.235, suggesting that the metabolites and proteins have complementary information and we can improve performance by combining them in a single model. Finally, an analogous approach that did not model the source variances separately achieved a smaller t-statistic of 5.521, suggesting an advantage to accounting for heterogeneity between the sources.
4 Discussion
We have proposed a Bayesian linear model that can predict a binary or a continuous
outcome using data that are both multi-source and multi-way, with any number of sources
or dimensions. Both the simulation and data analysis results have shown that the proposed
MSMW model can improve classification accuracy and reduce MSE when the underlying
data have MSMW structure. However, the performance of any given approach depends on the conditions under which the data were generated, such as the true rank of the underlying signal or whether different sources have different signal variances. Thus, practical data applications of this model may require applying different versions of the method and comparing their performance. In this article we have focused on three-way arrays (N × P × D); however, extensions to higher-order arrays are straightforward, in which case the coefficient array takes the form of a CP decomposition (Zhou and others, 2013; Guo and others, 2022).
5 Software
Software in the form of R code, together with a sample input data set and complete documentation, is available at https://github.com/BiostatsKim/BayesMSMW.
Acknowledgments
This work was supported by the National Institute of General Medical Sciences (NIGMS)
grant R01-GM130622. Funding for the data application in Section 3.3 was also provided
by grants from the National Institutes of Health/Eunice Kennedy Shriver National Institute of Child Health and Human Development [HD089989, HD080201, HD057064, and HD39386].
References
Albert, James H and Chib, Siddhartha. (1993). Bayesian analysis of binary and
polychotomous response data. Journal of the American Statistical Association 88(422),
669–679.
Coe, Christopher L, Lubach, Gabriele R, Bianco, Laura and Beard, John L.
(2009). A history of iron deficiency anemia during infancy alters brain monoamine
activity later in juvenile monkeys. Developmental Psychobiology: The Journal of the
International Society for Developmental Psychobiology 51(3), 301–309.
Coe, Christopher L, Lubach, Gabriele R, Busbridge, Mark and Chapman,
Richard S. (2013). Optimal iron fortification of maternal diet during pregnancy and
nursing for investigating and preventing iron deficiency in young rhesus monkeys. Re-
search in veterinary science 94(3), 549–554.
Geguchadze, Ramaz N, Coe, Christopher L, Lubach, Gabriele R, Clardy,
Thomas W, Beard, John L and Connor, James R. (2008). Csf proteomic anal-
ysis reveals persistent iron deficiency-induced alterations in non-human primate infants.
Journal of neurochemistry 105(1), 127–136.
Gloaguen, Arnaud, Philippe, Cathy, Frouin, Vincent, Gennari, Giulia,
Dehaene-Lambertz, Ghislaine, Le Brusquet, Laurent and Tenenhaus,
Arthur. (2022). Multiway generalized canonical correlation analysis. Biostatis-
tics 23(1), 240–256.
Guhaniyogi, Rajarshi, Qamar, Shaan and Dunson, David B. (2017). Bayesian
tensor regression. The Journal of Machine Learning Research 18(1), 2733–2763.
Guo, Bin, Eberly, Lynn E, Henry, Pierre-Gilles, Lenglet, Christophe and
Lock, Eric F. (2022). Multiway sparse distance weighted discrimination. Journal of
Computational and Graphical Statistics (just-accepted), 1–43.
Huopaniemi, Ilkka, Suvitaival, Tommi, Nikkila, Janne, Oresic, Matej and
Kaski, Samuel. (2010). Multivariate multi-way analysis of multi-source data. Bioin-
formatics 26(12), i391–i398.
Kolda, Tamara G and Bader, Brett W. (2009). Tensor decompositions and appli-
cations. SIAM review 51(3), 455–500.
Li, Xiaoshan, Xu, Da, Zhou, Hua and Li, Lexin. (2018). Tucker tensor regression
and neuroimaging analysis. Statistics in Biosciences 10(3), 520–545.
Lindley, Dennis V and Smith, Adrian FM. (1972). Bayes estimates for the linear
model. Journal of the Royal Statistical Society: Series B (Methodological) 34(1), 1–18.
Lubach, Gabriele R and Coe, Christopher L. (2006). Preconception maternal iron
status is a risk factor for iron deficiency in infant rhesus monkeys (macaca mulatta). The
Journal of nutrition 136(9), 2345–2349.
Lyu, Tianmeng, Lock, Eric F and Eberly, Lynn E. (2017). Discriminating sample
groups with multi-way data. Biostatistics 18(3), 434–450.
Miranda, Michelle F, Zhu, Hongtu, Ibrahim, Joseph G, Initiative,
Alzheimer’s Disease Neuroimaging and others. (2018). Tprm: Tensor partition
regression models with applications in imaging biomarker detection. The annals of ap-
plied statistics 12(3), 1422.
Palzer, Elise F, Wendt, Christine H, Bowler, Russell P, Hersh, Craig P,
Safo, Sandra E and Lock, Eric F. (2022). sjive: Supervised joint and individual
variation explained. Computational Statistics & Data Analysis 175, 107547.
Patton, Stephanie M, Coe, Christopher L, Lubach, Gabriele R and Connor,
James R. (2012). Quantitative proteomic analyses of cerebrospinal fluid using itraq in
a primate model of iron deficiency anemia. Developmental neuroscience 34(4), 354–365.
Rao, Raghavendra, Ennis, Kathleen, Lubach, Gabriele R, Lock, Eric F,
Georgieff, Michael K and Coe, Christopher L. (2018). Metabolomic analysis
of csf indicates brain metabolic impairment precedes hematological indices of anemia in
the iron-deficient infant monkey. Nutritional neuroscience 21(1), 40–48.
Rao, Raghavendra, Ennis, Kathleen, Oz, Gulin, Lubach, Gabriele R,
Georgieff, Michael K and Coe, Christopher L. (2013). Metabolomic analysis
of cerebrospinal fluid indicates iron deficiency compromises cerebral energy metabolism
in the infant monkey. Neurochemical research 38(3), 573–580.
Rodosthenous, Theodoulos, Shahrezaei, Vahid and Evangelou, Marina.
(2020). Integrating multi-omics data through sparse canonical correlation analysis for
the prediction of complex traits: a comparison study. Bioinformatics 36(17), 4616–4625.
Sandri, Brian J, Kim, Jonathan, Lubach, Gabriele R, Lock, Eric F, Guer-
rero, Candace, Higgins, LeeAnn, Markowski, Todd W, Kling, Pamela J,
Georgieff, Michael K, Coe, Christopher L and others. (2022). Multiomic pro-
filing of iron-deficient infant monkeys reveals alterations in neurologically important bio-
chemicals in serum and cerebrospinal fluid before the onset of anemia. American Journal
of Physiology-Regulatory, Integrative and Comparative Physiology 322(6), R486–R500.
Sandri, Brian J, Lubach, Gabriele R, Lock, Eric F, Georgieff, Michael K,
Kling, Pamela J, Coe, Christopher L and Rao, Raghavendra B. (2020).
Early-life iron deficiency and its natural resolution are associated with altered serum
metabolomic profiles in infant rhesus monkeys. The Journal of nutrition 150(4), 685–
693.
Sandri, Brian J, Lubach, Gabriele R, Lock, Eric F, Kling, Pamela J, Georgi-
eff, Michael K, Coe, Christopher L and Rao, Raghavendra B. (2021). Cor-
recting iron deficiency anemia with iron dextran alters the serum metabolomic profile of
the infant rhesus monkey. The American Journal of Clinical Nutrition 113(4), 915–923.
Singh, Amrit, Shannon, Casey P, Gautier, Benoît, Rohart, Florian, Vacher, Michaël, Tebbutt, Scott J and Lê Cao, Kim-Anh. (2019). DIABLO: an integrative approach for identifying key molecular drivers from multi-omics assays. Bioinformatics 35(17), 3055–3062.
Tao, Dacheng, Li, Xuelong, Hu, Weiming, Maybank, Stephen and Wu, Xin-
dong. (2005). Supervised tensor learning. In: Fifth IEEE International Conference on
Data Mining (ICDM’05). IEEE. pp. 8–pp.
Van De Wiel, Mark A, Lien, Tonje G, Verlaat, Wina, van Wieringen, Wes-
sel N and Wilting, Saskia M. (2016). Better prediction by use of co-data: adaptive
group-regularized ridge regression. Statistics in Medicine 35(3), 368–381.
White, Brian S, Khan, Suleiman A, Mason, Mike J, Ammad-Ud-Din, Muhammad, Potdar, Swapnil, Malani, Disha, Kuusanmäki, Heikki, Druker, Brian J, Heckman, Caroline, Kallioniemi, Olli and others. (2021). Bayesian multi-source regression and monocyte-associated gene expression predict bcl-2 inhibitor resistance in acute myeloid leukemia. NPJ precision oncology 5(1), 1–11.
Zhang, Yunfeng and Gaynanova, Irina. (2021). Joint association and classification
analysis of multi-view data. Biometrics.
Zhou, Hua, Li, Lexin and Zhu, Hongtu. (2013). Tensor regression with applications
in neuroimaging data analysis. Journal of the American Statistical Association 108(502),
540–552.
Misclassification: low-dimensional probit
Rank: 2 Rank: 1 Full rank
Model MS: Yes MS: No MS: Yes MS: No MS: Yes MS: No
Rank2,MS 0.214 0.217 0.264 0.229 0.194 0.222
Rank2,SS 0.216 0.216 0.264 0.227 0.198 0.221
Rank1,MS 0.225 0.244 0.251 0.210 0.249 0.285
Rank1,SS 0.224 0.243 0.252 0.210 0.250 0.284
FullRank,MS 0.233 0.241 0.293 0.266 0.174 0.170
FullRank,SS 0.248 0.241 0.300 0.265 0.196 0.168
Table 1: Test misclassification rate for low-dimensional probit scenario.
Correlations: low-dimensional probit
Rank: 2 Rank: 1 Full rank
Model MS: Yes MS: No MS: Yes MS: No MS: Yes MS: No
Rank2,MS 0.750 0.739 0.620 0.710 0.131 0.135
Rank2,SS 0.748 0.740 0.620 0.713 0.130 0.136
Rank1,MS 0.730 0.690 0.642 0.737 0.117 0.094
Rank1,SS 0.730 0.691 0.639 0.737 0.116 0.095
FullRank,MS 0.096 0.109 0.085 0.106 0.837 0.857
FullRank,SS 0.094 0.110 0.083 0.109 0.805 0.859
Table 2: Correlation with true coefficients for the low-dimensional probit scenario.
Misclassification: low-dimensional separate normal
Rank: 2 Rank: 1 Full rank
Model MS: Yes MS: No MS: Yes MS: No MS: Yes MS: No
Rank2,MS 0.111 0.034 0.246 0.134 0.176 0.118
Rank2,SS 0.111 0.033 0.245 0.132 0.175 0.115
Rank1,MS 0.117 0.043 0.240 0.129 0.200 0.155
Rank1,SS 0.116 0.042 0.241 0.128 0.200 0.154
FullRank,MS 0.115 0.036 0.258 0.146 0.168 0.092
FullRank,SS 0.117 0.035 0.261 0.145 0.174 0.091
Table 3: Test misclassification rate for the low-dimensional separate normal scenario.
Correlations: low-dimensional separate normal
Rank: 2 Rank: 1 Full rank
Model MS: Yes MS: No MS: Yes MS: No MS: Yes MS: No
Rank2,MS 0.876 0.887 0.764 0.877 0.164 0.091
Rank2,SS 0.873 0.897 0.770 0.885 0.170 0.098
Rank1,MS 0.846 0.844 0.796 0.901 0.167 0.087
Rank1,SS 0.848 0.845 0.796 0.905 0.166 0.090
FullRank,MS 0.196 0.155 0.111 0.125 0.832 0.845
FullRank,SS 0.192 0.160 0.106 0.125 0.814 0.853
Table 4: Correlation with true coefficients for the low-dimensional separate normal scenario.
Relative Squared Error: low-dimensional continuous
Rank: 2 Rank: 1 Full rank
Model MS: Yes MS: No MS: Yes MS: No MS: Yes MS: No
Rank2,MS 0.742 0.446 0.719 0.616 0.500 0.468
Rank2,SS 0.740 0.444 0.716 0.613 0.501 0.464
Rank1,MS 0.758 0.550 0.700 0.602 0.654 0.664
Rank1,SS 0.757 0.549 0.700 0.601 0.654 0.661
FullRank,MS 0.811 0.487 0.801 0.689 0.485 0.287
FullRank,SS 0.784 0.473 0.777 0.668 0.485 0.286
Table 5: Mean relative squared prediction error on test data for the low-dimensional con-
tinuous scenario.
Correlations: low-dimensional continuous
Rank: 2 Rank: 1 Full rank
Model MS: Yes MS: No MS: Yes MS: No MS: Yes MS: No
Rank2,MS 0.703 0.873 0.751 0.810 0.222 0.099
Rank2,SS 0.702 0.874 0.751 0.811 0.222 0.101
Rank1,MS 0.667 0.792 0.774 0.819 0.191 0.092
Rank1,SS 0.666 0.792 0.772 0.820 0.192 0.079
FullRank,MS 0.137 0.153 0.062 0.141 0.882 0.943
FullRank,SS 0.136 0.152 0.060 0.142 0.874 0.944
Table 6: Correlation with true coefficients for the low-dimensional continuous scenario.
Misclassification: high-dimensional probit
Rank: 1 Full rank
Model MS: Yes MS: No MS: Yes MS: No
Rank1,MS 0.444 0.453 0.451 0.448
Rank1,SS 0.448 0.454 0.454 0.447
FullRank,MS 0.446 0.454 0.448 0.444
FullRank,SS 0.449 0.452 0.453 0.443
Table 7: Test misclassification rates for the high-dimensional probit scenario.
Correlations: high-dimensional probit
Rank: 1 Full rank
Model MS: Yes MS: No MS: Yes MS: No
Rank1,MS 0.142 0.134 0.081 0.068
Rank1,SS 0.147 0.148 0.079 0.070
FullRank,MS 0.084 0.080 0.181 0.163
FullRank,SS 0.080 0.082 0.172 0.172
Table 8: Correlation with true coefficients for the high-dimensional probit scenario.
Misclassification: high-dimensional separate normal
Rank: 1 Full rank
Model MS: Yes MS: No MS: Yes MS: No
Rank1,MS 0.203 0.178 0.172 0.068
Rank1,SS 0.213 0.178 0.199 0.061
FullRank,MS 0.215 0.198 0.157 0.054
FullRank,SS 0.228 0.196 0.188 0.052
Table 9: Test misclassification rates for the high-dimensional separate normal scenario.
Correlation: high-dimensional separate normal data
Rank: 1 Full rank
Model MS: Yes MS: No MS: Yes MS: No
Rank1,MS 0.468 0.479 0.202 0.251
Rank1,SS 0.423 0.492 0.187 0.253
FullRank,MS 0.226 0.200 0.487 0.534
FullRank,SS 0.200 0.205 0.421 0.546
Table 10: Correlation with true coefficients for the high-dimensional separate normal sce-
nario.
Relative squared error: high-dimensional continuous
Rank: 1 Full rank
Model MS: Yes MS: No MS: Yes MS: No
Rank1,MS 0.972 0.965 0.957 0.984
Rank1,SS 0.977 0.969 0.964 0.975
FullRank,MS 0.984 0.972 0.952 0.963
FullRank,SS 0.983 0.968 0.955 0.958
Table 11: Mean relative squared prediction error on test data for the high-dimensional
continuous scenario.
Correlations: high-dimensional continuous
Rank: 1 Full rank
Model MS: Yes MS: No MS: Yes MS: No
Rank1,MS 0.179 0.185 0.096 0.091
Rank1,SS 0.169 0.180 0.090 0.099
FullRank,MS 0.072 0.100 0.223 0.211
FullRank,SS 0.070 0.102 0.215 0.218
Table 12: Correlation with true coefficients for the high-dimensional continuous scenario.
t-test statistics
Multi-way Non-multi-way
Metabolites only 5.773 4.407
Proteins only 5.182 3.935
All data 7.235 4.741
Table 13: Test statistics from using two-sample t tests to evaluate separation between ID
and IS samples achieved by different models under LOOCV.
[Figure: "Bayesian MSMW model for ID data" — density of LOOCV probit scores (x-axis: probit scores; y-axis: density), colored by status (ID vs. IS).]
Figure 1: Probit scores from applying MSMW model to motivating data under LOOCV.
[Figure: three panels of factor loadings — "Time Loadings" (loading by time: 4 month, 6 month), "Protein Loadings" (loading by protein index number), and "Metabolite Loadings" (loading by metabolite index number).]
Figure 2: Factor loadings from applying MSMW model to motivating data.
Article
Motivation: In the continuously expanding omics era, novel computational and statistical strategies are needed for data integration and identification of biomarkers and molecular signatures. We present Data Integration Analysis for Biomarker discovery using Latent cOmponents (DIABLO), a multi-omics integrative method that seeks for common information across different data types through the selection of a subset of molecular features, while discriminating between multiple phenotypic groups. Results: Using simulations and benchmark multi-omics studies, we show that DIABLO identifies features with superior biological relevance compared with existing unsupervised integrative methods, while achieving predictive performance comparable to state-of-the-art supervised approaches. DIABLO is versatile, allowing for modular-based analyses and cross-over study designs. In two case studies, DIABLO identified both known and novel multi-omics biomarkers consisting of mRNAs, miRNAs, CpGs, proteins and metabolites. Availability and implementation: DIABLO is implemented in the mixOmics R Bioconductor package with functions for parameters' choice and visualization to assist in the interpretation of the integrative analyses, along with tutorials on http://mixomics.org and in our Bioconductor vignette. Supplementary information: Supplementary data are available at Bioinformatics online.
Article
Objectives: Iron deficiency (ID) anemia leads to long-term neurodevelopmental deficits by altering iron-dependent brain metabolism. The objective of the study was to determine if ID induces metabolomic abnormalities in the cerebrospinal fluid (CSF) in the pre-anemic stage and to ascertain the aspects of abnormal brain metabolism affected. Methods: Standard hematological parameters [hemoglobin (Hgb), mean corpuscular volume (MCV), transferrin (Tf) saturation, and zinc protoporphyrin/heme (ZnPP/H)] were compared at 2, 4, 6, 8, and 12 months in iron-sufficient (IS; n = 7) and iron-deficient (ID; n = 7) infant rhesus monkeys. Five CSF metabolite ratios were determined at 4, 8, and 12 months using (1)H NMR spectroscopy at 16.4 T and compared between groups and in relation to hematologic parameters. Results: ID infants developed ID (Tf saturation < 25%) by 4 months of age and all became anemic (Hgb < 110 g/L and MCV < 60 fL) at 6 months. Their heme indices normalized by 12 months. Pyruvate/glutamine and phosphocreatine/creatine (PCr/Cr) ratios in CSF were lower in the ID infants by 4 months (P < 0.05). The PCr/Cr ratio remained lower at 8 months (P = 0.02). ZnPP/H, an established blood marker of pre-anemic ID, was positively correlated with the CSF citrate/glutamine ratio (marginal correlation, 0.34; P < 0.001; family wise error rate = 0.001). Discussion: Metabolomic analysis of the CSF is sensitive for detecting the effects of pre-anemic ID on brain energy metabolism. Persistence of a lower PCr/Cr ratio at 8 months, even as hematological measures demonstrated recovery from anemia, indicate that the restoration of brain energy metabolism is delayed. Metabolomic platforms offer a useful tool for early detection of the impact of ID on brain metabolism in infants.