Available via license: CC BY 4.0

Bayesian predictive modeling of multi-source multi-way data

Jonathan Kim1, Brian J. Sandri2,3, Raghavendra B. Rao2,3, Eric F. Lock1

1Division of Biostatistics, School of Public Health

2Division of Neonatology, Department of Pediatrics

3Masonic Institute for the Developing Brain

University of Minnesota

August 9, 2022

Abstract

We develop a Bayesian approach to predict a continuous or binary outcome from

data that are collected from multiple sources with a multi-way (i.e., multidimensional tensor) structure. As a motivating example we consider molecular data from

multiple ’omics sources, each measured over multiple developmental time points, as

predictors of early-life iron deﬁciency (ID) in a rhesus monkey model. We use a

linear model with a low-rank structure on the coeﬃcients to capture multi-way de-

pendence and model the variance of the coeﬃcients separately across each source

to infer their relative contributions. Conjugate priors facilitate an eﬃcient Gibbs

sampling algorithm for posterior inference, assuming a continuous outcome with nor-

mal errors or a binary outcome with a probit link. Simulations demonstrate that

our model performs as expected in terms of misclassiﬁcation rates and correlation of

estimated coeﬃcients with true coeﬃcients, with large gains in performance by incor-

porating multi-way structure and modest gains when accounting for diﬀering signal

sizes across the diﬀerent sources. Moreover, it provides robust classiﬁcation of ID

monkeys for our motivating application. Software in the form of R code is available

at https://github.com/BiostatsKim/BayesMSMW.

1 Introduction

Technological advancements in biomedical research are producing datasets that are very

large and have complex structures. Some data are represented as a multi-way array, also

called a tensor, which extends the two-way data matrix to higher dimensions. Some data

are multi-source, which involves features from diﬀerent sources of data matched by samples

(this is also known as multi-view data). A growing number of datasets are simultaneously

multi-source and multi-way (MSMW). As a motivating example of MSMW data, we con-

sider predictors of early-life iron deﬁciency (ID) in infant monkeys using data described in

arXiv:2208.03396v1 [stat.ME] 5 Aug 2022

Sandri and others (2022). In this naturalistic ID model in infant rhesus monkeys, 20−30%

of infants develop ID and anemia between 4 and 6 months of age due to a combination of

lower iron stores at birth and rapid postnatal growth rate (Lubach and Coe, 2006; Coe and

others, 2013). Prior studies in this model have shown that the ID infants have metabolomic

and proteomic abnormalities in the serum and cerebrospinal ﬂuid in the preanemic and ane-

mic periods with residual changes persisting even after the resolution of anemia with iron

treatment (Geguchadze and others, 2008; Coe and others, 2009; Patton and others, 2012;

Rao and others, 2013, 2018; Sandri and others, 2020, 2021, 2022). Data were available from

two sources, serum proteomics and serum metabolomics, collected at two time points, 4 and

6 months after birth. The data therefore form two 3-way arrays: [monkeys × proteomics × time] and [monkeys × metabolomics × time]. These motivating data are thus MSMW, and we are interested in identifying signals in the biomarkers that can predict ID status.

To understand the signiﬁcance of incorporating MSMW structure into analysis, consider

a naive approach in which each source’s multi-dimensional data array is transformed into

a vector and features from all sources are concatenated into a single vector. While this

approach would produce data that could be analyzed using one of the many methods

available for vector-valued data, it would also have a number of shortcomings. Ignoring the

multi-way structure would not allow for consideration of dependence across dimensions.

Ignoring the multi-source structure would mean that any signal present in features from

smaller sources could be overrun by noise from larger sources with comparatively less signal.

A common aspect of MSMW data is the presence of far more features than samples,

often referred to as high-dimension low-sample size (HDLSS) data. While MSMW data

need not necessarily be HDLSS, it is suﬃciently common that methods for handling MSMW

data will ideally allow for HDLSS structure. A Bayesian framework provides more ﬂexibility

for model-based supervised analysis of high-dimensional data, as appropriate regularization

can be induced through the speciﬁed prior distribution.

In what follows we brieﬂy review existing methods for predictive modeling of data that

are multi-source (Section 1.1) and multi-way (Section 1.2); our methodological contribu-

tions are summarized in Section 1.3.

1.1 Multi-source data

The issue of integrating data from multiple sources has been addressed in a variety of ways

for diﬀerent tasks. For predicting an outcome from multi-source data, several approaches

extend unsupervised methods that were originally designed to integrate multi-source data

without prediction. Examples include various supervised extensions of canonical correlation

analysis (CCA). Rodosthenous and others (2020) extend CCA to an arbitrary number of

sources and an outcome by means of a generalized sparsity parameter. Joint association

and classiﬁcation (JACA) (Zhang and Gaynanova, 2021) is a combination of CCA and

linear discriminant analysis for a binary outcome. Data integration analysis for biomarker discovery using latent components (DIABLO) (Singh and others, 2019) extends sparse projection to latent structure discriminant analysis to multi-omics data and extends sparse generalized CCA to a supervised framework. A combination of multivariate ANOVA with Bayesian CCA was developed by Huopaniemi and others (2010). Supervised JIVE (sJIVE) (Palzer and others, 2022) was developed as a supervised extension for prediction using the JIVE method, which decomposes data into latent factors that are shared or specific to each source.

Other methods can be used directly in a supervised context (i.e., classiﬁcation and re-

gression), without incorporating aspects of unsupervised analysis. One approach developed

by Van De Wiel and others (2016) handles “co-data”, which they deﬁned as “all information

on the measured variables other than their numerical values for the given study”. In partic-

ular, their method involved partitioning variables into groups and imposing group-speciﬁc

penalties for ridge regression. This approach has some analogues to the multi-source prob-

lem in that it is able to perform prediction on a binary or continuous outcome using data

from multiple groups; though such “groups” of data included p-values from previous studies

or genomic annotations, in the multi-source context they may be deﬁned by which source

the variable belongs to (e.g., proteins or metabolites). A Bayesian approach to multi-source

data that makes use of the prior distribution to accommodate different sources has been

used recently by White and others (2021). However, their Bayesian Multi-Source Regres-

sion (BMSR) assumed double-matched multi-source data, i.e., the same features present

across all sources, and involved predicting diﬀerent outcomes for each source instead of a

single outcome aﬀected by multiple sources.

One limitation of all of these multi-source methods is that they cannot accommodate multi-way data. Applied to such data they must ignore the multi-way structure, which limits their physiological relevance and may cause critical findings to be missed.

1.2 Multi-way data

Methods developed for analyzing data with multi-way structure can be divided into unsu-

pervised and supervised categories, with the latter further divided between classiﬁcation

and regression methods. Unsupervised approaches to handling multi-way data predomi-

nantly involve dimension reduction techniques, which reduce the number of features in the

data to a more manageable size while preserving the overall integrity of the data. Many

such methods of tensor decomposition are outlined in Kolda and Bader (2009). Gloaguen

and others (2022) developed a multi-way extension of regularized generalized canonical

correlation analysis that can accommodate data with a tensor structure from an arbitrary

number of sources by incorporating Kronecker constraints into the optimization problem.

For supervised methods involving classification, there is a growing literature on extending classifiers of vectors to multi-way arrays using factorization and dimension reduction

techniques. Tao and others (2005) proposed a supervised tensor learning framework that

generalized classiﬁers by performing a rank-1 decomposition on the coeﬃcients to reduce

their dimension to a single set of weights for each dimension. Lyu and others (2017) pro-

posed a multi-way version of the classiﬁcation method distance weighted discrimination

(DWD) under the assumption that the coeﬃcient array is low-rank. Their implementation

of multi-way DWD was shown to dramatically improve performance over two-way classi-

ﬁers when the data have multi-way structure. However, their method is restricted to use

for three-way data. Guo and others (2022) proposed an extension of multi-way DWD that

also imposed a low-rank structure on the coeﬃcient array, but allowed for data with an

arbitrary number of ways and accounted for sparsity.

Supervised methods of regression also build on dimension reduction techniques by ex-

tending them to the regression context. Both Zhou and others (2013) and Li and others

(2018) propose maximum likelihood estimation algorithms that could perform regression

with array-valued covariates through dimension reduction, with the former using CAN-

DECOMP/PARAFAC (CP) decomposition and the latter using Tucker decomposition. A

Bayesian formulation of tensor regression was developed by Miranda and others (2018),

which involves a multi-step process of partitioning tensor data into smaller sub-tensors,

reducing these sub-tensors via CP decomposition, and performing regression with sparsity-

inducing priors to identify informative sub-tensors. Another Bayesian approach to tensor

regression with a scalar response was developed by Guhaniyogi and others (2017) by means

of a novel multi-way shrinkage prior, which allows for simultaneous shrinkage of parameters

across all ways of data.

In the same way that existing multi-source methods have yet to be extended to ac-

commodate multi-way data, existing supervised multi-way methods are generally unable

to incorporate data from multiple sources.

1.3 Contributions: method for multi-source multi-way data

In this paper, we develop a Bayesian linear model that can perform regression or classi-

ﬁcation on MSMW data for either a continuous outcome with normal errors or a binary

outcome with a probit link, respectively. The central assumption for our multi-way ap-

proach is that the signal discriminating the ways can be eﬃciently represented by mean-

ingful patterns in each dimension, which we identify by imposing a low-rank structure on

the coeﬃcient array. The central assumption for our multi-source approach is that the

signal discriminating the sources can be eﬃciently represented by modeling the variances

of the coeﬃcients separately across each source to infer their relative contributions. We

incorporate both of these approaches into a single model under a Bayesian framework. We

also apply our method to a real-world MSMW dataset by predicting iron deﬁciency status

in infant monkeys based on multi-omic tissue samples.

2 Methods

2.1 Notation and framework

Throughout this article bold lowercase characters ($\mathbf{a}$) denote vectors, bold uppercase characters ($\mathbf{A}$) denote matrices, and blackboard bold uppercase characters ($\mathbb{A}$) denote multi-way arrays of the specified dimension (e.g., $\mathbb{A}: P_1 \times P_2 \times \cdots \times P_K$). Square brackets index entries within an array, e.g., $\mathbb{A}[p_1, p_2, \ldots, p_K]$. Superscripts in square brackets are used to denote individual sources within a multi-source set of data, e.g., $\mathbb{A}^{[1]}, \mathbb{A}^{[2]}, \ldots, \mathbb{A}^{[M]}$. The generalized inner product for two arrays $\mathbb{A}$ and $\mathbb{B}$ of the same dimension is

$$\mathbb{A} \cdot \mathbb{B} = \sum_{p_1=1}^{P_1} \cdots \sum_{p_K=1}^{P_K} \mathbb{A}[p_1, \ldots, p_K]\, \mathbb{B}[p_1, \ldots, p_K].$$

Define $\|\cdot\|_F$ as the Frobenius norm and $\text{vec}(\mathbb{A})$ as the vectorization of the entries in $\mathbb{A}$. For our context, $\mathbb{X}: N \times [P^{[1]}, \ldots, P^{[M]}] \times D$ gives data in the form of a 3-way array for $N$ subjects, where $P^{[m]}$ is the size of the $m$th source for $m = 1, \ldots, M$ with $P = \sum_{m=1}^{M} P^{[m]}$, and $D$ is an additional way for which we have data available for all subjects and sources. Each subject has a response variable $y_i$, which may be binary or continuous; let $\mathbf{y} = [y_1, \ldots, y_N]$. Our goal is to predict the outcome $\mathbf{y}$ based on the multiway covariates $\mathbb{X}$.

2.2 Model

2.2.1 Bayesian linear model

We first briefly consider the special case in which we have only one way of data from only one source, $M = 1$ and $D = 1$, and $\mathbf{y}$ is continuous. This is the classical setting with $P$ covariates available for $N$ subjects, $\mathbf{X}: N \times P$. The basic linear model is $\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e}$, where $\mathbf{b} = [b_1, \ldots, b_P]$ is the vector of covariate coefficients and $\mathbf{e} = [\epsilon_1, \ldots, \epsilon_N]$ is the vector of error terms, which are assumed to have distribution $\epsilon_i \overset{iid}{\sim} \text{Normal}(0, \sigma^2)$.

Under a Bayesian framework, we place a prior distribution on $\mathbf{b}$ and we assume all $b_j$, $j = 1, \ldots, P$, are independent and identically distributed under this prior. If we let that prior be a normal distribution with mean zero and variance $\tau$, i.e., $b_j \sim \text{Normal}(0, \tau)$, we can also place a hyperprior on the variance $\tau$. This has the advantage of empirically controlling the level of shrinkage of the coefficients toward 0, via the posterior for $\tau$. In subsequent sections we extend this model to the multi-source and multi-way scenarios.
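To make the shrinkage role of $\tau$ concrete, the worked equation below gives the conditional posterior of the coefficient vector when $\tau$ and $\sigma^2$ are held fixed. It is the standard conjugate normal result for a linear model, stated here for illustration, and it has the same form as the factor updates in Section 2.3.2.

```latex
\mathbf{b} \mid \mathbf{y}, \tau, \sigma^2 \sim \text{MVN}\!\left(
  \left(\tfrac{\sigma^2}{\tau}\mathbf{I} + \mathbf{X}^T\mathbf{X}\right)^{-1}\mathbf{X}^T\mathbf{y},\;
  \sigma^2\left(\tfrac{\sigma^2}{\tau}\mathbf{I} + \mathbf{X}^T\mathbf{X}\right)^{-1}
\right)
```

The posterior mean coincides with a ridge regression estimate with penalty $\sigma^2/\tau$, so a small $\tau$ shrinks the coefficients strongly toward 0 while a large $\tau$ leaves them close to the least squares solution.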

2.2.2 Multi-source model

Now, suppose our covariates come from $M > 1$ different sources, which can be conceived as $M$ datasets $\mathbf{X}_1, \ldots, \mathbf{X}_M$ with each $\mathbf{X}_m$ being an $N \times P^{[m]}$ matrix such that $\sum_{m=1}^{M} P^{[m]} = P$. The vector of coefficients, $\mathbf{b}$, can be represented as a concatenation of $M$ vectors each of length $P^{[m]}$, i.e., $\mathbf{b} = [\mathbf{b}^{[1]}, \ldots, \mathbf{b}^{[M]}]$ with each $\mathbf{b}^{[m]} = [b^{[m]}_1, \ldots, b^{[m]}_{P^{[m]}}]$ for $m = 1, \ldots, M$.

To infer the relative contribution of each source, we propose modeling the variances of each source's coefficients separately such that each source has its own independent prior placed on its coefficients. That is, let $\mathbf{b}^{[m]} \sim \mathcal{D}_m$ where $\mathcal{D}_m$ is an arbitrary distribution. If we let the priors all be mean-zero normal distributions as in the previous section, that gives us

$$\mathbf{b}^{[m]} \sim \text{MVN}(\mathbf{0}, \tau_m \mathbf{I}). \tag{1}$$

This allows us to distinguish the level of the contribution for coefficients from different sources by allowing for different source variances $\tau_m$.

2.2.3 Multi-way model

Rank 1 model

We now consider the case in which the data are multi-way ($D > 1$) but single-source ($M = 1$), and thus $\mathbb{X}$ is $N \times P \times D$, a 3-way tensor. We propose the following bilinear model, which is analogous to the one proposed by Lyu and others (2017) in the context of modeling the coefficients for multi-way DWD.

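We assume the coefficient matrix $\mathbf{B}: P \times D$ has rank-1 decomposition

$$\mathbf{B} = \mathbf{w}\mathbf{v}^T \tag{2}$$

where $\mathbf{w} = [w_1, \ldots, w_P]^T$ and $\mathbf{v} = [v_1, \ldots, v_D]^T$.

Thus our model for each $i = 1, \ldots, N$ is

$$y_i = \mathbf{X}_i \cdot \mathbf{B} = \mathbf{v}^T \mathbf{X}_i^T \mathbf{w} \tag{3}$$

where $\mathbf{v}^T$ is $1 \times D$, $\mathbf{X}_i$ is $P \times D$, and $\mathbf{w}$ is $P \times 1$.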

To interpret this model in the context of our motivating example, we may consider $\mathbf{w}$ to represent a pattern in the metabolites that is predictive of $y$ while $\mathbf{v}$ gives the relative contribution at each time point.
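The equivalence between the generalized inner product with a rank-1 coefficient matrix and the bilinear form in (3) can be checked numerically. The following is a minimal sketch using only the Python standard library; the dimensions, the random values, and the function names are illustrative assumptions and are not taken from the authors' R software.

```python
import random

def inner_product(X, B):
    """Generalized inner product: sum of elementwise products of two P x D arrays."""
    return sum(x * b for x_row, b_row in zip(X, B) for x, b in zip(x_row, b_row))

def rank1_coefficients(w, v):
    """Outer product B = w v^T, stored as a P x D list of lists."""
    return [[w_p * v_d for v_d in v] for w_p in w]

def bilinear_form(X, w, v):
    """v^T X^T w computed without forming B: weight columns by v, rows by w."""
    return sum(w_p * sum(x * v_d for x, v_d in zip(x_row, v))
               for w_p, x_row in zip(w, X))

random.seed(0)
P, D = 4, 3                      # illustrative sizes, not taken from the paper
w = [random.gauss(0, 1) for _ in range(P)]
v = [random.gauss(0, 1) for _ in range(D)]
X_i = [[random.gauss(0, 1) for _ in range(D)] for _ in range(P)]

lhs = inner_product(X_i, rank1_coefficients(w, v))
rhs = bilinear_form(X_i, w, v)
print(abs(lhs - rhs) < 1e-12)
```

The bilinear form never builds the full P × D matrix, which is one practical reason for working with the factors directly.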

Rank R model

In the previous model (2), we assumed the coefficient matrix $\mathbf{B}$ had a rank-1 decomposition, that is, the outcome is determined by combining a single pattern in each dimension of the coefficient matrix. However, it is possible that multiple patterns contribute to the outcome. For example, it may be that some metabolites are predictive of the outcome at an early time point but others are only predictive of the outcome at a later time point. We therefore consider a more general structure in which the coefficient matrix $\mathbf{B}$ has a rank-$R$ decomposition:

$$\mathbf{B} = \mathbf{W}\mathbf{V}^T \tag{4}$$

where $\mathbf{W}: P \times R$ has columns $\mathbf{w}_r = [w_{r1}, \ldots, w_{rP}]^T$ and $\mathbf{V}: D \times R$ has columns $\mathbf{v}_r = [v_{r1}, \ldots, v_{rD}]^T$, for $r = 1, \ldots, R$, $R < \min(P, D)$. Observe that the coefficient matrix $\mathbf{B}$ in the rank-1 multi-way model (2) is a special case of the rank-$R$ multi-way model (4) when $R = 1$.

This use of low-rank structure on the coeﬃcients allows us to capture multi-way depen-

dence and identify relevant patterns in each dimension of the coeﬃcient matrix.
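As a worked illustration of what the rank-$R$ structure in (4) implies, the decomposition can be expanded column by column, and the number of free coefficients compared with that of an unstructured $P \times D$ matrix. The numbers below use the dimensions of the high-dimensional setting described in Section 3.1.1 ($P = 200$, $D = 2$) with $R = 1$; this is arithmetic on those dimensions, not a result reported in the paper.

```latex
\mathbf{B} = \mathbf{W}\mathbf{V}^T = \sum_{r=1}^{R} \mathbf{w}_r \mathbf{v}_r^T,
\qquad
\mathbf{X}_i \cdot \mathbf{B} = \sum_{r=1}^{R} \mathbf{v}_r^T \mathbf{X}_i^T \mathbf{w}_r

\underbrace{R(P + D)}_{\text{entries of } \mathbf{W} \text{ and } \mathbf{V}} = 1 \times (200 + 2) = 202
\quad \text{versus} \quad
\underbrace{PD}_{\text{unstructured } \mathbf{B}} = 200 \times 2 = 400
```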

2.2.4 Multi-source and multi-way model

We now combine aspects of the multi-source and multi-way models into a single Bayesian linear model to address the general MSMW framework introduced in Section 2.1. We again assume the coefficient matrix $\mathbf{B}$ has rank-$R$ decomposition as in (4), where $\mathbf{W} = [\mathbf{W}^{[1]}, \ldots, \mathbf{W}^{[M]}]$ and each $\mathbf{W}^{[m]}$ has columns $\mathbf{w}^{[m]}_r = [w_{r1}, \ldots, w_{rP^{[m]}}]^T$ for $m = 1, \ldots, M$, and $\mathbf{V}$ has columns $\mathbf{v}_r = [v_{r1}, \ldots, v_{rD}]^T$ for $r = 1, \ldots, R$, $R < \min(P, D)$.

2.2.5 Binary outcome

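We now consider our model in the case where $\mathbf{y}$ is binary, i.e., $y_i \in \{0, 1\}$ for $i = 1, \ldots, N$. We can accommodate such data by modifying our approach to use a latent variable probit model, similar to that described in Albert and Chib (1993). Suppose there exists an auxiliary random variable $z_i$ such that $z_i = \mathbf{X}_i \cdot \mathbf{B} + \epsilon_i$, where $\epsilon_i \sim \text{Normal}(0, 1)$. We can model our outcome variable $y_i$ as an indicator for whether or not this latent variable is positive, that is, $y_i = 1$ if $z_i > 0$ and $y_i = 0$ otherwise. This is equivalent to using a probit link function:

$$\Pr(y_i = 1 \mid \mathbf{X}_i) = \Pr(z_i > 0 \mid \mathbf{X}_i) = \Pr(\mathbf{X}_i \cdot \mathbf{B} + \epsilon_i > 0) = \Pr(-\epsilon_i < \mathbf{X}_i \cdot \mathbf{B}) = \Phi(\mathbf{X}_i \cdot \mathbf{B})$$

where $\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\, dt$, and the last equality uses the symmetry of the standard normal distribution.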
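The equivalence between the latent variable formulation and the probit link can be illustrated by simulation. The sketch below, written with the Python standard library only, compares $\Phi(\eta)$ with the observed frequency of $z > 0$ for a fixed linear predictor; the value `eta = 0.7` and the function names are hypothetical choices made for the example.

```python
import math
import random

def probit_prob(eta):
    """Standard normal CDF Phi(eta), written with the error function."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))

def simulate_latent_frequency(eta, n_draws, rng):
    """Fraction of draws with z = eta + eps > 0, where eps ~ Normal(0, 1)."""
    positives = sum(1 for _ in range(n_draws) if eta + rng.gauss(0.0, 1.0) > 0)
    return positives / n_draws

rng = random.Random(1)
eta = 0.7                      # hypothetical value of the linear predictor X_i . B
freq = simulate_latent_frequency(eta, 200_000, rng)
print(round(probit_prob(eta), 3), round(freq, 3))
```

The two printed values should agree up to Monte Carlo error.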

2.3 Model estimation

2.3.1 Priors

As referenced in Section 2.2.1, we can model the coefficients in a single-source non-multi-way model to come from a normal distribution with mean zero and variance $\tau$. We extend this approach to our MSMW model by placing mean-zero normal priors on the components of our coefficient matrix $\mathbf{B}$.

In order to accommodate our multi-way model, we do not estimate $\mathbf{B}$ directly, but instead estimate its components, either $\mathbf{w}$ and $\mathbf{v}$ as outlined in (2) for the rank-1 model or $\mathbf{W}$ and $\mathbf{V}$ as outlined in (4) for the rank-$R$ model. We place mean-zero normal priors on $\mathbf{W}$ and $\mathbf{V}$: the entries of each $\mathbf{W}^{[m]}$ are independent with a $\text{Normal}(0, \tau_m)$ distribution, and the entries of $\mathbf{V}$ are independent with a $\text{Normal}(0, 1)$ distribution. We fix the variance of $\mathbf{V}$ because $\mathbf{B}$ is the product of $\mathbf{W}$ and $\mathbf{V}^T$, so their respective scales are not identifiable and only the variance of $\mathbf{W}$ needs to be modeled. This further allows us to model the variance of the contribution for each source separately by considering each $\tau_m \mid \mathbf{W}^{[m]}$ for $m = 1, \ldots, M$. In order to facilitate an efficient sampling algorithm for the posterior distribution, we place conjugate inverse-gamma priors on the variance parameters, $\tau_m \sim \text{IG}(\alpha_0, \beta_0)$.

We assume that the error terms $\mathbf{e}$ are independent and normally distributed, $\mathbf{e} \sim \text{MVN}(\mathbf{0}, \mathbf{I}\sigma^2)$. For a continuous $\mathbf{y}$, $\sigma^2$ may either be fixed or given a prior. If $\sigma^2$ is unknown, by default we use an inverse-gamma prior distribution with arbitrarily small hyperparameters as a non-informative prior, e.g., $\sigma^2 \sim \text{IG}(0.001, 0.001)$. If $\mathbf{y}$ is binary, then the error variance for the latent continuous variables $\mathbf{z}$ in Section 2.2.5 is fixed at $\sigma^2 = 1$.

2.3.2 Full conditional distributions

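Given the conjugate hyperpriors we have placed on $\tau$, and fixing the variance of our error terms at 1, we have the following conditional distribution for each $\tau_m$:

$$\tau_m \mid \mathbf{W} \sim \text{IG}\left(\alpha_0 + \frac{P^{[m]}}{2},\; \beta_0 + \frac{1}{2}\|\mathbf{W}^{[m]}\|_F^2\right). \tag{5}$$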
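A draw from the conditional in (5) can be sketched with the Python standard library, using the fact that the reciprocal of a Gamma variable with shape $a$ and rate $b$ follows an IG($a$, $b$) distribution. The function name `draw_tau`, the default hyperparameters, and the example block `W_m` are illustrative assumptions rather than part of the authors' R software. The shape update in this sketch counts every entry of $\mathbf{W}^{[m]}$, which equals $P^{[m]}$ as written in (5) when $R = 1$.

```python
import random

def draw_tau(W_m, alpha0=1.0, beta0=1.0, rng=random):
    """Draw tau_m from an inverse-gamma full conditional given the block W_m.

    W_m is a list of rows (P_m x R). The shape adds half the number of entries
    of W_m and the rate adds half its squared Frobenius norm.
    """
    n_entries = sum(len(row) for row in W_m)
    sq_norm = sum(w * w for row in W_m for w in row)
    shape = alpha0 + 0.5 * n_entries
    rate = beta0 + 0.5 * sq_norm
    # random.gammavariate takes (shape, scale); scale = 1 / rate.
    gamma_draw = rng.gammavariate(shape, 1.0 / rate)
    return 1.0 / gamma_draw

rng = random.Random(2)
W_m = [[0.5], [-1.2], [0.3]]          # hypothetical block with P_m = 3 and R = 1
draws = [draw_tau(W_m, rng=rng) for _ in range(20000)]
print(sum(draws) / len(draws))
```

Larger entries in the block increase the rate and therefore push the draws of $\tau_m$ upward, which is how a source with stronger coefficients receives less shrinkage.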

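For our coefficient factor parameters, $\mathbf{W}$ and $\mathbf{V}$, standard linear model results with conjugate normal priors (Lindley and Smith, 1972) produce:

$$\text{vec}(\mathbf{W}) \mid \mathbf{y}, \boldsymbol{\tau}, \mathbf{V}, \sigma^2 \sim \text{MVN}\left((\mathbf{T}^{-1}\sigma^2 + \mathbf{X}_v^T\mathbf{X}_v)^{-1}\mathbf{X}_v^T\mathbf{y},\; \sigma^2(\mathbf{T}^{-1}\sigma^2 + \mathbf{X}_v^T\mathbf{X}_v)^{-1}\right) \tag{6}$$

where $\mathbf{T}: RP \times RP$ is the diagonal prior covariance matrix with diagonal entries

$$[\underbrace{\tau_1 \ldots \tau_1}_{P^{[1]}}\; \underbrace{\tau_2 \ldots \tau_2}_{P^{[2]}}\; \ldots\; \underbrace{\tau_M \ldots \tau_M}_{P^{[M]}}]$$

repeated $R$ times, and $\mathbf{X}_v: N \times RP$ is the matrix with row $i$ given by $\text{vec}(\mathbf{X}_i\mathbf{V})$. Similarly,

$$\text{vec}(\mathbf{V}) \mid \mathbf{y}, \mathbf{W}, \sigma^2 \sim \text{MVN}\left((\mathbf{I}\sigma^2 + \mathbf{X}_w^T\mathbf{X}_w)^{-1}\mathbf{X}_w^T\mathbf{y},\; \sigma^2(\mathbf{I}\sigma^2 + \mathbf{X}_w^T\mathbf{X}_w)^{-1}\right) \tag{7}$$

where $\mathbf{X}_w: N \times RD$ is the matrix with row $i$ given by $\text{vec}(\mathbf{X}_i^T\mathbf{W})$.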
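The two design matrices used in (6) and (7) can be checked numerically: multiplying each by the vectorized factor it pairs with should reproduce the linear predictor $\mathbf{X}_i \cdot \mathbf{B}$. The NumPy sketch below assumes column-stacking vectorization and, for the second design matrix, arranges each row as the $D \times R$ product of the transposed slice with $\mathbf{W}$ so that it conforms with $\text{vec}(\mathbf{V})$; the sizes and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
N, P, D, R = 5, 4, 3, 2                      # illustrative sizes only
X = rng.normal(size=(N, P, D))               # X[i] is the P x D slice for subject i
W = rng.normal(size=(P, R))
V = rng.normal(size=(D, R))
B = W @ V.T                                  # P x D coefficient matrix

def vec(A):
    """Column-stacking vectorization."""
    return A.flatten(order="F")

# Row i of X_v is vec(X_i V) (length R*P); row i of X_w is vec(X_i^T W) (length R*D).
X_v = np.stack([vec(X[i] @ V) for i in range(N)])
X_w = np.stack([vec(X[i].T @ W) for i in range(N)])

linear_predictor = np.array([np.sum(X[i] * B) for i in range(N)])   # X_i . B
print(np.allclose(X_v @ vec(W), linear_predictor))
print(np.allclose(X_w @ vec(V), linear_predictor))
```

Because the linear predictor is linear in each factor when the other is held fixed, each conditional update reduces to an ordinary Bayesian linear regression on the corresponding design matrix.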

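Use of the non-informative conjugate prior ($\sigma^2 \sim \text{IG}(0.001, 0.001)$) yields the following full conditional distribution for $\sigma^2$:

$$\sigma^2 \mid \mathbf{y}, \mathbf{B} \sim \text{IG}\left(\frac{N}{2} + 0.001,\; \frac{(\mathbf{y} - \mathbb{X}\mathbf{B})^T(\mathbf{y} - \mathbb{X}\mathbf{B})}{2} + 0.001\right), \tag{8}$$

where $\mathbb{X}\mathbf{B}$ denotes the vector with entries $\mathbf{X}_i \cdot \mathbf{B}$ for $i = 1, \ldots, N$.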

2.3.3 Data augmentation of binary case

Under the latent variable formulation for binary data described in Section 2.2.5, the full conditional distributions for $\mathbf{W}$ and $\mathbf{V}$ are analogous to those in (6) and (7), respectively, but with $\mathbf{z}$ replacing $\mathbf{y}$ and $\sigma^2 = 1$.

As a consequence of our latent variable modeling, the conditional distribution of each $z_i$ is a truncated normal distribution, denoted $N_{\text{trunc}}$, as follows:

$$z_i \mid \mathbf{y}, \mathbf{B} \sim N_{\text{trunc}}(\mathbf{X}_i \cdot \mathbf{B}, 1) \tag{9}$$

where the distribution is truncated on the right at 0 if $y_i = 0$ and truncated on the left at 0 if $y_i = 1$.
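One simple way to draw from the truncated normal in (9) is inverse-CDF sampling, sketched below with the Python standard library. This is an illustrative choice of sampler, not necessarily the one used in the authors' software, and the function name `draw_latent` and the example mean of 0.3 are assumptions.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def draw_latent(mean, y, rng):
    """Draw z ~ Normal(mean, 1) restricted to z > 0 if y == 1, or z <= 0 if y == 0.

    A uniform draw is mapped into the part of the CDF range that corresponds
    to the allowed side of zero, then pushed through the inverse CDF.
    """
    p_nonpositive = STD_NORMAL.cdf(-mean)          # Pr(z <= 0)
    if y == 1:
        u = rng.uniform(p_nonpositive, 1.0)
    else:
        u = rng.uniform(0.0, p_nonpositive)
    # Keep u strictly inside (0, 1) so that inv_cdf is defined.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return mean + STD_NORMAL.inv_cdf(u)

rng = random.Random(4)
z1 = [draw_latent(0.3, 1, rng) for _ in range(1000)]   # latent draws for y = 1
z0 = [draw_latent(0.3, 0, rng) for _ in range(1000)]   # latent draws for y = 0
print(min(z1), max(z0))
```

For means far from zero the inverse-CDF approach loses accuracy in the tail, so a dedicated tail sampler would be preferable in that regime.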

2.3.4 Gibbs sampling algorithm for continuous case

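We approximate our posterior using a Gibbs sampling algorithm. Here we provide the algorithm used for the continuous version of the MSMW model to draw samples from the joint posterior distribution $p(\mathbf{W}, \mathbf{V}, \boldsymbol{\tau}, \sigma^2 \mid \mathbb{X}, \mathbf{y})$. The algorithm is given below for iterations $t = 1, \ldots, T$:

1. Initialize $\mathbf{W}^{(1)}, \sigma^{2(1)}, \tau_1^{(1)}, \ldots, \tau_M^{(1)}$.

2. Make the following draws for $t = 2, \ldots, T$:

• Draw $\mathbf{V}^{(t)} \mid \mathbf{y}, \sigma^{2(t-1)}, \mathbf{W}^{(t-1)}$ as in (7).

• Draw $\mathbf{W}^{(t)} \mid \mathbf{y}, \sigma^{2(t-1)}, \tau_1^{(t-1)}, \ldots, \tau_M^{(t-1)}, \mathbf{V}^{(t)}$ as in (6).

• Draw $\tau_1^{(t)}, \ldots, \tau_M^{(t)} \mid \mathbf{W}^{[1](t)}, \ldots, \mathbf{W}^{[M](t)}$ as in (5).

• Calculate $\mathbf{B}^{(t)} = \mathbf{W}^{(t)}\mathbf{V}^{(t)T}$.

• Draw $\sigma^{2(t)} \mid \mathbf{B}^{(t)}, \mathbf{y}$ as in (8) (if $\sigma^2$ is not fixed).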
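The steps above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' R implementation: the function names, the default hyperparameters $\alpha_0 = \beta_0 = 1$, the column-stacking vectorization, and the burn-in used in the usage example are all choices made for this sketch. The update for each $\tau_m$ counts every entry of the corresponding block of $\mathbf{W}$, which equals $P^{[m]}$ when $R = 1$.

```python
import numpy as np

def vec(A):
    return A.flatten(order="F")          # column-stacking vectorization

def draw_mvn(A, rhs, sigma2, rng):
    """Draw from MVN(A^{-1} rhs, sigma2 * A^{-1}) using a Cholesky factor of A."""
    L = np.linalg.cholesky(A)
    mean = np.linalg.solve(A, rhs)
    noise = np.linalg.solve(L.T, rng.standard_normal(len(rhs)))
    return mean + np.sqrt(sigma2) * noise

def gibbs_msmw(X, y, sizes, R=1, n_iter=500, alpha0=1.0, beta0=1.0, seed=0):
    """Gibbs sampler sketch for a continuous outcome.

    X: N x P x D array, y: length-N vector, sizes: list of P^[m] summing to P.
    Returns the draws of B (n_iter x P x D) and of tau (n_iter x M).
    """
    rng = np.random.default_rng(seed)
    N, P, D = X.shape
    source = np.repeat(np.arange(len(sizes)), sizes)   # source label of each feature
    W = rng.standard_normal((P, R))
    tau = np.ones(len(sizes))
    sigma2 = 1.0
    B_draws, tau_draws = [], []
    for _ in range(n_iter):
        # V | y, W, sigma2: design rows are vec(X_i^T W), prior variance 1.
        X_w = np.stack([vec(X[i].T @ W) for i in range(N)])
        A = sigma2 * np.eye(R * D) + X_w.T @ X_w
        V = draw_mvn(A, X_w.T @ y, sigma2, rng).reshape((D, R), order="F")
        # W | y, V, tau, sigma2: design rows are vec(X_i V), prior variances tau_m.
        X_v = np.stack([vec(X[i] @ V) for i in range(N)])
        prior_var = np.tile(tau[source], R)            # diagonal of T
        A = np.diag(sigma2 / prior_var) + X_v.T @ X_v
        W = draw_mvn(A, X_v.T @ y, sigma2, rng).reshape((P, R), order="F")
        # tau_m | W^[m]: inverse-gamma, counting all entries of the block.
        for m in range(len(sizes)):
            block = W[source == m]
            shape = alpha0 + 0.5 * block.size
            rate = beta0 + 0.5 * np.sum(block ** 2)
            tau[m] = 1.0 / rng.gamma(shape, 1.0 / rate)
        B = W @ V.T
        # sigma2 | y, B: inverse-gamma with the IG(0.001, 0.001) prior.
        resid = y - np.einsum("ipd,pd->i", X, B)
        sigma2 = 1.0 / rng.gamma(0.5 * N + 0.001, 1.0 / (0.5 * resid @ resid + 0.001))
        B_draws.append(B)
        tau_draws.append(tau.copy())
    return np.array(B_draws), np.array(tau_draws)

# Usage on simulated rank-1 data with two sources of three features each.
rng = np.random.default_rng(1)
N, sizes, D = 100, [3, 3], 5
X = rng.standard_normal((N, sum(sizes), D))
B_true = np.outer(rng.standard_normal(sum(sizes)), rng.standard_normal(D))
y = np.einsum("ipd,pd->i", X, B_true) + rng.standard_normal(N)
B_draws, tau_draws = gibbs_msmw(X, y, sizes, R=1, n_iter=400)
B_hat = B_draws[100:].mean(axis=0)       # discarding early draws is an assumption here
print(np.corrcoef(B_hat.ravel(), B_true.ravel())[0, 1])
```

The sketch averages draws of $\mathbf{B}$ rather than of $\mathbf{W}$ and $\mathbf{V}$ separately, since the sign and scale of the individual factors are not identifiable while their product is.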

2.3.5 Gibbs sampling algorithm for binary case

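In order to accommodate binary data in our Gibbs sampler, we must introduce our data augmentation steps, in which we draw the latent continuous variables $z_i$:

1. Initialize $\mathbf{V}^{(1)}, \mathbf{W}^{(1)}, \mathbf{z}^{(1)}, \tau_1^{(1)}, \ldots, \tau_M^{(1)}$.

2. Make the following draws for $t = 2, \ldots, T$:

• Draw $\mathbf{V}^{(t)} \mid \mathbf{z}^{(t-1)}, \mathbf{W}^{(t-1)}$ as in (7), with $\mathbf{z}$ replacing $\mathbf{y}$ and $\sigma^2 = 1$.

• Draw $\mathbf{W}^{(t)} \mid \mathbf{z}^{(t-1)}, \boldsymbol{\tau}^{(t-1)}, \mathbf{V}^{(t)}$ as in (6), with $\mathbf{z}$ replacing $\mathbf{y}$ and $\sigma^2 = 1$.

• Draw $\tau_1^{(t)}, \ldots, \tau_M^{(t)} \mid \mathbf{W}^{[1](t)}, \ldots, \mathbf{W}^{[M](t)}$ as in (5).

• Calculate $\mathbf{B}^{(t)} = \mathbf{W}^{(t)}\mathbf{V}^{(t)T}$.

• Draw $\mathbf{z}^{(t)} \mid \mathbf{y}, \mathbf{B}^{(t)}$ as in (9).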

2.3.6 Model Prediction

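After running our Gibbs sampler to simulate draws from our posterior, we take the average over sampling iterations $\mathbf{B}^{(1)}, \ldots, \mathbf{B}^{(T)}$ to obtain estimated coefficients $\hat{\mathbf{B}}$. Given new data $\mathbb{X}^*$ for $N^*$ samples, we can then obtain a point estimate of the outcomes $\mathbf{y}^*$ via $\hat{y}^*_l = \mathbf{X}^*_l \cdot \hat{\mathbf{B}}$. For a binary outcome, $\Phi(\mathbf{X}^*_l \cdot \hat{\mathbf{B}})$ gives an estimate of the predicted probability of having an outcome value of 1, and to translate this probability into a class prediction, we can simply round the value to the nearest integer:

$$\hat{y}^*_l = \begin{cases} 1 & \text{if } \Phi(\mathbf{X}^*_l \cdot \hat{\mathbf{B}}) \geq 0.5 \\ 0 & \text{if } \Phi(\mathbf{X}^*_l \cdot \hat{\mathbf{B}}) < 0.5 \end{cases} \tag{10}$$

for $l = 1, \ldots, N^*$.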

Alternatively, the Bayesian approach allows one to model the full posterior predictive distribution with uncertainty. For the continuous case, draws $y^{*(t)}_l$ from the posterior predictive distribution can be obtained from the Gibbs draws via $y^{*(t)}_l \sim \text{Normal}(\mathbf{X}^*_l \cdot \mathbf{B}^{(t)}, \sigma^{2(t)})$ for $l = 1, \ldots, N^*$. In the binary case, draws from the posterior predictive can be generated via $y^{*(t)}_l \sim \text{Bernoulli}(\Phi(\mathbf{X}^*_l \cdot \mathbf{B}^{(t)}))$ for $l = 1, \ldots, N^*$.
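The point prediction for a binary outcome can be sketched in a few lines of standard-library Python: average the coefficient draws, form the inner product with a new observation, and threshold the probit probability at 0.5. The two coefficient draws and the new observation below are made-up values for illustration, and the function names are hypothetical.

```python
import math

def inner_product(X, B):
    """Generalized inner product of two P x D arrays stored as lists of rows."""
    return sum(x * b for x_row, b_row in zip(X, B) for x, b in zip(x_row, b_row))

def posterior_mean(B_draws):
    """Entrywise average of a list of P x D coefficient draws."""
    T = len(B_draws)
    P, D = len(B_draws[0]), len(B_draws[0][0])
    return [[sum(B[p][d] for B in B_draws) / T for d in range(D)] for p in range(P)]

def predict_class(X_new, B_hat):
    """Return (probability of class 1, predicted class) under the probit link."""
    eta = inner_product(X_new, B_hat)
    prob = 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))
    return prob, int(prob >= 0.5)

B_draws = [[[0.9, 0.4], [-0.2, 0.1]],     # hypothetical draws, P = 2 and D = 2
           [[1.1, 0.6], [-0.4, 0.3]]]
B_hat = posterior_mean(B_draws)
X_new = [[1.0, 0.5], [2.0, -1.0]]
print(predict_class(X_new, B_hat))
```

Since $\Phi(\eta) \geq 0.5$ exactly when $\eta \geq 0$, the class prediction depends only on the sign of the linear predictor.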

3 Results

3.1 Simulations

3.1.1 Data generation

We generated data under multiple scenarios to illustrate the relative benefits of incorporating multi-source or multi-way structure under different conditions. For all scenarios, we simulated data sets $\mathbb{X}^{[1]}: N \times P_1 \times D$ and $\mathbb{X}^{[2]}: N \times P_2 \times D$, representing data from two sources with $N$ observations, $P_1$ and $P_2$ covariates from each source with $P_1 = P_2 = P/2$, and $D$ time points. We consider a low-dimensional scenario with $N = 100$, $P = 6$, and $D = 5$, and a high-dimensional scenario with $N = 20$, $P = 200$, and $D = 2$ (closely matching the application in Section 3.3). We generated the true coefficient array $\mathbf{B}$ under one setting for which the sources contribute equally ($\tau_1 = \tau_2 = 1$) and one setting for which only one source contributes ($\tau_1 = 0$, $\tau_2 = 1$). We also consider settings under which $\mathbf{B}$ has a rank 1 or rank 2 decomposition, or where the coefficient matrix has no multiway structure (i.e., $\mathbf{B}$ has independent entries and is of full rank). In the non-multiway case, the entries of the coefficients for each source $\mathbf{B}^{[m]}: P^{[m]} \times D$ are generated independently from a $\text{Normal}(0, \tau_m)$ distribution. In the multi-way case for rank $R = 1$ or $R = 2$, we generate $\mathbf{W}^{[m]}: P^{[m]} \times R$ and $\mathbf{V}: D \times R$ by simulating the entries of each $\mathbf{W}^{[m]}$ independently from $\text{Normal}(0, \tau_m)$ for $m = 1, 2$ and the entries of $\mathbf{V}$ independently from $\text{Normal}(0, 1)$. A rank 2 model was not considered for the high-dimensional case, because the full rank scenario is already of rank 2 ($D = 2$).

Continuous outcome

For the continuous case, the entries of $\mathbb{X}^{[1]}$ and $\mathbb{X}^{[2]}$ were each generated independently from a $\text{Normal}(0, 1)$ distribution. Then, after generating $\mathbf{B}$, the response variable $\mathbf{y}$ was generated via $y_i \sim \text{Normal}(\mathbf{X}_i \cdot \mathbf{B}, 1)$.

Binary outcome

For our first binary data generating procedure, similarly to the continuous case, the entries of $\mathbb{X}^{[1]}$ and $\mathbb{X}^{[2]}$ were each generated independently from a $\text{Normal}(0, 1)$ distribution. Then, after generating $\mathbf{B}$, the response variable $\mathbf{y}$ was generated using the probit link function, $y_i \sim \text{Bernoulli}(\Phi(\mathbf{X}_i \cdot \mathbf{B}))$.

Separate normal distributions

We considered a third case for which the outcome is binary and the distribution of $\mathbf{X}_i$ depends on the outcome. Here, the outcome was generated deterministically, with half of the $N$ observations having value 0 and half having value 1: $y_i = 0$ for $i = 1, \ldots, N/2$ and $y_i = 1$ for $i = N/2 + 1, \ldots, N$. The coefficients $\mathbf{B}$ are generated under the same conditions as above, and then $\mathbb{X}$ is generated via $\mathbf{X}_i = -\mathbf{B} + \mathbf{E}_i$ if $y_i = 0$ and $\mathbf{X}_i = \mathbf{B} + \mathbf{E}_i$ if $y_i = 1$, with the entries of $\mathbf{E}_i$ generated independently from a $\text{Normal}(0, 1)$ distribution for $i = 1, \ldots, N$. Note that this scenario does not explicitly match the assumptions of our probit model. However, it approximates a realistic scenario for which the data have different means depending on their class, which is detectable in the high-dimensional case. The coefficients for the optimal linear classifier will be proportional to $\mathbf{B}$.
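The claim that the optimal linear classifier is proportional to $\mathbf{B}$ can be checked with a short worked derivation. It assumes equal class proportions, as in the design above, and uses the standard log-likelihood-ratio rule for two Gaussian classes with identity covariance and means $\mathbf{B}$ and $-\mathbf{B}$. This is a textbook argument added for illustration, not a derivation taken from the paper.

```latex
\log \frac{p(\mathbf{X}_i \mid y_i = 1)}{p(\mathbf{X}_i \mid y_i = 0)}
= -\tfrac{1}{2}\|\mathbf{X}_i - \mathbf{B}\|_F^2 + \tfrac{1}{2}\|\mathbf{X}_i + \mathbf{B}\|_F^2
= 2\,(\mathbf{X}_i \cdot \mathbf{B})
```

The rule therefore assigns class 1 when $\mathbf{X}_i \cdot \mathbf{B} > 0$, which is a linear classifier whose coefficient array is proportional to $\mathbf{B}$.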

3.1.2 Measures of performance considered

We assess predictive performance by applying our model to test data that were generated from the same distributions as the training data with a larger sample size ($N^* = 500$). For our simulations with a binary outcome, we used the prediction method outlined in (10) and compared the predicted classification to the true classification to obtain a misclassification rate. For our simulations with a continuous outcome, we compare the predicted outcome with the true outcome by calculating the relative mean squared error $\|\mathbf{y} - \hat{\mathbf{y}}\|_F^2 / \|\mathbf{y}\|_F^2$. For all simulations, we also assess recovery of the underlying parameters by considering the posterior coverage rates of the true parameters and also the correlation between the estimated and the true coefficients.
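The three point-estimate metrics described above are straightforward to compute. The sketch below gives one possible standard-library Python version; the function names and the small example inputs are illustrative assumptions, and posterior coverage is omitted because it requires the full set of posterior draws.

```python
import math

def misclassification_rate(y_true, y_pred):
    """Proportion of predicted labels that differ from the true labels."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)

def relative_mse(y_true, y_pred):
    """Squared error of the predictions divided by the squared norm of the truth."""
    num = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return num / sum(t ** 2 for t in y_true)

def correlation(a, b):
    """Pearson correlation between two equal-length sequences, e.g. vectorized coefficients."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

print(misclassification_rate([0, 1, 1, 0], [0, 1, 0, 0]))   # one of four labels differs
print(relative_mse([1.0, 2.0, 2.0], [1.0, 1.0, 2.0]))
print(correlation([1.0, 2.0, 3.0], [2.0, 4.0, 6.5]))
```

A relative mean squared error of 1 corresponds to predicting 0 for every observation, which gives a natural reference point when reading the continuous-outcome results.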

3.1.3 Models used for estimation

For each simulation condition, we ran a total of six diﬀerent models that each made diﬀerent

assumptions about the underlying structure of B.

For non-multiway models, the data were assumed not to follow a multi-way structure and the data arrays were reorganized into a matrix of dimension $N \times PD$ where the $i$th row gives $\text{vec}(\mathbf{X}_i)$. We also ran two models that did assume a multi-way structure; the first of these imposed a rank 1 structure on the coefficient matrix (as in Equation 2) and the second imposed a rank 2 structure on the coefficient matrix (as in Equation 4 for $R = 2$).

For the multi-source models, the covariates were assumed to come from two sources,

with half of the covariates from one source and half from the other. This means that two

independent priors on the covariate coefficients were fit as in (1) for $m = 1, 2$. For the single-source models, the data were assumed to come from one source (i.e., the distinction between $\mathbf{X}_1$ and $\mathbf{X}_2$ was ignored) and only one prior was placed on the covariate coefficients.

Taking all combinations of these models produces the following six that were ﬁt in our

simulations:

1. Rank 2, Multi-source model (Rank2,MS) with τm∼IG(1,√P[m]∗R) for m= 1,2

2. Rank 2, Single-source model (Rank2,SS) with τ∼IG(1,√P∗R)

3. Rank 1, Multi-source model (Rank1,MS), with τm∼IG(1,√P[m]∗R) for m= 1,2

4. Rank 1, Single-source model (Rank1,SS) with τ∼IG(1,√P)

5. Non-multi-way, Multi-source model (FullRank,MS) with τm∼IG(1,√P[m]∗d) for m= 1,2

6. Non-multi-way, Single-source model (FullRank,SS), with τ∼IG(1,√P∗d).

3.2 Simulation results

The following tables show the results of the metrics outlined in Section 3.1.2 for all of our simulations, averaged over 100 replications for each condition. For these tables, “MS” is an abbreviation for “multi-source” and is used to indicate either a multi-source model or a simulation condition in which the signal was equal across both sources (as opposed to being entirely confined to a single source). “Rank” in these tables refers to the rank of the true coefficient matrix, with “FullRank” referring to the simulation condition in which the true coefficient matrix is generated without any multiway structure. For all tables, bolded values indicate the best performing model based on a pairwise t-test. If multiple values are bolded, then model performances were not significantly different at a 0.05 level.

Binary outcome, low dimensions (N=100, P1=3, P2=3, d=5)

Probit generated data

Tables 1 and 2 give the misclassiﬁcation rate and correlation with the true discriminating

signal, respectively, for the low dimensional simulation with probit generated data. In

all cases, the model that best matched the data generation scenario performed the best

for both measures. The beneﬁts of using the correct multiway structure (Rank 2, 1 or

full rank) tended to be more dramatic than that for matching the multi-source structure;

the relative diﬀerences are particularly large for the correlations with the true coeﬃcients

shown in Table 2. There were a few cases in which a model did not match the true data

generation but performance was not statistically diﬀerent from the model that did match

the true data generation; all such cases involved a multi-source model performing on-par

with a single-source model when the true data were single-source.

Separate normal data

Tables 3 and 4 give the misclassiﬁcation rate and correlation with the true discriminating

signal, respectively, for the low dimensional simulation with data generated from separate

normal distributions. In all cases, the model that best matched the data generation scenario

performed the best for both measures. The beneﬁts of using the correct multiway structure

tended to be more dramatic than that for matching the multi-source structure; the relative

diﬀerences are particularly large for the correlations shown in Table 4.

Continuous outcome, low dimension (N=100, P1=3, P2=3, d=5)

Tables 5 and 6 give the relative squared error and correlation with the true discriminating

signal, respectively, for the low dimensional simulation for data generated with a continuous

outcome. In general, the model that best matched the data generation scenario performed

the best for both measures, though some cases saw models that matched the true data

structure failing to outperform models that did not. Interestingly, the rank 2 models

closely match the performance of the rank 1 model (even if it is misspeciﬁed) but the

full rank model performs much worse under low rank structure. The beneﬁts of using the

correct multiway structure (Rank 2, 1, or full rank) tended to be more dramatic than that

for matching the multi-source structure; the relative diﬀerences are particularly large for

the correlations shown in Table 6.

Binary outcome, high dimension (N=20, P1=100, P2=100, d=2)

Probit generated data

Tables 7 and 8 give the misclassification rate and correlation with the true discriminating signal, respectively, for the high-dimensional simulation with probit generated data. In all cases, the model that best matched the data generation scenario performed the best for both measures, though it did not always outperform the other models by a statistically significant margin. In particular, the misclassification rates observed for all models were close to 0.5, indicating performance only marginally better than random guessing. This demonstrates the challenge in fitting predictive models to HDLSS data when the distribution of the data does not depend on the outcome. The correlation results were also not very strong, though they more clearly indicate better performance from the models that match the true data generation.

Normal generated data

Tables 9 and 10 give the misclassiﬁcation rate and correlation with the true discriminating

signal, respectively, for the high-dimensional simulation with data generated from separate

normal distributions. In general the misclassiﬁcation rates are much better here than they

are in the high-dimensional probit scenario. In all cases, the model that best matched the

data generation scenario performed the best for both measures. The beneﬁts of using the

correct multiway structure (Rank 1 or full rank) tended to be more dramatic than that for

matching the multi-source structure; the relative diﬀerences are particularly large for the

correlations shown in Table 10.

Continuous outcome, high dimension (N=20, P1=100, P2=100, d=2)

Tables 11 and 12 give the relative squared error and correlation with the true coefficients, respectively, for the high-dimensional simulation with data generated with a continuous outcome. In all cases, the model that best matched the data generation scenario performed the best for both measures, though it did not outperform the other models by a statistically significant margin. In particular, the relative squared errors observed for all models were close to 1, indicating only a marginal predictive benefit. The correlation results were also not very strong, though they more clearly indicate better performance from the models that match the true data generation.
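As an illustration of the three performance measures reported in the simulation tables, the following is a minimal sketch and not the code used for the simulations. The function names are hypothetical, and the form of the relative squared error (squared prediction error divided by the sum of squared outcomes) is an assumption that may differ from the definition used in the simulations. Under this assumed form, a prediction of zero for every test sample gives a value of exactly 1.

```python
import numpy as np

def misclassification_rate(y_true, prob_pred, threshold=0.5):
    """Fraction of labels that disagree with the thresholded predicted probability."""
    y_hat = (np.asarray(prob_pred) >= threshold).astype(int)
    return float(np.mean(y_hat != np.asarray(y_true)))

def relative_squared_error(y_true, y_pred):
    """ASSUMED definition: squared prediction error over the sum of squared outcomes."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sum((y_true - y_pred) ** 2) / np.sum(y_true ** 2))

def coefficient_correlation(b_true, b_est):
    """Pearson correlation between vectorized true and estimated coefficient arrays."""
    return float(np.corrcoef(np.ravel(b_true), np.ravel(b_est))[0, 1])

# Toy inputs, for illustration only.
mis = misclassification_rate([0, 1, 1, 0], [0.2, 0.7, 0.4, 0.1])
rse_null = relative_squared_error([1.0, 2.0], [0.0, 0.0])
b = np.arange(6.0).reshape(3, 2)
corr = coefficient_correlation(b, 2.0 * b + 1.0)
```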

3.3 Application to multi-omic iron deﬁciency

We applied our method of multi-source, multi-way Bayesian probit regression to our moti-

vating data on iron deﬁciency in an infant rhesus monkey model and assessed our ability to

discriminate between ID and iron suﬃcient (IS) infants based on the serum proteomic and

metabolomic proﬁles measured at two time points (4 and 6 months after birth). In this

model, infants destined to develop ID show evidence of ID (changes in serum iron indices

and lower reticulocyte hemoglobin content) at 4 months, with iron deﬁciency anemia (lower

hemoglobin and mean corpuscular volume) seen at 6 months (Lubach and Coe, 2006; Coe

and others, 2013; Rao and others, 2018; Sandri and others, 2020, 2021, 2022). Proteomic

and metabolomic changes in serum are seen in the preanemic and anemic periods (Sandri

and others, 2022). After routine pre-processing, data were available for 227 metabolites and 205 proteins for 6 ID and 6 IS monkeys. We used a relatively non-informative prior to infer the variances of the proteomic and metabolomic coefficients, $\tau_1^2$ and $\tau_2^2$: an Inverse-Gamma distribution with parameters $\alpha = 1$, $\beta = 0.1$.
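To make the role of this prior concrete, the sketch below draws from an Inverse-Gamma(1, 0.1) distribution and applies the textbook conjugate update for the variance of independent, mean-zero normal coefficients. It assumes a shape and scale parameterization of the Inverse-Gamma and independent coefficients; the full conditional in the low-rank model may take a different form. The function names and the simulated coefficients are hypothetical.

```python
import numpy as np

def sample_inverse_gamma(shape, scale, size, rng):
    # If G ~ Gamma(shape, rate=scale), then 1/G ~ Inverse-Gamma(shape, scale).
    # NumPy parameterizes the gamma by its scale, which is 1/rate.
    return 1.0 / rng.gamma(shape, 1.0 / scale, size=size)

def conjugate_update(alpha, beta, coefs):
    """Posterior (shape, scale) for tau^2 when coefs are i.i.d. N(0, tau^2)."""
    coefs = np.ravel(coefs)
    return alpha + coefs.size / 2.0, beta + 0.5 * float(np.sum(coefs ** 2))

rng = np.random.default_rng(0)

# Draws from the prior with alpha = 1, beta = 0.1 (heavy right tail, infinite mean).
prior_draws = sample_inverse_gamma(1.0, 0.1, size=10000, rng=rng)

# Simulated stand-in coefficients (not the fitted values): 205 draws with sd 0.2.
coefs = rng.normal(0.0, 0.2, size=205)
post_shape, post_scale = conjugate_update(1.0, 0.1, coefs)
post_mean = post_scale / (post_shape - 1.0)  # posterior mean of tau^2
```

With a few hundred coefficients, the data terms dominate both posterior parameters, which is one sense in which a prior with $\alpha = 1$ and $\beta = 0.1$ is relatively non-informative.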

We assessed the estimated probabilities of class membership under leave-one-out cross validation (LOOCV) using the rank-1 model, for which the posterior predictive probability for a held-out infant is inferred given the remaining $N-1$ infants. The estimated class probabilities achieved perfect separation between the ID and IS samples (Figure 1). We also examined the loadings for the individual proteins, individual metabolites, and each time point (Figure 2). The proteomic and metabolomic loadings both show several biomarkers that are positively and negatively associated with ID; moreover, the loadings have similar scales between the two data sources, with $\tau_1 = 0.200$ (proteomics) and $\tau_2 = 0.238$ (metabolomics), indicating that the signal discriminating ID from IS infants is of similar size in each source.
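The LOOCV procedure can be sketched as follows. This is only an illustration of the hold-one-out loop: the classifier is a hypothetical ridge-regression stand-in whose score is passed through the standard normal CDF, not the Bayesian MSMW model, and the data are synthetic rather than the monkey data.

```python
import numpy as np
from math import erf, sqrt

def fit_ridge_score(X, y, lam=1.0):
    """Hypothetical stand-in: ridge regression of +/-1 labels on centered features."""
    mu = X.mean(axis=0)
    Xc = X - mu
    t = 2.0 * y - 1.0
    t0 = t.mean()
    # Dual (N x N) form of the ridge solution, convenient when N is much smaller than P.
    b = Xc.T @ np.linalg.solve(Xc @ Xc.T + lam * np.eye(len(y)), t - t0)
    return mu, b, t0

def loocv_probabilities(X, y, lam=1.0):
    """Hold out each sample in turn, refit on the other N-1, score the held-out one."""
    n = len(y)
    probs = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        mu, b, t0 = fit_ridge_score(X[keep], y[keep], lam)
        score = t0 + (X[i] - mu) @ b
        probs[i] = 0.5 * (1.0 + erf(score / sqrt(2.0)))  # standard normal CDF
    return probs

# Synthetic stand-in data: 12 samples, 6 per class, 40 vectorized features.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 6)
X = rng.normal(size=(12, 40))
X[y == 1, :5] += 6.0  # class signal in the first five features
probs = loocv_probabilities(X, y)
```

The important design point is that every quantity estimated from the data, including the feature means, is recomputed inside the loop so that the held-out sample never informs its own prediction.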

To assess potential benefits of our approach, we compared the t-statistic for the difference in probit scores between the IS and ID groups under LOOCV to analogous approaches that do not account for multi-source or multi-way structure. Table 13 shows the resulting values for a multi-way (i.e., rank 1) or non-multi-way (i.e., full rank) model using (1) only the metabolite data, (2) only the proteomic data, or (3) both data sources. In all cases the multi-way approach performs better, suggesting that the metabolomic and proteomic profiles discriminating ID from IS infants are similar at 4 months and 6 months, and that we can improve power by accounting for this structure. Moreover, the chosen multi-way model with both sources outperforms the others with a t-statistic of 7.235, suggesting that the metabolites and proteins have complementary information and that we can improve performance by combining them in a single model. In addition, an analogous approach that did not model the source variances separately achieved a smaller t-statistic of 5.521, suggesting an advantage to accounting for heterogeneity between the sources.
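For reference, a two-sample t-statistic of this kind can be computed as in the sketch below. The function name and the toy scores are hypothetical. The unpooled (Welch) standard error is used here; with equal group sizes, as in a comparison of 6 versus 6 samples, the pooled-variance statistic takes the same value.

```python
import numpy as np

def two_sample_t(a, b):
    """Two-sample t-statistic with an unpooled (Welch) standard error."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    se = np.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)
    return float((a.mean() - b.mean()) / se)

# Toy held-out scores for two groups (illustrative values only).
group_a = [1.0, 2.0, 3.0]
group_b = [4.0, 5.0, 6.0]
t_stat = two_sample_t(group_a, group_b)
```

A larger absolute value indicates cleaner separation of the held-out scores between the two groups, which is how the models in Table 13 are ranked.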

4 Discussion

We have proposed a Bayesian linear model that can predict a binary or a continuous outcome using data that are both multi-source and multi-way, with any number of sources or dimensions. Both the simulation and data analysis results have shown that the proposed MSMW model can improve classification accuracy and reduce MSE when the underlying data have MSMW structure. However, the performance of any given approach depends on the conditions under which the data were generated, such as the true rank of the underlying signal or whether different sources have different signal variances. Thus, practical data applications of this model may require applying different versions of the method and comparing their performance. In this article we have focused on three-way arrays ($N \times P \times D$); however, extensions to higher-order arrays are straightforward, for which the coefficient array will take the form of a CP decomposition (Zhou and others, 2013; Guo and others, 2022).
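The CP form of the coefficient array can be sketched as follows, under the assumption that each mode has a factor matrix with one column per rank component and that the linear predictor is the inner product of each sample's array with the coefficient array. The function names and dimensions are hypothetical. For a three-way array this reduces to a rank-$R$ coefficient matrix.

```python
import numpy as np

def cp_coefficients(factors):
    """Sum of R outer products; factors[k] has shape (P_k, R)."""
    rank = factors[0].shape[1]
    B = np.zeros(tuple(f.shape[0] for f in factors))
    for r in range(rank):
        term = factors[0][:, r]
        for f in factors[1:]:
            term = np.multiply.outer(term, f[:, r])
        B += term
    return B

def linear_predictor(X, B):
    """Inner product <X_i, B> for each sample; X has shape (N, P_1, ..., P_K)."""
    return X.reshape(X.shape[0], -1) @ B.ravel()

rng = np.random.default_rng(0)

# Three-way data (N x P x D): the coefficient array is a P x D matrix of rank at most R.
W = rng.normal(size=(4, 2))   # feature loadings, R = 2
V = rng.normal(size=(3, 2))   # loadings for the third mode, R = 2
B2 = cp_coefficients([W, V])
X = rng.normal(size=(5, 4, 3))
eta = linear_predictor(X, B2)

# Higher-order extension: one more mode adds one more factor matrix.
U = rng.normal(size=(2, 2))
B3 = cp_coefficients([W, V, U])
```

The number of free parameters grows with the sum of the mode dimensions rather than their product, which is the source of the gains from low-rank structure when the sample size is small.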

5 Software

Software in the form of R code, together with a sample input data set and complete documentation, is available at https://github.com/BiostatsKim/BayesMSMW.

Acknowledgments

This work was supported by the National Institute of General Medical Sciences (NIGMS) grant R01-GM130622. Funding for the data application in Section 3.3 was also provided by grants from the National Institutes of Health/Eunice Kennedy Shriver National Institute of Child Health and Human Development [HD089989, HD080201, HD057064 and HD39386].

References

Albert, James H and Chib, Siddhartha. (1993). Bayesian analysis of binary and

polychotomous response data. Journal of the American statistical Association 88(422),

669–679.

Coe, Christopher L, Lubach, Gabriele R, Bianco, Laura and Beard, John L.

(2009). A history of iron deﬁciency anemia during infancy alters brain monoamine

activity later in juvenile monkeys. Developmental Psychobiology: The Journal of the

International Society for Developmental Psychobiology 51(3), 301–309.

Coe, Christopher L, Lubach, Gabriele R, Busbridge, Mark and Chapman,

Richard S. (2013). Optimal iron fortiﬁcation of maternal diet during pregnancy and

nursing for investigating and preventing iron deﬁciency in young rhesus monkeys. Re-

search in veterinary science 94(3), 549–554.

Geguchadze, Ramaz N, Coe, Christopher L, Lubach, Gabriele R, Clardy,

Thomas W, Beard, John L and Connor, James R. (2008). Csf proteomic anal-

ysis reveals persistent iron deﬁciency-induced alterations in non-human primate infants.

Journal of neurochemistry 105(1), 127–136.

Gloaguen, Arnaud, Philippe, Cathy, Frouin, Vincent, Gennari, Giulia,

Dehaene-Lambertz, Ghislaine, Le Brusquet, Laurent and Tenenhaus,

Arthur. (2022). Multiway generalized canonical correlation analysis. Biostatis-

tics 23(1), 240–256.

Guhaniyogi, Rajarshi, Qamar, Shaan and Dunson, David B. (2017). Bayesian

tensor regression. The Journal of Machine Learning Research 18(1), 2733–2763.

Guo, Bin, Eberly, Lynn E, Henry, Pierre-Gilles, Lenglet, Christophe and

Lock, Eric F. (2022). Multiway sparse distance weighted discrimination. Journal of

Computational and Graphical Statistics (just-accepted), 1–43.

Huopaniemi, Ilkka, Suvitaival, Tommi, Nikkila, Janne, Oresic, Matej and

Kaski, Samuel. (2010). Multivariate multi-way analysis of multi-source data. Bioin-

formatics 26(12), i391–i398.

Kolda, Tamara G and Bader, Brett W. (2009). Tensor decompositions and appli-

cations. SIAM review 51(3), 455–500.

Li, Xiaoshan, Xu, Da, Zhou, Hua and Li, Lexin. (2018). Tucker tensor regression

and neuroimaging analysis. Statistics in Biosciences 10(3), 520–545.

Lindley, Dennis V and Smith, Adrian FM. (1972). Bayes estimates for the linear

model. Journal of the Royal Statistical Society: Series B (Methodological) 34(1), 1–18.

Lubach, Gabriele R and Coe, Christopher L. (2006). Preconception maternal iron

status is a risk factor for iron deﬁciency in infant rhesus monkeys (macaca mulatta). The

Journal of nutrition 136(9), 2345–2349.

Lyu, Tianmeng, Lock, Eric F and Eberly, Lynn E. (2017). Discriminating sample

groups with multi-way data. Biostatistics 18(3), 434–450.

Miranda, Michelle F, Zhu, Hongtu, Ibrahim, Joseph G, Initiative,

Alzheimer’s Disease Neuroimaging and others. (2018). Tprm: Tensor partition

regression models with applications in imaging biomarker detection. The annals of ap-

plied statistics 12(3), 1422.

Palzer, Elise F, Wendt, Christine H, Bowler, Russell P, Hersh, Craig P,

Safo, Sandra E and Lock, Eric F. (2022). sjive: Supervised joint and individual

variation explained. Computational Statistics & Data Analysis 175, 107547.

Patton, Stephanie M, Coe, Christopher L, Lubach, Gabriele R and Connor,

James R. (2012). Quantitative proteomic analyses of cerebrospinal ﬂuid using itraq in

a primate model of iron deﬁciency anemia. Developmental neuroscience 34(4), 354–365.

Rao, Raghavendra, Ennis, Kathleen, Lubach, Gabriele R, Lock, Eric F,

Georgieff, Michael K and Coe, Christopher L. (2018). Metabolomic analysis

of csf indicates brain metabolic impairment precedes hematological indices of anemia in

the iron-deﬁcient infant monkey. Nutritional neuroscience 21(1), 40–48.

Rao, Raghavendra, Ennis, Kathleen, Oz, Gulin, Lubach, Gabriele R,

Georgieff, Michael K and Coe, Christopher L. (2013). Metabolomic analysis

of cerebrospinal ﬂuid indicates iron deﬁciency compromises cerebral energy metabolism

in the infant monkey. Neurochemical research 38(3), 573–580.

Rodosthenous, Theodoulos, Shahrezaei, Vahid and Evangelou, Marina.

(2020). Integrating multi-omics data through sparse canonical correlation analysis for

the prediction of complex traits: a comparison study. Bioinformatics 36(17), 4616–4625.

Sandri, Brian J, Kim, Jonathan, Lubach, Gabriele R, Lock, Eric F, Guer-

rero, Candace, Higgins, LeeAnn, Markowski, Todd W, Kling, Pamela J,

Georgieff, Michael K, Coe, Christopher L and others. (2022). Multiomic pro-

ﬁling of iron-deﬁcient infant monkeys reveals alterations in neurologically important bio-

chemicals in serum and cerebrospinal ﬂuid before the onset of anemia. American Journal

of Physiology-Regulatory, Integrative and Comparative Physiology 322(6), R486–R500.

Sandri, Brian J, Lubach, Gabriele R, Lock, Eric F, Georgieff, Michael K,

Kling, Pamela J, Coe, Christopher L and Rao, Raghavendra B. (2020).

Early-life iron deﬁciency and its natural resolution are associated with altered serum

metabolomic proﬁles in infant rhesus monkeys. The Journal of nutrition 150(4), 685–

693.

Sandri, Brian J, Lubach, Gabriele R, Lock, Eric F, Kling, Pamela J, Georgi-

eff, Michael K, Coe, Christopher L and Rao, Raghavendra B. (2021). Cor-

recting iron deﬁciency anemia with iron dextran alters the serum metabolomic proﬁle of

the infant rhesus monkey. The American Journal of Clinical Nutrition 113(4), 915–923.

Singh, Amrit, Shannon, Casey P, Gautier, Benoît, Rohart, Florian, Vacher, Michaël, Tebbutt, Scott J and Lê Cao, Kim-Anh. (2019). Diablo: an integrative approach for identifying key molecular drivers from multi-omics assays. Bioinformatics 35(17), 3055–3062.

Tao, Dacheng, Li, Xuelong, Hu, Weiming, Maybank, Stephen and Wu, Xin-

dong. (2005). Supervised tensor learning. In: Fifth IEEE International Conference on

Data Mining (ICDM’05). IEEE. pp. 8–pp.

Van De Wiel, Mark A, Lien, Tonje G, Verlaat, Wina, van Wieringen, Wes-

sel N and Wilting, Saskia M. (2016). Better prediction by use of co-data: adaptive

group-regularized ridge regression. Statistics in Medicine 35(3), 368–381.

White, Brian S, Khan, Suleiman A, Mason, Mike J, Ammad-Ud-Din, Muhammad, Potdar, Swapnil, Malani, Disha, Kuusanmäki, Heikki, Druker, Brian J, Heckman, Caroline, Kallioniemi, Olli and others. (2021). Bayesian multi-source regression and monocyte-associated gene expression predict bcl-2 inhibitor resistance in acute myeloid leukemia. NPJ precision oncology 5(1), 1–11.

Zhang, Yunfeng and Gaynanova, Irina. (2021). Joint association and classiﬁcation

analysis of multi-view data. Biometrics.

Zhou, Hua, Li, Lexin and Zhu, Hongtu. (2013). Tensor regression with applications

in neuroimaging data analysis. Journal of the American Statistical Association 108(502),

540–552.

Misclassification: low-dimensional probit

              Rank: 2           Rank: 1           Full rank
Model         MS: Yes  MS: No   MS: Yes  MS: No   MS: Yes  MS: No
Rank2,MS      0.214    0.217    0.264    0.229    0.194    0.222
Rank2,SS      0.216    0.216    0.264    0.227    0.198    0.221
Rank1,MS      0.225    0.244    0.251    0.210    0.249    0.285
Rank1,SS      0.224    0.243    0.252    0.210    0.250    0.284
FullRank,MS   0.233    0.241    0.293    0.266    0.174    0.170
FullRank,SS   0.248    0.241    0.300    0.265    0.196    0.168

Table 1: Test misclassiﬁcation rate for low-dimensional probit scenario.

Correlations: low-dimensional probit

              Rank: 2           Rank: 1           Full rank
Model         MS: Yes  MS: No   MS: Yes  MS: No   MS: Yes  MS: No
Rank2,MS      0.750    0.739    0.620    0.710    0.131    0.135
Rank2,SS      0.748    0.740    0.620    0.713    0.130    0.136
Rank1,MS      0.730    0.690    0.642    0.737    0.117    0.094
Rank1,SS      0.730    0.691    0.639    0.737    0.116    0.095
FullRank,MS   0.096    0.109    0.085    0.106    0.837    0.857
FullRank,SS   0.094    0.110    0.083    0.109    0.805    0.859

Table 2: Correlation with true coeﬃcients for the low-dimensional probit scenario.

Misclassification: low-dimensional separate normal

              Rank: 2           Rank: 1           Full rank
Model         MS: Yes  MS: No   MS: Yes  MS: No   MS: Yes  MS: No
Rank2,MS      0.111    0.034    0.246    0.134    0.176    0.118
Rank2,SS      0.111    0.033    0.245    0.132    0.175    0.115
Rank1,MS      0.117    0.043    0.240    0.129    0.200    0.155
Rank1,SS      0.116    0.042    0.241    0.128    0.200    0.154
FullRank,MS   0.115    0.036    0.258    0.146    0.168    0.092
FullRank,SS   0.117    0.035    0.261    0.145    0.174    0.091

Table 3: Test misclassiﬁcation rate for the low-dimensional separate normal scenario.

Correlations: low-dimensional separate normal

              Rank: 2           Rank: 1           Full rank
Model         MS: Yes  MS: No   MS: Yes  MS: No   MS: Yes  MS: No
Rank2,MS      0.876    0.887    0.764    0.877    0.164    0.091
Rank2,SS      0.873    0.897    0.770    0.885    0.170    0.098
Rank1,MS      0.846    0.844    0.796    0.901    0.167    0.087
Rank1,SS      0.848    0.845    0.796    0.905    0.166    0.090
FullRank,MS   0.196    0.155    0.111    0.125    0.832    0.845
FullRank,SS   0.192    0.160    0.106    0.125    0.814    0.853

Table 4: Correlation with true coeﬃcients for the low-dimensional separate normal scenario.

Relative squared error: low-dimensional continuous

              Rank: 2           Rank: 1           Full rank
Model         MS: Yes  MS: No   MS: Yes  MS: No   MS: Yes  MS: No
Rank2,MS      0.742    0.446    0.719    0.616    0.500    0.468
Rank2,SS      0.740    0.444    0.716    0.613    0.501    0.464
Rank1,MS      0.758    0.550    0.700    0.602    0.654    0.664
Rank1,SS      0.757    0.549    0.700    0.601    0.654    0.661
FullRank,MS   0.811    0.487    0.801    0.689    0.485    0.287
FullRank,SS   0.784    0.473    0.777    0.668    0.485    0.286

Table 5: Mean relative squared prediction error on test data for the low-dimensional con-

tinuous scenario.

Correlations: low-dimensional continuous

              Rank: 2           Rank: 1           Full rank
Model         MS: Yes  MS: No   MS: Yes  MS: No   MS: Yes  MS: No
Rank2,MS      0.703    0.873    0.751    0.810    0.222    0.099
Rank2,SS      0.702    0.874    0.751    0.811    0.222    0.101
Rank1,MS      0.667    0.792    0.774    0.819    0.191    0.092
Rank1,SS      0.666    0.792    0.772    0.820    0.192    0.079
FullRank,MS   0.137    0.153    0.062    0.141    0.882    0.943
FullRank,SS   0.136    0.152    0.060    0.142    0.874    0.944

Table 6: Correlation with true coeﬃcients for the low-dimensional continuous scenario.

Misclassification: high-dimensional probit

              Rank: 1           Full rank
Model         MS: Yes  MS: No   MS: Yes  MS: No
Rank1,MS      0.444    0.453    0.451    0.448
Rank1,SS      0.448    0.454    0.454    0.447
FullRank,MS   0.446    0.454    0.448    0.444
FullRank,SS   0.449    0.452    0.453    0.443

Table 7: Test misclassiﬁcation rates for the high-dimensional probit scenario.

Correlations: high-dimensional probit

              Rank: 1           Full rank
Model         MS: Yes  MS: No   MS: Yes  MS: No
Rank1,MS      0.142    0.134    0.081    0.068
Rank1,SS      0.147    0.148    0.079    0.070
FullRank,MS   0.084    0.080    0.181    0.163
FullRank,SS   0.080    0.082    0.172    0.172

Table 8: Correlation with true coeﬃcients for the high-dimensional probit scenario.

Misclassification: high-dimensional separate normal

              Rank: 1           Full rank
Model         MS: Yes  MS: No   MS: Yes  MS: No
Rank1,MS      0.203    0.178    0.172    0.068
Rank1,SS      0.213    0.178    0.199    0.061
FullRank,MS   0.215    0.198    0.157    0.054
FullRank,SS   0.228    0.196    0.188    0.052

Table 9: Test misclassiﬁcation rates for the high-dimensional separate normal scenario.

Correlations: high-dimensional separate normal

              Rank: 1           Full rank
Model         MS: Yes  MS: No   MS: Yes  MS: No
Rank1,MS      0.468    0.479    0.202    0.251
Rank1,SS      0.423    0.492    0.187    0.253
FullRank,MS   0.226    0.200    0.487    0.534
FullRank,SS   0.200    0.205    0.421    0.546

Table 10: Correlation with true coeﬃcients for the high-dimensional separate normal sce-

nario.

Relative squared error: high-dimensional continuous

              Rank: 1           Full rank
Model         MS: Yes  MS: No   MS: Yes  MS: No
Rank1,MS      0.972    0.965    0.957    0.984
Rank1,SS      0.977    0.969    0.964    0.975
FullRank,MS   0.984    0.972    0.952    0.963
FullRank,SS   0.983    0.968    0.955    0.958

Table 11: Mean relative squared prediction error on test data for the high-dimensional

continuous scenario.

Correlations: high-dimensional continuous

              Rank: 1           Full rank
Model         MS: Yes  MS: No   MS: Yes  MS: No
Rank1,MS      0.179    0.185    0.096    0.091
Rank1,SS      0.169    0.180    0.090    0.099
FullRank,MS   0.072    0.100    0.223    0.211
FullRank,SS   0.070    0.102    0.215    0.218

Table 12: Correlation with true coeﬃcients for the high-dimensional continuous scenario.

t-test statistics

                   Multi-way   Non-multi-way
Metabolites only   5.773       4.407
Proteins only      5.182       3.935
All data           7.235       4.741

Table 13: Test statistics from using two-sample t tests to evaluate separation between ID

and IS samples achieved by diﬀerent models under LOOCV.

[Figure 1 image: density plot titled "Bayesian MSMW model for ID data"; x-axis: Probit scores; y-axis: Density; legend: Status (ID, IS).]

Figure 1: Probit scores from applying MSMW model to motivating data under LOOCV.

[Figure 2 image, three panels: "Time Loadings" (x-axis: Time, 4 month and 6 month; y-axis: Loading); "Protein Loadings" (x-axis: Protein index number; y-axis: Loading); "Metabolite Loadings" (x-axis: Metabolite index number; y-axis: Loading).]

Figure 2: Factor loadings from applying MSMW model to motivating data.
