Identifiability and Consistent Estimation of the Gaussian
Chain Graph Model
Ruixuan Zhao, Haoran Zhang and Junhui Wang
School of Data Science
City University of Hong Kong
Department of Statistics
The Chinese University of Hong Kong
Abstract
The chain graph model admits both undirected and directed edges in one graph, where symmetric conditional dependencies are encoded via undirected edges and asymmetric causal relations are encoded via directed edges. Though frequently encountered in practice, the chain graph model has been largely under-investigated in the literature, possibly due to the lack of identifiability conditions between undirected and directed edges. In this paper, we first establish a set of novel identifiability conditions for the Gaussian chain graph model, exploiting a low-rank plus sparse decomposition of the precision matrix. Further, an efficient learning algorithm is built upon the identifiability conditions to fully recover the chain graph structure. Theoretical analysis of the proposed method is conducted, assuring its asymptotic consistency in recovering the exact chain graph structure. The advantage of the proposed method is also supported by numerical experiments on both simulated examples and a real application to the Standard & Poor's 500 index data.
Keywords: Causal inference, tangent space, directed acyclic graph, Gaussian graphical model,
low-rank plus sparse decomposition
1 Introduction
Graphical models have attracted tremendous attention in recent years, as they provide an efficient modeling framework to characterize various relationships among multiple objects of interest. They find applications in a wide spectrum of scientific domains, ranging from finance [33], information systems [35], genetics [12] and neuroscience [8] to public health [22].
In the literature, two types of graphical models have been extensively studied. The first type is the undirected graphical model, which encodes conditional dependences among collected nodes via undirected edges. Various learning methods have been proposed to reconstruct the undirected graph, especially under the well-known Gaussian graphical model [12, 3], where conditional dependences are encoded via the zero pattern of the precision matrix. Another well-studied graphical model is the directed acyclic graphical model, which uses directed edges to represent causal relationships among collected nodes in a directed acyclic graph (DAG). To reconstruct the DAG structure, the linear Gaussian structural equation model (SEM) has been popularly considered in the literature, where causal relations are encoded via the sign pattern of the coefficient matrix. Various identifiability conditions [30, 27] have been established for the linear Gaussian SEM, leading to a number of DAG learning methods [6, 27].
Another more flexible graphical model, known as the chain graph model, can be traced back to the early work in [19, 39]. It admits both undirected and directed edges in one graph, where symmetric conditional dependencies are encoded via undirected edges and asymmetric causal relations are encoded via directed edges. Further, it is often assumed that no semi-directed cycles are allowed in the chain graph. As a direct consequence, the chain graph model can be seen as a special DAG model with multiple chain components, where each chain component is a subset of nodes connected via undirected edges, and directed edges are only allowed across different chain components.
The chain graph model has been frequently encountered in practice [5, 15], but largely under-investigated in the literature. In fact, the chain graph model may have various interpretations, including the Lauritzen–Wermuth–Frydenberg (LWF) interpretation [19, 13], the multivariate regression (MVR) interpretation [9] and the Andersson–Madigan–Perlman (AMP) interpretation [1]. Each interpretation implies a different independence relationship from the chain graph structure and leads to several structure learning methods, including the IC-like algorithm [36], the CKES algorithm [29] and the decomposition-based algorithm [23] for LWF chain graphs, the PC-like algorithm [34] and the decomposition-based algorithm [16] for MVR chain graphs, and the PC-like algorithm [28] and the decomposition-based algorithm [17] for AMP chain graphs. Yet, all these methods can only estimate some Markov equivalence class of the chain graph model, and provide no guarantee for the reconstruction of the exact chain graph structure, mostly due to the lack of identifiability conditions between undirected and directed edges. Only very recently, [38] extended the equal noise variance assumption for DAGs [30] to establish the identifiability of the chain graph model under the AMP interpretation. Yet, the extended identifiability condition in [38] is rather artificial and difficult to verify in practice. It is also worth mentioning that if the chain components and their causal ordering are known a priori, then the chain graph model degenerates to a sequence of multivariate regression models, and various methods [10, 25, 15] have been developed to recover the graphical structure.
In this paper, we establish a set of novel identifiability conditions for the Gaussian chain graph model under the AMP interpretation, exploiting a low-rank plus sparse decomposition of the precision matrix. Further, an efficient learning algorithm is developed to recover the exact chain graph structure, including both undirected and directed edges. Specifically, we first reconstruct the undirected edges by estimating the precision matrix of the noise vector through a regularized likelihood optimization. Then, we identify each chain component and determine its causal ordering based on the conditional variances of its nodes. Finally, the directed edges are reconstructed via multivariate regression coupled with truncated singular value decomposition (SVD). Theoretical analysis shows that the proposed method consistently reconstructs the exact chain graph structure, which, to the best of our knowledge, is the first asymptotic consistency result for the chain graph model in the literature. The advantage of the proposed method is supported by numerical experiments on both simulated examples and a real application to the Standard & Poor's 500 index data, which reveals some interesting impacts of the COVID-19 pandemic on the stock market.
The rest of the paper is organized as follows. Section 2 introduces some preliminaries on the chain graph model. Section 3 proposes the identifiability conditions for the linear Gaussian chain graph model, and develops an efficient learning algorithm to reconstruct the exact chain graph structure. The asymptotic consistency of the proposed method is established in Section 4. Numerical experiments of the proposed method on both simulated and real examples are included in Section 5. Section 6 concludes the paper, and technical proofs are provided in the Appendix. Auxiliary lemmas and further computational details are deferred to a separate Supplementary File.
Before moving to Section 2, we define some notation. For an integer $m$, denote $[m] = \{1, \dots, m\}$. For a real value $x$, denote $\lceil x \rceil$ as the smallest integer greater than or equal to $x$. For two nonnegative sequences $a_n$ and $b_n$, $a_n \lesssim b_n$ means there exists a constant $c > 0$ such that $a_n \le c b_n$ when $n$ is sufficiently large. Further, $a_n \lesssim_P b_n$ means there exists a constant $c > 0$ such that $\Pr(a_n \le c b_n) \to 1$ as $n$ grows to infinity. For a vector $x$, the sub-vector corresponding to an index subset $S$ is denoted as $x_S = (x_i)_{i \in S}$. For a matrix $A = (a_{ij})_{p \times p}$, the sub-matrix corresponding to rows in $S_1$ and columns in $S_2$ is denoted as $A_{S_1, S_2} = (a_{ij})_{i \in S_1, j \in S_2}$, and let $A^{-1}_{S_1, S_2}$ denote the corresponding sub-matrix of $A^{-1}$. Also, let $\|A\|_{1,\text{off}} = \sum_{i \neq j} |a_{ij}|$ and $\|A\|_{\max} = \max_{ij} |a_{ij}|$, let $\|A\|_2$ denote the spectral norm, $\|A\|_*$ the nuclear norm, and $v(A) \in \mathbb{R}^{p^2}$ the vectorization of $A$.
2 Chain graph model
Suppose the joint distribution of $x = (x_1, \dots, x_p)^\top$ can be depicted as a chain graph $\mathcal{G} = (\mathcal{N}, \mathcal{E})$, where $\mathcal{N} = \{1, \dots, p\}$ represents the node set and $\mathcal{E} \subseteq \mathcal{N} \times \mathcal{N}$ represents the edge set containing all undirected and directed edges. To differentiate, we denote $(i - j)$ for an undirected edge between nodes $i$ and $j$, and $(i \to j)$ for a directed edge pointing from node $i$ to node $j$, and suppose that at most one edge is allowed between two nodes. Then, there exists a positive integer $m$ such that $\mathcal{N}$ can be uniquely partitioned into $m$ disjoint chain components $\mathcal{N} = \bigcup_{k=1}^{m} \tau_k$, where each $\tau_k$ is a connected component of nodes via undirected edges. Suppose that only undirected edges exist within each chain component, and directed edges are only allowed across different chain components [24, 10]. Further suppose that there exists a permutation $\pi = (\pi_1, \dots, \pi_m)$ such that for $i \in \tau_{\pi_k}$ and $j \in \tau_{\pi_l}$, if $(i \to j)$, then $k < l$. This excludes the existence of semi-directed cycles in $\mathcal{G}$ [24]. We call such a permutation $\pi$ the causal ordering of the chain components, and directed edges can only point from nodes in an earlier chain component under $\pi$ to nodes in a later one.
Let $pa(i) = \{j \in \mathcal{N} : (j \to i) \in \mathcal{E}\}$, $ch(i) = \{j \in \mathcal{N} : (i \to j) \in \mathcal{E}\}$ and $ne(i) = \{j \in \mathcal{N} : (j - i) \in \mathcal{E}\}$ denote the parents, children and neighbors of node $i$, respectively. Further, let $pa(\tau_k) = \bigcup_{i \in \tau_k} pa(i)$ be the parent set of chain component $\tau_k$. Suppose the joint distribution of $x$ satisfies the Andersson–Madigan–Perlman (AMP) Markov property [1, 20] with respect to $\mathcal{G}$, and follows the linear structural equation model (SEM),
$$x = Bx + \epsilon, \quad (1)$$
where $B = (\beta_{ij})_{p \times p}$ is the coefficient matrix, $\epsilon = (\epsilon_1, \dots, \epsilon_p)^\top \sim \mathcal{N}(0, \Omega^{-1})$, and $\Omega = (\omega_{ij})_{p \times p}$ is the precision matrix of $\epsilon$. Further, suppose that $\beta_{ij} \neq 0$ if and only if $j \in pa(i)$, and $\omega_{ij} \neq 0$ if and only if $j \in ne(i)$. Therefore, the undirected and directed edges in $\mathcal{G}$ can be directly implied by the zero patterns in $\Omega$ and $B$, respectively. The joint density of $x$ can then be factorized as
$$P(x) = \prod_{k=1}^{m} P(x_{\tau_k} \mid x_{pa(\tau_k)}), \quad (2)$$
where $x_{\tau_k} \mid x_{pa(\tau_k)} \sim \mathcal{N}\big(B_{\tau_k, pa(\tau_k)} x_{pa(\tau_k)}, \Omega^{-1}_{\tau_k, \tau_k}\big)$ for $k \in [m]$, and $\Omega_{\tau_k, \tau_k}$ is not necessarily a diagonal matrix. This is a key component of the chain graph model to allow undirected edges within each $\tau_k$, and thus differs from most existing SEM models with diagonal $\Omega$ in the literature [31, 30, 27, 6].
To assure the acyclicity among chain components in $\mathcal{G}$, we say $(\Omega, B)$ is CG-feasible if there exists a permutation matrix $P$ such that both $P \Omega P^\top$ and $P B P^\top$ share the same block structure, where $P \Omega P^\top$ is a block diagonal matrix and $P B P^\top$ is a block lower triangular matrix with zero diagonal blocks. Figure 1 shows a toy chain graph, as well as the supports of the original and permuted $(\Omega, B)$. Let $\Theta$ denote the precision matrix of $x$; then it follows from (1) that
$$\Theta = (I_p - B)^\top \Omega (I_p - B) =: \Omega + L, \quad (3)$$
where $L = B^\top \Omega B - B^\top \Omega - \Omega B$.
Figure 1: The left panel displays a toy chain graph with colors indicating different chain components, and the right panel displays the supports of the original $(\Omega, B)$ in the first column and the permuted $(\Omega, B)$ in the second column.
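To make the decomposition in (3) concrete, the following minimal numpy sketch (our illustration, not code from the paper) builds a toy CG-feasible pair $(\Omega, B)$ with two chain components, already arranged in causal order, and verifies $\Theta = \Omega + L$ numerically; all numerical values are arbitrary assumptions.

```python
import numpy as np

# Toy CG-feasible pair with two chain components {0, 1} and {2, 3},
# already in causal order: Omega is block diagonal and B is block
# lower triangular with zero diagonal blocks.
Omega = np.array([[2.0, 0.6, 0.0, 0.0],
                  [0.6, 2.0, 0.0, 0.0],
                  [0.0, 0.0, 2.0, 0.4],
                  [0.0, 0.0, 0.4, 2.0]])   # undirected edges: 0 - 1 and 2 - 3
B = np.zeros((4, 4))
B[2, 0] = 0.8    # directed edge 0 -> 2
B[3, 1] = -0.7   # directed edge 1 -> 3

I = np.eye(4)
Theta = (I - B).T @ Omega @ (I - B)            # precision matrix of x, as in (3)
L = B.T @ Omega @ B - B.T @ Omega - Omega @ B  # the remainder term

assert np.allclose(Theta, Omega + L)           # the decomposition in (3)
print(np.linalg.matrix_rank(L))  # rank(L) <= 2 * rank(B), so few hubs in B keep L low rank
```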
3 Proposed method
3.1 Identifiability of G
A key challenge in the chain graph model is the identifiability of the graph structure $\mathcal{G}$, due to the fact that $\Omega$ and $B$ are intertwined in the SEM model in (1). To proceed, we assume that $\Omega$ is sparse with $\|\Omega\|_0 = S$ and $L$ is a low-rank matrix with $\mathrm{rank}(L) = K$. The sparseness of $\Omega$ implies the sparseness of undirected edges in $\mathcal{G}$, which has been well adopted in the literature on Gaussian graphical models [26, 12, 3]. The low-rankness of $L$ is inherited from that of $B$ [11], which essentially assumes the presence of hub nodes in $\mathcal{G}$, i.e., nodes with multiple children or parents.
Let $L = U_1 D_1 U_1^\top$ be the eigen-decomposition of $L$, where $U_1^\top U_1 = I_K$ and $D_1$ is a $K \times K$ diagonal matrix. We define two linear subspaces,
$$\mathcal{S}(\Omega) = \{S \in \mathbb{R}^{p \times p} : S^\top = S, \text{ and } s_{ij} = 0 \text{ if } \omega_{ij} = 0\}, \quad (4)$$
$$\mathcal{T}(L) = \{U_1 Y + Y^\top U_1^\top : Y \in \mathbb{R}^{K \times p}\}, \quad (5)$$
where $\mathcal{S}(\Omega)$ is the tangent space, at the point $\Omega$, of the manifold containing symmetric matrices with at most $S$ non-zero entries, and $\mathcal{T}(L)$ is the tangent space, at the point $L$, of the manifold containing symmetric matrices with rank at most $K$ [4].
Assumption 1. $\mathcal{S}(\Omega)$ and $\mathcal{T}(L)$ intersect at the origin only; that is, $\mathcal{S}(\Omega) \cap \mathcal{T}(L) = \{0_{p \times p}\}$.

Assumption 1 is the same as the transversality condition in [4], which assures the identifiability of $(\Omega, L)$ in the sense that $\Theta$ can be uniquely decomposed as the sum of a matrix in $\mathcal{S}(\Omega)$ and another in $\mathcal{T}(L)$.
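For a given pair $(\Omega, L)$, Assumption 1 can be checked numerically by comparing subspace dimensions. Below is a minimal sketch (ours, not from the paper; the helper name `transversal` and the rank tolerance are assumptions) that builds spanning sets of $\mathcal{S}(\Omega)$ and $\mathcal{T}(L)$ in vectorized form and tests whether their dimensions add without overlap.

```python
import numpy as np

def transversal(Omega, L, tol=1e-8):
    p = Omega.shape[0]
    basis = []
    # Spanning set of S(Omega): symmetric matrices supported on supp(Omega).
    for i in range(p):
        for j in range(i, p):
            if Omega[i, j] != 0:
                E = np.zeros((p, p)); E[i, j] = E[j, i] = 1.0
                basis.append(E.ravel())
    dim_S = len(basis)  # these generators are linearly independent by construction
    # Spanning set of T(L): matrices U1 Y + Y^T U1^T with U1 the top-K eigenvectors.
    K = np.linalg.matrix_rank(L, tol=tol)
    w, V = np.linalg.eigh(L)
    U1 = V[:, np.argsort(-np.abs(w))[:K]]
    for k in range(K):
        for j in range(p):
            Y = np.zeros((K, p)); Y[k, j] = 1.0
            basis.append((U1 @ Y + Y.T @ U1.T).ravel())
    A = np.stack(basis, axis=1)
    dim_T = np.linalg.matrix_rank(A[:, dim_S:], tol=tol)
    # S and T intersect only at the origin iff the joint dimension is additive.
    return np.linalg.matrix_rank(A, tol=tol) == dim_S + dim_T
```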
Assumption 2. The $K$ non-zero eigenvalues of $L$ are distinct.

Assumption 2 is necessary to identify the eigen-space of the low-rank matrix $L$, and has been commonly assumed in the literature on matrix perturbation [40]. Let $\mathcal{Q}$ be the parameter space of CG-feasible $(\Omega, B)$, where $\Omega \succ 0$, $\Omega + L \succ 0$, $\|\Omega\|_0 \le S$, $\mathrm{rank}(L) \le K$, and Assumptions 1 and 2 are met. Let $(\Omega^*, B^*)$ denote the true parameters of the linear SEM model (1), satisfying $\|\Omega^*\|_0 = S$ and $\mathrm{rank}(L^*) = K$.
Theorem 1. Suppose $(\Omega^*, B^*) \in \mathcal{Q}$. Then, there exists a small $\delta > 0$ such that for any $(\Omega, B) \in \mathcal{Q}$ satisfying $\|\Omega - \Omega^*\|_{\max} < \delta$ and $\|B - B^*\|_{\max} < \delta$, if
$$(I_p - B)^\top \Omega (I_p - B) = (I_p - B^*)^\top \Omega^* (I_p - B^*),$$
it holds true that $(\Omega, B) = (\Omega^*, B^*)$.

Theorem 1 establishes the local identifiability of $(\Omega^*, B^*)$ in (1), which further implies the local identifiability of the chain graph $\mathcal{G}$. It essentially states that $(\Omega^*, B^*) \in \mathcal{Q}$ can be uniquely determined, within its neighborhood in $\mathcal{Q}$, by the precision matrix $\Theta^* = (I_p - B^*)^\top \Omega^* (I_p - B^*)$, which can be consistently estimated from the observed sample.
Remark 1 (Identifiability for DAGs). When $\Omega^*$ is indeed a diagonal matrix, each chain component contains exactly one node and thus $\mathcal{G}$ reduces to a DAG. By Theorem 1, it is identifiable as long as the eigenvalues of $L^*$ are distinct and $e_j \notin \mathrm{span}(U_1^*)$ for any $j \in [p]$, where $U_1^* \in \mathbb{R}^{p \times K}$ contains the eigenvectors of $L^*$ and $\{e_j\}_{j=1}^{p}$ is the standard basis of $\mathbb{R}^p$. This provides an alternative identifiability condition for DAGs, in contrast to the popularly-employed equal error variance condition [30].
3.2 Learning algorithm
We now develop a learning algorithm to estimate $(\Omega^*, B^*)$ and reconstruct the chain graph $\mathcal{G}$. Suppose we observe independent copies $x_1, \dots, x_n \in \mathbb{R}^p$ and denote $X = (x_1, \dots, x_n)^\top \in \mathbb{R}^{n \times p}$. We first estimate $\Omega^*$ via the following regularized likelihood,
$$(\hat{\Omega}, \hat{L}) = \mathop{\mathrm{argmin}}_{\Omega, L} \; l(\Omega + L) + \lambda_n \big( \|\Omega\|_{1,\text{off}} + \gamma \|L\|_* \big) \quad \text{subject to } \Omega \succ 0 \text{ and } \Omega + L \succ 0, \quad (6)$$
where $l(\Theta) = \mathrm{tr}(\Theta \hat{\Sigma}) - \log\{\det(\Theta)\}$ is the negative Gaussian log-likelihood with $\hat{\Sigma} = \frac{1}{n} X^\top X$, and $\lambda_n$ and $\gamma$ are tuning parameters. Here $\|\Omega\|_{1,\text{off}} = \sum_{i \neq j} |\omega_{ij}|$ induces sparsity in $\Omega$, $\|L\|_*$ induces low-rankness in $L$, and the constraints are due to the fact that both $\Omega$ and $\Theta = \Omega + L$ are precision matrices. Note that the optimization task in (6) is convex, and can be efficiently solved via the alternating direction method of multipliers (ADMM; [2]). More computational details are deferred to the Supplementary File.
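Since (6) is a convex program, it can also be prototyped directly with an off-the-shelf solver before implementing a dedicated ADMM routine. Below is a minimal sketch using the cvxpy package (our illustration, not the authors' implementation); the small eigenvalue floor `eps` is an assumption used to approximate the strict positive-definiteness constraints, which a generic solver cannot impose exactly.

```python
import cvxpy as cp
import numpy as np

def fit_omega_and_L(X, lam, gamma, eps=1e-6):
    n, p = X.shape
    Sigma_hat = X.T @ X / n
    Omega = cp.Variable((p, p), symmetric=True)
    L = cp.Variable((p, p), symmetric=True)
    Theta = Omega + L
    # Objective of (6): negative Gaussian log-likelihood plus the
    # off-diagonal l1 penalty on Omega and the nuclear-norm penalty on L.
    off_diag = Omega - cp.diag(cp.diag(Omega))
    objective = (cp.trace(Sigma_hat @ Theta) - cp.log_det(Theta)
                 + lam * (cp.sum(cp.abs(off_diag)) + gamma * cp.normNuc(L)))
    # Strict positive definiteness is approximated by an eps eigenvalue floor.
    constraints = [Omega >> eps * np.eye(p), Theta >> eps * np.eye(p)]
    cp.Problem(cp.Minimize(objective), constraints).solve(solver=cp.SCS)
    return Omega.value, L.value
```

For larger $p$, the ADMM route of [2] referenced above should scale better; the convex-solver form is mainly convenient for checking solutions on small problems.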
Once b
is obtained, we connect nodes iand jwith undirected edges if bωij 6= 0, which leads
to multiple estimated chain components, denoted as bτ1, ..., bτbm. To determine the causal ordering of
the estimated chain components, for each bτkand any C [p]\bτk, we define
b
D(bτk,C) = max
ibτknb
Σii b
ΣiCb
Σ1
CC b
ΣCib
1
ii o,
where b
Σii b
ΣiCb
Σ1
CC b
ΣCiis the estimated conditional variance for node ibτkgiven nodes in C,
and b
1
ii is the estimated variance for node ibτkgiven its parent chain components. It is thus
clear that b
D(bτk,C)shall be close to 0 if Cconsists of all upper chain components of bτk. We start
with b
D(bτk,) = maxibτkb
Σii b
1
ii for each chain component bτk, and select the first chain
component by bπ1= argminl[bm]b
D(bτl,). Suppose the first schain components bτbπ1, ..., bτbπshave
been selected, let b
Cs=s
k=1bτbπkand bπs+1 = argminl[bm]\∪s
k=1
bπkb
D(bτl,b
Cs). We repeat this procedure
until the causal orderings of all bτk’s are determined, which are denoted as b
π= (bπ1, ..., bπbm).
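For illustration, this greedy selection can be written compactly as follows (our sketch; `components`, `Sigma` and `Omega_inv` stand for the estimated chain components, $\hat{\Sigma}$ and $\hat{\Omega}^{-1}$, and the helper name `D_hat` is ours):

```python
import numpy as np

def causal_ordering(components, Sigma, Omega_inv):
    # components: list of index arrays, one per estimated chain component.
    def D_hat(tau, C):
        vals = []
        for i in tau:
            if len(C) == 0:
                cond_var = Sigma[i, i]
            else:
                cond_var = Sigma[i, i] - Sigma[i, C] @ np.linalg.solve(
                    Sigma[np.ix_(C, C)], Sigma[C, i])
            vals.append(cond_var - Omega_inv[i, i])
        return max(vals)

    order, selected = [], []
    remaining = list(range(len(components)))
    while remaining:
        # Pick the component whose conditional variances are best explained
        # by the already-selected (upstream) components.
        best = min(remaining,
                   key=lambda l: D_hat(components[l], np.array(selected, dtype=int)))
        order.append(best)
        selected.extend(components[best])
        remaining.remove(best)
    return order
```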
Finally, to estimate $B^*$, we first obtain an intermediate estimate $\hat{B}^{\text{reg}}$, whose sub-matrix $\hat{B}^{\text{reg}}_{\hat{\tau}_{\hat{\pi}_k}, \hat{\mathcal{C}}_{k-1}}$ is obtained via a multivariate regression of $x_{\hat{\tau}_{\hat{\pi}_k}}$ on $x_{\hat{\mathcal{C}}_{k-1}}$, as directed edges are only allowed from upstream chain components to downstream ones. Given $\hat{B}^{\text{reg}}$, we conduct a singular value decomposition (SVD), $\hat{B}^{\text{reg}} = \hat{U}^{\text{reg}} \hat{D}^{\text{reg}} (\hat{V}^{\text{reg}})^\top$ with $\hat{D}^{\text{reg}} = \mathrm{diag}(\hat{\sigma}^{\text{reg}}_1, \dots, \hat{\sigma}^{\text{reg}}_p)$, and then truncate the small singular values to obtain $\hat{B}^{\text{svd}} = \hat{U}^{\text{reg}} \hat{D}^{\text{svd}} (\hat{V}^{\text{reg}})^\top$, where $\hat{D}^{\text{svd}} = \mathrm{diag}(\hat{\sigma}^{\text{svd}}_1, \dots, \hat{\sigma}^{\text{svd}}_p)$ with $\hat{\sigma}^{\text{svd}}_j = 0$ if $\hat{\sigma}^{\text{reg}}_j \le \kappa_n$ and $\hat{\sigma}^{\text{svd}}_j = \hat{\sigma}^{\text{reg}}_j$ if $\hat{\sigma}^{\text{reg}}_j > \kappa_n$, for some pre-specified $\kappa_n > 0$. The final estimate $\hat{B} = (\hat{\beta}_{ij})_{p \times p}$ is obtained by truncating the diagonal and upper triangular blocks to 0, and conducting a hard thresholding on the lower triangular blocks with some pre-specified $\nu_n > 0$, where $\hat{\beta}_{ij} = 0$ if $|\hat{\beta}^{\text{svd}}_{ij}| \le \nu_n$ and $\hat{\beta}_{ij} = \hat{\beta}^{\text{svd}}_{ij}$ if $|\hat{\beta}^{\text{svd}}_{ij}| > \nu_n$. The non-zero elements of $\hat{B}$ then lead to the estimated directed edges.
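A compact numpy sketch of this final step is given below (our illustration; the bookkeeping through `components_in_order` and the plain least-squares fit are assumptions about details deferred to the Supplementary File):

```python
import numpy as np

def estimate_B(X, components_in_order, kappa, nu):
    # components_in_order: estimated chain components, listed in causal order.
    n, p = X.shape
    B_reg = np.zeros((p, p))
    mask = np.zeros((p, p), dtype=bool)  # allowed (downstream, upstream) entries
    upstream = []
    for tau in components_in_order:
        tau = list(tau)
        if upstream:
            C = np.array(upstream, dtype=int)
            # Multivariate regression of x_tau on all upstream components.
            coef, *_ = np.linalg.lstsq(X[:, C], X[:, tau], rcond=None)
            B_reg[np.ix_(tau, C)] = coef.T
            mask[np.ix_(tau, C)] = True
        upstream.extend(tau)

    # Truncate singular values at level kappa ...
    U, s, Vt = np.linalg.svd(B_reg)
    B_svd = (U * np.where(s > kappa, s, 0.0)) @ Vt
    # ... then hard-threshold entries at level nu, keeping only allowed blocks.
    return np.where(mask & (np.abs(B_svd) > nu), B_svd, 0.0)
```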
4 Asymptotic theory
This section quantifies the asymptotic behavior of $(\hat{\Omega}, \hat{B})$, and establishes its consistency in reconstructing the chain graph $\mathcal{G} = (\mathcal{N}, \mathcal{E})$. Let $\Theta^* = (I_p - B^*)^\top \Omega^* (I_p - B^*)$, and the Fisher information matrix takes the form
$$\mathcal{I}^* = E\Big[\frac{\partial^2 l(\Theta^*)}{\partial \Theta^2}\Big] = (\Theta^*)^{-1} \otimes (\Theta^*)^{-1},$$
where $\otimes$ denotes the Kronecker product. For a linear subspace $\mathcal{M}$, let $\mathcal{P}_{\mathcal{M}}$ denote the projection onto $\mathcal{M}$, and $\mathcal{M}^\perp$ denote the orthogonal complement of $\mathcal{M}$. We further define two linear operators $\mathcal{F}: \mathcal{S}(\Omega^*) \times \mathcal{T}(L^*) \to \mathcal{S}(\Omega^*) \times \mathcal{T}(L^*)$ and $\mathcal{F}^\perp: \mathcal{S}(\Omega^*) \times \mathcal{T}(L^*) \to \mathcal{S}(\Omega^*)^\perp \times \mathcal{T}(L^*)^\perp$ such that
$$\mathcal{F}(\Omega, L) = \big(\mathcal{P}_{\mathcal{S}(\Omega^*)}(\mathcal{I}^* v(\Omega + L)),\; \mathcal{P}_{\mathcal{T}(L^*)}(\mathcal{I}^* v(\Omega + L))\big),$$
$$\mathcal{F}^\perp(\Omega, L) = \big(\mathcal{P}_{\mathcal{S}(\Omega^*)^\perp}(\mathcal{I}^* v(\Omega + L)),\; \mathcal{P}_{\mathcal{T}(L^*)^\perp}(\mathcal{I}^* v(\Omega + L))\big).$$
According to Assumption 1 and Lemma ?? in the Supplementary File, $\mathcal{F}$ is invertible and thus $\mathcal{F}^{-1}$ is well defined.

Let $g_\gamma(\Omega, L) = \max\{\|\Omega\|_{\max}, \|L\|_2\}$, where $\gamma > 0$ is the same as that in (6). Let $L^* = U_1^* D_1^* (U_1^*)^\top$ be the eigen-decomposition of $L^*$.
Assumption 3. $g_\gamma\big(\mathcal{F}^\perp \mathcal{F}^{-1}(\mathrm{sign}(\Omega^*),\, \gamma U_1^* \mathrm{sign}(D_1^*)(U_1^*)^\top)\big) < 1$.

Assumption 3 is essential for establishing the selection and rank consistency of $(\hat{\Omega}, \hat{L})$ through the penalties in (6). For example, in a special case when $\Theta^* = I_p$ and $\mathcal{S}(\Omega^*) \perp \mathcal{T}(L^*)$, Assumption 3 simplifies to $\max\{\gamma \|U_1^* \mathrm{sign}(D_1^*)(U_1^*)^\top\|_{\max},\, \|\mathrm{sign}(\Omega^*)\|_2\} < 1$, implying that $U_1^*$ is not sparse and $\mathrm{sign}(\Omega^*)$ is not a low-rank matrix. Similar technical conditions have also been assumed in the literature [4, 7].
Theorem 2. Suppose $(\Omega^*, B^*) \in \mathcal{Q}$ and Assumption 3 holds. Let $\lambda_n = n^{-1/2+\eta}$ with a sufficiently small positive constant $\eta$. Then, with probability approaching 1, (6) has a unique solution $(\hat{\Omega}, \hat{L})$. Furthermore, we have
$$\|\hat{\Omega} - \Omega^*\|_{\max} \lesssim_P n^{-\frac{1}{2}+2\eta}, \qquad \Pr\big(\mathrm{sign}(\hat{\Omega}) = \mathrm{sign}(\Omega^*)\big) \to 1,$$
$$\|\hat{L} - L^*\|_{\max} \lesssim_P n^{-\frac{1}{2}+2\eta}, \qquad \Pr\big(\mathrm{rank}(\hat{L}) = \mathrm{rank}(L^*)\big) \to 1. \quad (7)$$
Theorem 2 shows that $\hat{\Omega}$ attains both estimation and sign consistency, which implies that the undirected edges in $\mathcal{G}$ can be exactly reconstructed with high probability. It can also be shown that $\hat{B}$ attains both estimation and selection consistency, implying the exact recovery of the directed edges in $\mathcal{G}$. Furthermore, given $(\hat{\Omega}, \hat{B})$, we reconstruct the chain graph as $\hat{\mathcal{G}} = (\mathcal{N}, \hat{\mathcal{E}})$, where $(i \to j) \in \hat{\mathcal{E}}$ if and only if $\hat{\beta}_{ji} \neq 0$, and $(i - j) \in \hat{\mathcal{E}}$ if and only if $\hat{\omega}_{ij} \neq 0$. The following Theorem 3 establishes the consistency of $\hat{\mathcal{G}}$.
Theorem 3. Suppose all the conditions in Theorem 2 are satisfied, and we set $\kappa_n = n^{-1/2+\eta}$ and $\nu_n = n^{-1/2+2\eta}$. Then, $\Pr(\hat{\mathcal{G}} = \mathcal{G}) \to 1$ as $n \to \infty$.

Theorem 3 shows that the proposed method achieves exact recovery of the chain graph with high probability, which is in sharp contrast to the existing methods that only recover some Markov equivalence class of the chain graph [36, 29, 23, 34, 16, 28, 17].
5 Numerical experiments
5.1 Simulated examples
We examine the numerical performance of the proposed method and compare it against existing structural learning methods for chain graphs, including the decomposition-based algorithm (LCD, [17]) and the PC-like algorithm (PC-like, [28, 17]), as well as the PC algorithm for DAGs (PC, [18]). The implementations of LCD and PC-like are available at https://github.com/majavid/AMPCGs2019. We implement the PC algorithm through the R package pcalg, and then convert the resulting partial DAG to a DAG by pdag2dag. The significance level of all tests in LCD, PC-like and PC is set as $\alpha = 0.05$.
We evaluate the numerical performance of all four methods in terms of the estimation accuracy of the undirected edges, the directed edges and the overall chain graph. Specifically, we report recall, precision and the Matthews correlation coefficient (MCC) as evaluation metrics for the estimated undirected edges and directed edges, respectively. Furthermore, we employ the structural Hamming distance (SHD) [37, 17] to evaluate the estimated chain graph, which is the number of edge insertions, deletions or flips needed to change the estimated chain graph into the true one. Note that large values of recall, precision and MCC and small values of SHD indicate good estimation performance.
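For concreteness, the edge-level metrics follow their standard definitions; below is a minimal sketch (ours, not code from the paper) that computes them from boolean indicators of the estimated and true edges over a common candidate set:

```python
import numpy as np

def edge_metrics(est, true):
    # est, true: boolean arrays indicating estimated and true edges.
    tp = np.sum(est & true); fp = np.sum(est & ~true)
    fn = np.sum(~est & true); tn = np.sum(~est & ~true)
    recall = tp / max(tp + fn, 1)
    precision = tp / max(tp + fp, 1)
    denom = np.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    mcc = (tp * tn - fp * fn) / denom if denom > 0 else 0.0
    return recall, precision, mcc
```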
Example 1. We consider a classic two-layer Gaussian graphical model [21, 25] with two layers $A_1 = \{1, \dots, \lceil 0.1p \rceil\}$ and $A_2 = \{\lceil 0.1p \rceil + 1, \dots, p\}$, whose structure is illustrated in Figure 2(a). Within each layer, we randomly connect each pair of nodes by an undirected edge with probability 0.02, and note that one layer may contain multiple chain components. Then, we generate the directed edges from nodes in $A_1$ to nodes in $A_2$ with probability 0.8. Furthermore, the non-zero values of $\omega_{ij}$ and $\beta_{ij}$ are uniformly generated from $[-1.5, -0.5] \cup [0.5, 1.5]$. To guarantee the positive definiteness of $\Omega$, each diagonal element is set as $\omega_{ii} = \sum_{j=1, j \neq i}^{p} |\omega_{ji}| + 0.1$.
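A minimal numpy sketch of this data-generating design is given below (our illustration; zero-based indexing and the sampling mechanics through model (1) are the only assumptions added):

```python
import numpy as np

def gen_example1(p, n, seed=0):
    rng = np.random.default_rng(seed)
    m1 = int(np.ceil(0.1 * p))  # layer A1 = {0, ..., m1-1}, A2 = the rest
    Omega = np.zeros((p, p))
    B = np.zeros((p, p))
    def draw():  # uniform on [-1.5, -0.5] U [0.5, 1.5]
        return rng.uniform(0.5, 1.5) * rng.choice([-1.0, 1.0])
    for i in range(p):
        for j in range(i + 1, p):
            same_layer = (i < m1) == (j < m1)
            if same_layer and rng.random() < 0.02:
                Omega[i, j] = Omega[j, i] = draw()  # undirected edge i - j
            elif i < m1 <= j and rng.random() < 0.8:
                B[j, i] = draw()                    # directed edge i -> j
    # Diagonal dominance guarantees positive definiteness of Omega.
    np.fill_diagonal(Omega, np.abs(Omega).sum(axis=0) + 0.1)
    # Sample x = (I - B)^{-1} eps with eps ~ N(0, Omega^{-1}), as in model (1).
    eps = rng.multivariate_normal(np.zeros(p), np.linalg.inv(Omega), size=n)
    X = np.linalg.solve(np.eye(p) - B, eps.T).T
    return X, Omega, B
```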
Example 2. The structure of the second chain graph is illustrated in Figure 2(b). Particularly, we randomly connect each pair of nodes by an undirected edge with probability 0.03, and read off multiple chain components $\{\tau_1, \dots, \tau_m\}$ from the set of undirected edges. Then, we set the causal ordering of the chain components as $(\pi_1, \dots, \pi_m) = (1, \dots, m)$. For each chain component $\tau_k$, we randomly select nodes as hubs with probability 0.2, and let each hub node point to the nodes in $\bigcup_{i=k+1}^{m} \tau_i$ with probability 0.8. Similarly, the non-zero values of $\omega_{ij}$ and $\beta_{ij}$ are uniformly generated from $[-1.5, -0.5] \cup [0.5, 1.5]$, and $\omega_{ii} = \sum_{j=1, j \neq i}^{p} |\omega_{ji}| + 0.1$.
Figure 2: The chain graph structures in Examples 1 and 2.
For each example, we consider four cases with $(p, n) = (50, 500)$, $(50, 1000)$, $(100, 500)$ and $(100, 1000)$, and the averaged performance of all four methods over 50 independent replications is summarized in Tables 1 and 2. As PC only outputs DAGs with no undirected edges, its evaluation metrics on $\hat{\Omega}$ are all NA.
Table 1: The averaged evaluation metrics of all the methods in Example 1 together with their
standard errors in parentheses.
(p, n) Method Recall($\hat{\Omega}$) Precision($\hat{\Omega}$) MCC($\hat{\Omega}$) Recall($\hat{B}$) Precision($\hat{B}$) MCC($\hat{B}$) SHD
(50,500) Proposed 0.6482 (0.0255) 0.8493 (0.0195) 0.7327 (0.0203) 0.2789 (0.0080) 0.4971 (0.0133) 0.3361 (0.0097) 187.8600 (2.4026)
LCD 0.1000 (0.0120) 0.1174 (0.0128) 0.0934 (0.0118) 0.1350 (0.0047) 0.5862 (0.0125) 0.2579 (0.0077) 192.7600 (1.5514)
PC-like 0.0441 (0.0075) 0.0424 (0.0084) 0.0270 (0.0076) 0.0055 (0.0007) 0.0679 (0.0090) -0.0006 (0.0025) 225.9800 (1.1569)
PC NA NA NA 0.0244 (0.0019) 0.0835 (0.0059) 0.0067 (0.0033) 238.1600 (1.1691)
(50,1000) Proposed 0.6729 (0.0237) 0.8380 (0.0208) 0.7425 (0.0196) 0.3060 (0.0095) 0.4663 (0.0124) 0.3386 (0.0107) 194.1200 (2.9303)
LCD 0.1174 (0.0139) 0.1394 (0.0113) 0.1122 (0.0115) 0.1869 (0.0054) 0.6762 (0.0113) 0.3324 (0.0078) 181.1600 (1.8179)
PC-like 0.0391 (0.0067) 0.0303 (0.0062) 0.0161 (0.0063) 0.0076 (0.0009) 0.0909 (0.0101) 0.0054 (0.0029) 231.7400 (1.0909)
PC NA NA NA 0.0296 (0.0019) 0.0898 (0.0051) 0.0109 (0.0031) 242.6000 (1.3613)
(100,500) Proposed 0.3475 (0.0092) 0.9987 (0.0009) 0.5832 (0.0080) 0.3480 (0.0051) 0.5383 (0.0068) 0.3984 (0.0059) 732.9800 (5.7756)
LCD 0.0279 (0.0028) 0.0860 (0.0089) 0.0396 (0.0048) 0.0751 (0.0019) 0.5693 (0.0093) 0.1882 (0.0043) 794.2800 (3.0063)
PC-like 0.0167 (0.0020) 0.0411 (0.0058) 0.0152 (0.0034) 0.0011 (0.0001) 0.0307 (0.0039) -0.0084 (0.0008) 859.3400 (2.4988)
PC NA NA NA 0.0057 (0.0004) 0.0421 (0.0030) -0.0114 (0.0011) 885.5600 (2.5778)
(100,1000) Proposed 0.4088 (0.0101) 0.9988 (0.0008) 0.6334 (0.0081) 0.3631 (0.0053) 0.4876 (0.0066) 0.3825 (0.0061) 775.5400 (7.2310)
LCD 0.0236 (0.0022) 0.0780 (0.0082) 0.0340 (0.0040) 0.0900 (0.0022) 0.6349 (0.0088) 0.2210 (0.0045) 779.8200 (3.0436)
PC-like 0.0193 (0.0020) 0.0349 (0.0037) 0.0131 (0.0026) 0.0016 (0.0002) 0.0377 (0.0042) -0.0072 (0.0009) 872.3000 (2.4148)
PC NA NA NA 0.0077 (0.0005) 0.0500 (0.0032) -0.0090 (0.0013) 896.0600 (2.5596)
From Tables 1 and 2, it is clear that the proposed method outperforms all competitors in most scenarios. In Example 1, the proposed method produces a much better estimation of the undirected edges than all other methods.
Table 2: The averaged evaluation metrics of all the methods in Example 2 together with their standard errors in parentheses. Here ** denotes that the corresponding method takes too long to produce any results.
(p, n) Method Recall($\hat{\Omega}$) Precision($\hat{\Omega}$) MCC($\hat{\Omega}$) Recall($\hat{B}$) Precision($\hat{B}$) MCC($\hat{B}$) SHD
(50,500) Proposed 0.5229 (0.0263) 0.7843 (0.0157) 0.6255 (0.0215) 0.3272 (0.0178) 0.5651 (0.0249) 0.4063 (0.0210) 128.8400 (7.2189)
LCD 0.4510 (0.0290) 0.5184 (0.0261) 0.4662 (0.0271) 0.1646 (0.0102) 0.5469 (0.0192) 0.2764 (0.0108) 132.3600 (7.9336)
PC-like 0.4548 (0.0331) 0.4296 (0.0257) 0.4225 (0.0292) 0.0231 (0.0024) 0.1668 (0.0146) 0.0439 (0.0052) 149.3600 (8.9086)
PC NA NA NA 0.1637 (0.0149) 0.2490 (0.0140) 0.1655 (0.0130) 160.1200 (8.2415)
(50,1000) Proposed 0.5704 (0.0293) 0.7719 (0.0166) 0.6481 (0.0231) 0.3568 (0.0204) 0.5571 (0.0252) 0.4195 (0.0225) 128.7800 (7.8260)
LCD 0.4723 (0.0301) 0.4873 (0.0218) 0.4609 (0.0255) 0.1885 (0.0110) 0.5721 (0.0173) 0.3052 (0.0119) 128.7800 (7.8959)
PC-like 0.4584 (0.0309) 0.4108 (0.0239) 0.4138 (0.0269) 0.0205 (0.0025) 0.1529 (0.0181) 0.0379 (0.0059) 149.9800 (8.8576)
PC NA NA NA 0.1848 (0.0176) 0.2599 (0.0142) 0.1828 (0.0156) 159.1600 (8.5205)
(100,500) Proposed 0.3560 (0.0090) 0.9984 (0.0010) 0.5880 (0.0076) 0.4124 (0.0255) 0.7447 (0.0270) 0.5438 (0.0256) 155.0600 (4.4310)
LCD ** ** ** ** ** ** **
PC-like ** ** ** ** ** ** **
PC NA NA NA 0.2710 (0.0269) 0.1013 (0.0087) 0.1472 (0.0125) 231.6000 (5.1892)
(100,1000) Proposed 0.5035 (0.0103) 0.9977 (0.0009) 0.7016 (0.0073) 0.5882 (0.0262) 0.8817 (0.0155) 0.7115 (0.0218) 114.5800 (4.1264)
LCD ** ** ** ** ** ** **
PC-like ** ** ** ** ** ** **
PC NA NA NA 0.2875 (0.0264) 0.1065 (0.0091) 0.1562 (0.0127) 228.1400 (4.9927)
For directed edges, the proposed method achieves the highest Recall($\hat{B}$) and MCC($\hat{B}$). It is interesting to note that LCD attains a higher Precision($\hat{B}$) than the proposed method, possibly because LCD tends to produce fewer estimated directed edges, resulting in a large Precision($\hat{B}$) but a small Recall($\hat{B}$). In Example 2, the proposed method outperforms all competitors in terms of almost all the evaluation metrics. Note that LCD and PC-like take too long to produce any results when $p = 100$, due to their expensive computational cost when there exist many hub nodes.
5.2 Standard & Poor's 500 index data
We apply the proposed method to study the relationships among stocks in the Standard & Poor's 500 index, and to analyze the impact of the COVID-19 pandemic on the stock market. The chain graph can accurately reveal various relationships among stocks, with undirected edges for symmetric competitive or cooperative relationships between stocks and directed edges for asymmetric causal relations from one stock to another.
To proceed, we select the $p = 100$ stocks with the largest market capitalizations in the Standard & Poor's 500 index, and retrieve their adjusted closing prices during the pre-pandemic period, August 2017 to February 2020, and the post-pandemic period, March 2020 to September 2022. The data is publicly available on many finance websites and has been packaged in some standard software, such as the R package quantmod. For each period, we first calculate the daily returns of each stock based on its adjusted closing prices, and then apply the proposed method to construct the corresponding chain graph.
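As a sketch of this preprocessing step (our illustration; the CSV file name is a placeholder for however the prices are exported, e.g., from quantmod), daily returns are simple percentage changes of the adjusted closing prices:

```python
import pandas as pd

# prices: DataFrame of adjusted closing prices, one column per stock ticker,
# indexed by trading day (hypothetical file exported from a data source).
prices = pd.read_csv("adjusted_close.csv", index_col=0, parse_dates=True)

# Daily return on day t: (price_t - price_{t-1}) / price_{t-1}.
returns = prices.pct_change().dropna()

# Split into the two study periods before fitting the chain graph.
pre = returns.loc["2017-08-01":"2020-02-29"]
post = returns.loc["2020-03-01":"2022-09-30"]
```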
Figure 3 displays the undirected edges between stocks in both estimated chain graphs, which consist of 39 and 21 undirected edges in the pre-pandemic and post-pandemic periods, respectively. It is clear that there are more estimated undirected edges in the pre-pandemic chain graph than in the post-pandemic one, which echoes the empirical findings that business expansion was more active, company cooperation was closer and competition was fiercer before the COVID-19 pandemic. Furthermore, there are 13 common undirected edges in both chain graphs, and all these 13 connected stock pairs are from the same sector, including VISA (V) and MASTERCARD (MA), JPMORGAN CHASE (JPM) and BANK OF AMERICA (BAC), MORGAN STANLEY (MS) and GOLDMAN SACHS (GS), and HOME DEPOT (HD) and LOWE'S (LOW). All these pairs share the same type of business, and their competition or cooperation received less impact from the COVID-19 pandemic. In Figure 3, it is also interesting to note that the number of undirected edges between stocks from different sectors has reduced in the post-pandemic chain graph. This concurs with the fact that diversified business transactions between companies decreased and only essential business contacts were maintained during the COVID-19 pandemic.
Figure 4 displays the boxplots of the causal orderings of all stocks within each sector in both the pre-pandemic and post-pandemic periods, where the causal ordering of a stock is set as that of the corresponding chain component. It is generally believed that the causal ordering reflects the imbalance of social demand and supply; that is, if a sector is in greater demand, its causal ordering is inclined to move upstream. Evidently, Energy and Materials are always at the top of the causal ordering in both periods, as they are upstream industries and provide inputs for most other sectors. The median causal ordering of Telecommunication Services goes from downstream to upstream after the outbreak of the COVID-19 pandemic, since people travel less and rely more on telecommunication for business communication.
Figure 3: The left and right panels display all the estimated undirected edges for the pre-pandemic and post-pandemic periods, respectively. Stocks from the same sector are shown in the same color, and the common undirected edges in both chain graphs are boldfaced.
The median causal ordering of Financials goes down during the pandemic, as commercial entities are more cautious about credit expansion and demand for financial services is likely to decline amid the financial uncertainty. It is somewhat surprising that the median causal ordering of Healthcare appears invariant, but many pharmaceutical and biotechnology corporations in this sector have actually changed from downstream to upstream, due to the rapid development of vaccines and treatments during the pandemic.
In addition, the estimated chain graphs in the pre-pandemic and post-pandemic periods consist of 149 and 190 directed edges, respectively. While many directed edges remain unchanged, there are some stocks whose roles have changed dramatically in the chain graphs. In particular, some stocks with no child but multiple parents in the pre-pandemic chain graph become ones with no parent but multiple children in the post-pandemic chain graph, such as COSTCO (COST), APPLE (AAPL), ACCENTURE (ACN), INTUIT (INTU), AT&T (T) and CHUBB (CB). This finding appears reasonable, as most of these stocks correspond to industries in high demand during the pandemic, such as COSTCO for stocking up on groceries, AT&T for remote communication, and APPLE for providing communication and online learning equipment.
Figure 4: The left and right panels display the boxplots of the estimated causal orderings of the top 100 stocks in each sector for the pre-pandemic and post-pandemic periods, respectively. The sectors are ordered according to the median causal ordering of stocks in the post-pandemic period.
On the other hand, some other stocks with no parent but multiple children in the pre-pandemic chain graph become ones with no child but multiple parents in the post-pandemic chain graph, including TESLA (TSLA), TJX (TJX), BRISTOL-MYERS SQUIBB (BMY), PAYPAL (PYPL), AUTOMATIC DATA PROCESSING (ADP) and BOEING (BA). Many of these companies have been severely impacted during the pandemic, such as BOEING due to minimized travel and TESLA due to shrunk consumer purchasing power.
6 Conclusion
In this paper, we establish a set of novel identifiability conditions for the Gaussian chain graph model under the AMP interpretation, exploiting a low-rank plus sparse decomposition of the precision matrix. An efficient learning algorithm is developed to recover the exact chain graph structure, including both undirected and directed edges. Theoretical analysis shows that the proposed method consistently reconstructs the exact chain graph structure. Its advantage is also supported by various numerical experiments on both simulated and real examples. It would be interesting to extend the proposed identifiability conditions and learning algorithm to accommodate non-linear chain graph models with non-Gaussian noise.
Acknowledgment
This work is supported in part by HK RGC Grants GRF-11304520, GRF-11301521 and GRF-11311022.
References
[1] Andersson, S. A., Madigan, D., and Perlman, M. D. (2001). Alternative Markov properties for chain graphs. Scandinavian Journal of Statistics, 28(1):33–85.
[2] Boyd, S., Parikh, N., Chu, E., Peleato, B., and Eckstein, J. (2011). Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1):1–122.
[3] Cai, T., Liu, W., and Luo, X. (2011). A constrained $\ell_1$ minimization approach to sparse precision matrix estimation. Journal of the American Statistical Association, 106(494):594–607.
[4] Chandrasekaran, V., Sanghavi, S., Parrilo, P. A., and Willsky, A. S. (2011). Rank-sparsity incoherence for matrix decomposition. SIAM Journal on Optimization, 21(2):572–596.
[5] Chen, C., Chang, K. C.-C., Li, Q., and Zheng, X. (2018). Semi-supervised learning meets factorization: Learning to recommend with chain graph model. ACM Transactions on Knowledge Discovery from Data (TKDD), 12(6):1–24.
[6] Chen, W., Drton, M., and Wang, Y. (2019). On causal discovery with an equal-variance assumption. Biometrika, 106(4):973–980.
[7] Chen, Y., Li, X., Liu, J., and Ying, Z. (2016). A fused latent and graphical model for multivariate binary data. arXiv preprint arXiv:1606.08925.
[8] Cole, M. W., Reynolds, J. R., Power, J. D., Repovs, G., Anticevic, A., and Braver, T. S. (2013). Multi-task connectivity reveals flexible hubs for adaptive task control. Nature Neuroscience, 16(9):1348–1355.
[9] Cox, D. R. and Wermuth, N. (1993). Linear dependencies represented by chain graphs. Statistical Science, 8(3):204–218.
[10] Drton, M. and Eichler, M. (2006). Maximum likelihood estimation in Gaussian chain graph models under the alternative Markov property. Scandinavian Journal of Statistics, 33(2):247–257.
[11] Fang, Z., Zhu, S., Zhang, J., Liu, Y., Chen, Z., and He, Y. (2020). Low rank directed acyclic graphs and causal structure learning. arXiv preprint arXiv:2006.05691.
[12] Friedman, J., Hastie, T., and Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3):432–441.
[13] Frydenberg, M. (1990). The chain graph Markov property. Scandinavian Journal of Statistics, 17(4):333–353.
[14] Golub, G. H. and Van Loan, C. F. (2013). Matrix Computations. JHU Press.
[15] Ha, M. J., Stingo, F. C., and Baladandayuthapani, V. (2021). Bayesian structure learning in multilayered genomic networks. Journal of the American Statistical Association, 116(534):605–618.
[16] Javidian, M. A. and Valtorta, M. (2018). Structural learning of multivariate regression chain graphs via decomposition. arXiv preprint arXiv:1806.00882.
[17] Javidian, M. A., Valtorta, M., and Jamshidi, P. (2020). AMP chain graphs: Minimal separators and structure learning algorithms. Journal of Artificial Intelligence Research, 69:419–470.
[18] Kalisch, M. and Bühlmann, P. (2007). Estimating high-dimensional directed acyclic graphs with the PC-algorithm. Journal of Machine Learning Research, 8(3):613–636.
[19] Lauritzen, S. L. and Wermuth, N. (1989). Graphical models for associations between variables, some of which are qualitative and some quantitative. Annals of Statistics, 17(1):31–57.
[20] Levitz, M., Perlman, M. D., and Madigan, D. (2001). Separation and completeness properties for AMP chain graph Markov models. Annals of Statistics, 29(6):1751–1784.
[21] Lin, J., Basu, S., Banerjee, M., and Michailidis, G. (2016). Penalized maximum likelihood estimation of multi-layered Gaussian graphical models. Journal of Machine Learning Research, 17:1–51.
[22] Luke, D. A. and Harris, J. K. (2007). Network analysis in public health: History, methods, and applications. Annual Review of Public Health, 28:69–93.
[23] Ma, Z., Xie, X., and Geng, Z. (2008). Structural learning of chain graphs via decomposition. Journal of Machine Learning Research, 9(95):2847–2880.
[24] Maathuis, M., Drton, M., Lauritzen, S., and Wainwright, M. (2018). Handbook of Graphical Models. CRC Press.
[25] McCarter, C. and Kim, S. (2014). On sparse Gaussian chain graph models. Advances in Neural Information Processing Systems, 27.
[26] Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. Annals of Statistics, 34(3):1436–1462.
[27] Park, G. (2020). Identifiability of additive noise models using conditional variances. Journal of Machine Learning Research, 21(75):1–34.
[28] Peña, J. M. (2014). Learning marginal AMP chain graphs under faithfulness. In European Workshop on Probabilistic Graphical Models, pages 382–395. Springer.
[29] Peña, J. M., Sonntag, D., and Nielsen, J. (2014). An inclusion optimal algorithm for chain graph structure learning. In Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics, pages 778–786. PMLR.
[30] Peters, J. and Bühlmann, P. (2014). Identifiability of Gaussian structural equation models with equal error variances. Biometrika, 101(1):219–228.
[31] Peters, J., Janzing, D., and Schölkopf, B. (2017). Elements of Causal Inference: Foundations and Learning Algorithms. MIT Press, Cambridge, MA.
[32] Ravikumar, P., Wainwright, M., Raskutti, G., and Yu, B. (2011). High-dimensional covariance estimation by minimizing $\ell_1$-penalized log-determinant divergence. Electronic Journal of Statistics, 5:935–980.
[33] Sanford, A. and Moosa, I. (2012). A Bayesian network structure for operational risk modelling in structured finance operations. Journal of the Operational Research Society, 63(4):431–444.
[34] Sonntag, D. and Peña, J. M. (2012). Learning multivariate regression chain graphs under faithfulness. In Sixth European Workshop on Probabilistic Graphical Models, pages 299–306.
[35] Stanton-Salazar, R. D. and Dornbusch, S. M. (1995). Social capital and the reproduction of inequality: Information networks among Mexican-origin high school students. Sociology of Education, 68(2):116–135.
[36] Studený, M. (1997). A recovery algorithm for chain graphs. International Journal of Approximate Reasoning, 17(2-3):265–293.
[37] Tsamardinos, I., Brown, L., and Aliferis, C. (2006). The max-min hill-climbing Bayesian network structure learning algorithm. Machine Learning, 65(1):31–78.
[38] Wang, Y. and Bhattacharyya, A. (2022). Identifiability of linear AMP chain graph models. Proceedings of the AAAI Conference on Artificial Intelligence, 36(9):10080–10089.
[39] Wermuth, N. and Lauritzen, S. L. (1990). On substantive research hypotheses, conditional independence graphs and graphical chain models. Journal of the Royal Statistical Society: Series B (Methodological), 52(1):21–50.
[40] Yu, Y., Wang, T., and Samworth, R. J. (2015). A useful variant of the Davis–Kahan theorem for statisticians. Biometrika, 102(2):315–323.