Availability: R scripts implementing the proposed algorithms are available at https://sites.google.com/site/raghvendramallmlresearcher/codes
Contact: rmall@qf.org.qa
Mall et al.
METHODOLOGY
Detection of statistically significant network changes in complex biological networks
Raghvendra Mall¹ (rmall@qf.org.qa), Luigi Cerulo²,³ (lcerulo@unisannio.it), Halima Bensmail¹ (hbensmail@qf.org.qa), Antonio Iavarone⁴ (ai2102@cumc.columbia.edu) and Michele Ceccarelli¹* (mceccarelli@gmail.com)
*Correspondence:
mceccarelli@gmail.com
¹ QCRI - Qatar Computing Research Institute, HBKU, Doha, Qatar
Full list of author information is
available at the end of the article
Abstract
Background: Biological networks contribute effectively to unveiling the complex structure of molecular interactions and to discovering driver genes, especially in the cancer context. Owing to gene mutations, for example as cancer progresses, the gene expression network can undergo some amount of localized re-wiring. The ability to detect statistically relevant changes in the interaction patterns induced by the progression of the disease can lead to the discovery of novel relevant signatures. Several procedures have recently been proposed to detect sub-network differences in pairwise labeled weighted networks.
Results: In this paper, we propose an improvement over the state-of-the-art based on the Generalized Hamming Distance, which is adopted to evaluate the topological difference between two networks and to estimate its statistical significance. The proposed procedure exploits a more effective model selection criterion to generate p-values for statistical significance, and it outperforms existing methods in both computational time and prediction accuracy. Moreover, the structure of the proposed algorithm allows for a faster parallelized implementation. For dense random geometric networks, the proposed approach is 10-15x faster and achieves 5-10% higher AUC, Precision/Recall, and Kappa values than the state-of-the-art. We also apply the method to dissect the differences between the regulatory networks of IDH-mutant and IDH-wild-type glioma. In this case, our method identifies several recently reported master regulators as well as important novel candidates.
Conclusions: We show that our network differencing procedure can effectively and efficiently detect statistically significant network re-wirings between different conditions. When applied to detect the main differences between the networks of IDH-mutant and IDH-wild-type glioma tumors, it correctly selects sub-networks centered on key regulators of these two subtypes. In addition, its application highlights several novel candidates that cannot be detected by standard single network-based approaches.
Keywords: Differential Networks; Gene Regulatory Network Inference; Master Regulators
Background
The omnipresence of complex networks is reflected in a wide variety of domains, including social networks [1, 2], web graphs [3], road graphs [4], communication networks [5], financial networks [6] and biological networks [7, 8, 9]. Although we focus on biological networks, many aspects of the method proposed in this paper can also be applied to networks in other contexts. In cancer research, gene regulatory networks, protein interaction networks, and DNA methylation networks are compared to detect differences between two conditions, such as healthy and diseased [10, 11]. This can lead to the discovery of biological pathways related to the disease condition and, in the case of cancer, of the gene regulatory changes that occur as the disease progresses [12, 13, 14].
A central problem in cell biology is to model, from high-throughput data, the functional networks underlying interactions between molecular entities. One of the main questions is how the cell globally changes its behavior in response to external stimuli, or what the effect is of alterations such as driver somatic mutations and copy number changes. Signatures of differentially expressed and/or methylated genes are the downstream effect of global cell de-regulation in different conditions, such as cancer subtypes. It has therefore been argued that driver mutations activate functional pathways described by different global re-wirings of the underlying gene regulatory network.
The identification of significant changes induced by the presence or progression of a disease can help to discover novel molecular diagnostic and prognostic signatures. For example, it is known that, according to the mutation status of the gene IDH [15, 16], the majority of malignant brain tumors can be divided into two main macro-categories, which can be further divided into seven molecularly and clinically distinct subtypes [17]. These two macro-groups are characterized by highly different global expression and epigenomic profiles. Hence, a key question for understanding the molecular basis of disease is how to identify significant changes in the regulatory structure across different conditions.
Various techniques have been developed to compare two graphs, including graph matching and graph similarity algorithms [18, 19, 20]. However, the problem addressed in this paper differs from popular graph theory problems such as graph isomorphism [21] and sub-graph matching [22]. Here the goal is to identify statistically significant differences between two weighted networks (with or without labels).
One common statistic used to distinguish a graph $A$ from another graph $B$ with the same number of nodes $N$ is the Mean Absolute Difference (MAD) metric, defined as:
$$d(A,B) = \frac{1}{N(N-1)} \sum_{i \neq j} |a_{ij} - b_{ij}|,$$
where $a_{ij}$ and $b_{ij}$ are the edge weights corresponding to the topology of networks $A$ and $B$. This distance measure is equivalent to the Hamming distance [23] and has been used extensively in the literature to compare networks [24, 25]. Another statistic used to test the association between networks is the Quadratic Assignment Procedure (QAP), defined as:
$$Q(A,B) = \frac{1}{N(N-1)} \sum_{i=1}^{N} \sum_{j=1}^{N} a_{ij} b_{ij}.$$
The QAP metric is used in a permutation-based procedure to differentiate two networks [26, 27]. Ruan et al. showed that these metrics are not always sensitive to subtle topological variations [28].
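As a concrete illustration of the two statistics, the following is a minimal NumPy sketch, assuming dense symmetric weight matrices with zero diagonals. The function names (`mad`, `qap`, `qap_permutation_pvalue`) are our own; the permutation test is a generic upper-tail version written for illustration and is not necessarily the exact procedure used in [26, 27].

```python
import numpy as np


def mad(a: np.ndarray, b: np.ndarray) -> float:
    """Mean Absolute Difference d(A, B) over ordered pairs i != j."""
    n = a.shape[0]
    off = ~np.eye(n, dtype=bool)  # True on the off-diagonal entries
    return float(np.abs(a - b)[off].sum() / (n * (n - 1)))


def qap(a: np.ndarray, b: np.ndarray) -> float:
    """QAP statistic Q(A, B); assumes zero diagonals, as in the text."""
    n = a.shape[0]
    return float((a * b).sum() / (n * (n - 1)))


def qap_permutation_pvalue(a, b, n_perm=999, seed=0):
    """Upper-tail p-value of Q(A, B) under random relabelling of A's nodes."""
    rng = np.random.default_rng(seed)
    observed = qap(a, b)
    hits = 0
    for _ in range(n_perm):
        perm = rng.permutation(a.shape[0])
        # permute rows and columns together: same edges, shuffled labels
        if qap(a[np.ix_(perm, perm)], b) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# 4-node path 0-1-2-3, and a copy in which edge 2-3 is swapped for edge 0-3
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], float)
B = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 0], [1, 0, 0, 0]], float)
print(mad(A, B), qap(A, B))  # both equal 4/12 = 1/3
```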
Our aim is to detect statistically significant differences between two networks under the premise that any true topological difference between them involves only a small set of edges compared with all the edges in the network. Recently, a Generalized Hamming Distance (GHD) based method was introduced to measure the distance between two labeled graphs [28], where it was shown that the GHD statistic is more robust than the MAD and QAP metrics for identifying subtle variations in the topology of paired networks. In particular, the authors showed that the permutation distribution of GHD follows a normal distribution, with closed-form expressions for its first two moments under the null hypothesis that networks $A$ and $B$ are independent. Using these moments, the corresponding p-values were obtained in closed form. They also proposed a differential sub-network identification technique, namely dGHD. The advantage of this technique is that, unlike previous differential network analysis techniques [25, 29, 30], it provides closed-form p-values for the differential sub-network that remains after the iterative removal of the least differential nodes. We propose an improvement over dGHD, namely the Closed-Form approach, which exploits the conditions for asymptotic normality, is computationally cheaper, and attains better prediction performance than the dGHD algorithm. Computational efficiency and prediction accuracy are crucial in cancer contexts, where networks have a large number of nodes and the topological difference is associated with a few driver genes.
Methods
Preliminaries on Generalized Hamming Distance
The Generalized Hamming Distance is a way to estimate the distance between two weighted graphs [28]. Let $A=(V, E_A)$ and $B=(V, E_B)$ be two graphs with the same set of nodes $V=\{1,\dots,N\}$ and different sets of edges $E_A$ and $E_B$. The Generalized Hamming Distance (GHD) is defined as:
$$\mathrm{GHD}(A,B) = \frac{1}{N(N-1)} \sum_{i,j,\, i \neq j} \left( a'_{ij} - b'_{ij} \right)^2, \tag{1}$$
where $a'_{ij}$ and $b'_{ij}$ are mean-centered edge weights defined as:
$$a'_{ij} = a_{ij} - \frac{1}{N(N-1)} \sum_{i,j,\, i \neq j} a_{ij}, \qquad b'_{ij} = b_{ij} - \frac{1}{N(N-1)} \sum_{i,j,\, i \neq j} b_{ij}.$$
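Equation (1) translates directly into a few lines of NumPy. This is a sketch assuming symmetric weight matrices with zero diagonals; `centre` and `ghd` are hypothetical helper names, not functions from the authors' R scripts.

```python
import numpy as np


def centre(w: np.ndarray) -> np.ndarray:
    """Mean-centre the off-diagonal weights; the diagonal stays at zero."""
    off = ~np.eye(w.shape[0], dtype=bool)
    return np.where(off, w - w[off].mean(), 0.0)


def ghd(a: np.ndarray, b: np.ndarray) -> float:
    """Generalized Hamming Distance of Eq. (1)."""
    n = a.shape[0]
    diff = centre(a) - centre(b)  # a'_ij - b'_ij, zero on the diagonal
    return float((diff ** 2).sum() / (n * (n - 1)))


W = np.array([[0, 1, 2], [1, 0, 3], [2, 3, 0]], float)
print(ghd(W, np.zeros((3, 3))))  # off-diagonal mean is 2, so the result is 4/6
```

Because the weights are mean-centered, adding the same constant to every off-diagonal weight of one graph leaves GHD unchanged.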
The edge weights $a_{ij}$ and $b_{ij}$ depend on the topology of the network and provide a measure of connectivity between every pair of nodes $i$ and $j$ in $A$ and $B$. Different metrics have been adopted to measure the connectivity between pairs of nodes, including topological overlap (TO) [31, 32], cosine similarity and Pearson correlation [33]. In our experiments, we used cosine similarity to capture first-order interactions between the nodes in the network. Cosine similarity scales well for large sparse networks and can be used in place of TO, as the two are nearly perfectly correlated.
Given two networks $A$ and $B$, a permutation of the vertex labels of $A$ (keeping the edges unchanged) generates a permuted network $A^{\pi}$. The quantity $\mathrm{GHD}(A^{\pi},B)$ is the test statistic of an inferential problem with null hypothesis $H_0$: graphs $A$ and $B$ are independent [28]. The null distribution of GHD can be obtained through an exhaustive calculation over all permutations, which can be approximated by a Monte Carlo approach. The authors of [28] simplified this calculation by showing that, under the null hypothesis, the distribution is well approximated by a normal distribution whose moments can be obtained analytically.
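The Monte Carlo approximation of the null distribution can be sketched as follows. Assumptions: weights are given as symmetric zero-diagonal NumPy arrays, the helper names are ours, and the p-value is taken from the lower tail (a GHD well below the null values suggests that $A$ and $B$ share structure), which is our reading rather than something stated here.

```python
import numpy as np


def ghd(a, b):
    """Eq. (1) with mean-centred off-diagonal weights."""
    n = a.shape[0]
    off = ~np.eye(n, dtype=bool)
    da = a[off] - a[off].mean()
    db = b[off] - b[off].mean()
    return float(((da - db) ** 2).sum() / (n * (n - 1)))


def monte_carlo_null(a, b, n_perm=1000, seed=0):
    """Sample GHD(A^pi, B) by relabelling the vertices of A at random."""
    rng = np.random.default_rng(seed)
    n = a.shape[0]
    null = np.empty(n_perm)
    for k in range(n_perm):
        perm = rng.permutation(n)
        null[k] = ghd(a[np.ix_(perm, perm)], b)  # edges kept, labels shuffled
    return null


def empirical_pvalue(a, b, n_perm=1000, seed=0):
    """Lower-tail p-value (assumption): small GHD means A and B share structure."""
    null = monte_carlo_null(a, b, n_perm, seed)
    return (np.sum(null <= ghd(a, b)) + 1) / (n_perm + 1)
```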
Formally, under $H_0$:
$$\frac{\mathrm{GHD}(A,B) - \mu}{\sigma} \sim \mathcal{N}(0,1), \tag{2}$$
where $\mu$ is the asymptotic value of the mean of the GHD statistic and $\sigma$ is the asymptotic value of its standard deviation, computed between $A$ and $B$. In order to calculate $\mu$ and $\sigma$ we define:
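$$S^{t}_{a} = \sum_{i=1}^{N} \sum_{j=1, j \neq i}^{N} a_{ij}^{t}, \quad t = 1, 2, \qquad T_a = \sum_{i=1}^{N} \Big( \sum_{j=1, j \neq i}^{N} a_{ij} \Big)^2,$$
$$S^{t}_{b} = \sum_{i=1}^{N} \sum_{j=1, j \neq i}^{N} b_{ij}^{t}, \quad t = 1, 2, \qquad T_b = \sum_{i=1}^{N} \Big( \sum_{j=1, j \neq i}^{N} b_{ij} \Big)^2.$$
Here $a_{ij}^{t}$ and $b_{ij}^{t}$ denote the edge weights raised to the power $t$. Furthermore, we require the following terms:
$$A_a = (S^{1}_{a})^2, \qquad B_a = T_a - S^{2}_{a}, \qquad C_a = A_a + 2 S^{2}_{a} - 4 T_a,$$
$$A_b = (S^{1}_{b})^2, \qquad B_b = T_b - S^{2}_{b}, \qquad C_b = A_b + 2 S^{2}_{b} - 4 T_b.$$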
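Using these definitions, the closed-form expressions for the mean $\mu$ and the variance $\sigma^2$ are:
$$\mu = \frac{S^{2}_{a} + S^{2}_{b}}{N(N-1)} - \frac{2\, S^{1}_{a} S^{1}_{b}}{N^2 (N-1)^2},$$
$$\sigma^2 = \frac{4}{N^3 (N-1)^3} \left[ 2\, S^{2}_{a} S^{2}_{b} + \frac{4\, B_a B_b}{N-2} + \frac{C_a C_b}{(N-2)(N-3)} - \frac{A_a A_b}{N(N-1)} \right]. \tag{3}$$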
Given a significance threshold $\alpha$ (e.g. 0.01), p-values $> \alpha$ indicate that there is not sufficient evidence to reject the null hypothesis $H_0$ that graphs $A$ and $B$ are independent. Hence, higher p-values are taken as an indication that the two graphs under consideration are independent.
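The closed-form moments of Equation (3) can be checked numerically. The sketch below is a minimal NumPy version under stated assumptions: symmetric, zero-diagonal weight matrices with $N \geq 4$; the moment formulas are applied to whichever weights are passed in (in `ghd_pvalue` we pass the mean-centered weights of Equation (1)); and the p-value uses the lower normal tail, which is our assumption. All function names are hypothetical.

```python
import math
from statistics import NormalDist

import numpy as np


def ghd_raw(a, b):
    """(1/(N(N-1))) * sum_{i != j} (a_ij - b_ij)^2 for zero-diagonal inputs."""
    n = a.shape[0]
    return float(((a - b) ** 2).sum() / (n * (n - 1)))


def centre(w):
    """Mean-centre off-diagonal weights (Eq. (1)); the diagonal stays zero."""
    off = ~np.eye(w.shape[0], dtype=bool)
    return np.where(off, w - w[off].mean(), 0.0)


def _terms(w):
    """S^1, S^2, A, B and C of the text for one weight matrix."""
    s1, s2 = w.sum(), (w ** 2).sum()
    t = (w.sum(axis=1) ** 2).sum()  # T: sum of squared row sums
    return s1, s2, s1 ** 2, t - s2, s1 ** 2 + 2 * s2 - 4 * t


def null_moments(a, b):
    """Permutation mean and variance of ghd_raw(A^pi, B), Eq. (3).

    Assumes symmetric, zero-diagonal matrices and N >= 4.
    """
    n = a.shape[0]
    m = n * (n - 1)
    s1a, s2a, aa, ba, ca = _terms(a)
    s1b, s2b, ab, bb, cb = _terms(b)
    mu = (s2a + s2b) / m - 2 * s1a * s1b / m ** 2
    var = 4 / m ** 3 * (2 * s2a * s2b + 4 * ba * bb / (n - 2)
                        + ca * cb / ((n - 2) * (n - 3)) - aa * ab / m)
    return mu, var


def ghd_pvalue(a, b):
    """Lower-tail normal p-value of GHD (the tail choice is our assumption)."""
    a, b = centre(a), centre(b)  # work with the weights of Eq. (1)
    mu, var = null_moments(a, b)
    z = (ghd_raw(a, b) - mu) / math.sqrt(var)
    return NormalDist().cdf(z)
```

For small $N$ the formulas can be verified by enumerating all $N!$ relabellings of $A$ and comparing the exact mean and variance of the statistic with `null_moments`.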
Differential sub-network detection with GHD
The GHD distance tells us to what extent two graphs are different, but it cannot identify which parts of the graphs are similar and which are different. In this work, we are interested in detecting which parts of the graphs make the two graphs different. We call such different sub-graphs differential sub-networks. The notion of differential sub-networks is based on the idea that, when comparing two networks, only a subset of edges presents altered interactions. The goal is to identify the set of nodes, namely $V^{*}$, associated with such a subset of edges, and the p-values $p^{*}$ corresponding to the nodes in $V^{*}$. Formulated as a statistical test, this goal requires that, for such a subset $V^{*}$, there is not sufficient evidence to reject the null hypothesis that the corresponding sub-networks $A(V^{*}, E^{*}_{A})$ and $B(V^{*}, E^{*}_{B})$ are statistically independent.
The idea here is to adopt an iterative technique to identify the set of nodes $V^{*}$ that contributes most to the difference. We start from the dGHD algorithm proposed in [28]. This algorithm measures edge connectivity with the topological overlap metric and benefits from the closed-form p-values of Equation (3). In the dGHD algorithm, an iterative procedure is followed in which, at each iteration, the change in the centralized GHD (cGHD), i.e. $\mathrm{cGHD} = \mathrm{GHD}(A,B) - \mu$, is estimated after the removal of each node. The node for which the change in cGHD (i.e. the difference in cGHD before and after its removal) is maximal is removed. The GHD statistic is computed for the remaining sub-networks and the p-value is estimated. This process is repeated until a user-specified minimal set size is reached, or until a closed-form p-value is no longer available, which happens for $N \leq 3$ as shown in Equation (3). The p-values are then adjusted for multiple testing by controlling the false discovery rate [34].
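The false discovery rate adjustment can be sketched in a few lines of Python, assuming that [34] refers to the Benjamini-Hochberg step-up procedure (this section does not name the method). `bh_adjust` is a hypothetical helper name.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda k: pvalues[k])  # ascending p-values
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest p-value downwards
        k = order[rank - 1]
        running_min = min(running_min, pvalues[k] * m / rank)
        adjusted[k] = running_min
    return adjusted


print(bh_adjust([0.01, 0.04, 0.03, 0.005]))
```

On the same input, R's `p.adjust(p, method = "BH")` should return the same values.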
The dGHD algorithm suffers from the following limitations: a) during the $i$-th iteration, the GHD measure is calculated $N-i$ times on different sub-graphs, with an overall time complexity of $O(N^2|E|)$, where $E = E_A \cup E_B$; b) the algorithm is prone to discovering more false positives, since it uses the change in cGHD ($\Delta$cGHD) as a model selection criterion. We overcome these limitations with the following improvements:
1. Remove nodes by exploiting the closed form. We use the idea that nodes with similar topology in networks $A$ and $B$ contribute the least to cGHD. We therefore first calculate the closed-form contribution of each node to cGHD once, using Equation (4), and then iteratively remove the nodes with the least contributions. This process continues until the p-value of the remaining sub-network becomes greater than a threshold $\epsilon$.
2. Use a different model selection criterion. Once the p-value reaches $\epsilon$, we follow a procedure similar to the dGHD algorithm, but use the more intuitive criterion of selecting the node whose removal makes the cGHD value maximal, rather than the change in cGHD before and after the removal of a node. With this model selection criterion, we iteratively identify and remove the node whose contribution to cGHD is the least.
The advantage of the Closed-Form approach is that it significantly reduces the computational complexity and improves the predictive performance. A simple alternative to the Closed-Form approach would be to sort all the nodes by their contribution to cGHD, and thus rank them by their ability to differentiate the two networks, with complexity $O(N \log N)$. However, such a ranking alone does not identify statistically different sub-networks between the two graphs, as indicated in [28].
Closed-Form Approach
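We propose a fast approach to perform differential sub-network analysis that takes into consideration the contribution of each node to GHD and $\mu$. Using Equations (1) and (3), this can be expressed mathematically as:
$$\mathrm{GHD}(A,B)^{(i)} = \frac{1}{N(N-1)} \left( \sum_{j=1, j \neq i}^{N} (a'_{ij})^2 + \sum_{j=1, j \neq i}^{N} (b'_{ij})^2 - \sum_{j=1, j \neq i}^{N} 2\, a'_{ij} b'_{ij} \right),$$
$$\mu^{(i)} = \frac{\sum_{j=1, j \neq i}^{N} a_{ij}^2 + \sum_{j=1, j \neq i}^{N} b_{ij}^2}{N(N-1)} - \frac{2 \left( \sum_{j=1, j \neq i}^{N} a_{ij} \right) S^{1}_{b}}{N^2 (N-1)^2} - \frac{2 \left( \sum_{j=1, j \neq i}^{N} b_{ij} \right) S^{1}_{a}}{N^2 (N-1)^2} + \frac{2 \left( \sum_{j=1, j \neq i}^{N} a_{ij} \right) \left( \sum_{k=1, k \neq i}^{N} b_{ik} \right)}{N^2 (N-1)^2}. \tag{4}$$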
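A vectorized NumPy sketch of Equation (4) as written above, assuming symmetric weight matrices with zero diagonals (the function name is ours). The GHD terms use the mean-centered weights of Equation (1) and the $\mu$ terms use the weights as given; sorting the differences in ascending order gives a list such as $O$.

```python
import numpy as np


def node_contributions(a, b):
    """Per-node terms GHD^(i), mu^(i) of Eq. (4) and cGHD^(i) = GHD^(i) - mu^(i).

    a, b: symmetric weight matrices with zero diagonal.
    """
    n = a.shape[0]
    m = n * (n - 1)
    off = ~np.eye(n, dtype=bool)
    ac = np.where(off, a - a[off].mean(), 0.0)  # a'_ij of Eq. (1)
    bc = np.where(off, b - b[off].mean(), 0.0)  # b'_ij of Eq. (1)
    ghd_i = ((ac ** 2).sum(1) + (bc ** 2).sum(1) - (2 * ac * bc).sum(1)) / m
    ra, rb = a.sum(1), b.sum(1)  # row sums over j != i (diagonal is zero)
    s1a, s1b = a.sum(), b.sum()  # S^1_a and S^1_b
    mu_i = (((a ** 2).sum(1) + (b ** 2).sum(1)) / m
            - 2 * ra * s1b / m ** 2
            - 2 * rb * s1a / m ** 2
            + 2 * ra * rb / m ** 2)
    return ghd_i, mu_i, ghd_i - mu_i


rng = np.random.default_rng(0)
X = np.triu(rng.random((8, 8)), 1)
Y = np.triu(rng.random((8, 8)), 1)
ghd_i, mu_i, cghd_i = node_contributions(X + X.T, Y + Y.T)
order = np.argsort(cghd_i)  # ascending: least contributing node first
```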
We observe that if we sum $\mathrm{GHD}(A,B)^{(i)}$ and $\mu^{(i)}$ over all $i \in V$, we obtain $\mathrm{GHD}(A,B)$ and $\mu$. We use the idea that nodes with similar topology in networks $A$ and $B$ contribute the least to the centralized GHD, i.e. $\mathrm{GHD}(A,B) - \mu$. We calculate the closed-form contribution of each node to the centralized GHD (cGHD) once, using Equation (4), and then iteratively remove the nodes with the least contribution to cGHD, i.e. the nodes with similar topology in graphs $A$ and $B$. Thus, we calculate cGHD once and sort all the nodes by their contribution to the cGHD metric. This process continues until the p-value of the remaining sub-network becomes greater than a threshold $\epsilon$. Once the p-value reaches $\epsilon$, we estimate $\delta_{V_K} = \mathrm{GHD}(A(V_K, E^{*}_{A}), B(V_K, E^{*}_{B})) - \mu_{V_K}$, where $\mu_{V_K}$ is the mean of the permutation distribution for the nodes $V_K$ of the remaining sub-network. Furthermore, we define $\delta_{V_K|i}$ as the value of cGHD after the removal of node $i$. To remove non-differential nodes, we adopt a model selection criterion different from that proposed in [28]: we select the node whose removal makes the cGHD value maximal, i.e. the node that was most similar in topology between the paired graphs. Finally, the obtained p-values are adjusted for multiple testing by controlling the false discovery rate [34]. Given the paired graphs $A$ and $B$, $\delta_{V_K|i}$ can be calculated independently for each $i$. Details of the Closed-Form method are provided in Algorithm 1. The sensitivity of the Closed-Form approach to the parameter $\epsilon$ is demonstrated in the Results and Discussion section. Table 1 summarizes the improvements with respect to the dGHD algorithm in terms of time complexity.
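To make the control flow concrete, here is a compact, hypothetical Python sketch of the Closed-Form procedure of Algorithm 1; it is not the authors' R implementation. Assumptions: symmetric zero-diagonal weight matrices, GHD and its moments recomputed on the mean-centered weights of each sub-network, lower-tail normal p-values, and a node taken from $O$ is also removed from the current sub-network $V_K$ (the pseudocode states this explicitly only for the other branch). The FDR adjustment is left out for brevity.

```python
import math
from statistics import NormalDist

import numpy as np


def _centre(w):
    """Mean-centre off-diagonal weights (Eq. (1)); the diagonal stays zero."""
    off = ~np.eye(w.shape[0], dtype=bool)
    return np.where(off, w - w[off].mean(), 0.0)


def _terms(w):
    """S^1, S^2, A, B and C of the text for one weight matrix."""
    s1, s2 = w.sum(), (w ** 2).sum()
    t = (w.sum(axis=1) ** 2).sum()
    return s1, s2, s1 ** 2, t - s2, s1 ** 2 + 2 * s2 - 4 * t


def _sub(a, b, nodes):
    """Mean-centred weights of the sub-networks induced by `nodes`."""
    idx = np.ix_(nodes, nodes)
    return _centre(a[idx]), _centre(b[idx])


def cghd(a, b, nodes):
    """cGHD = GHD - mu on the induced sub-networks (needs >= 2 nodes)."""
    sa, sb = _sub(a, b, nodes)
    m = len(nodes) * (len(nodes) - 1)
    s1a, s2a = _terms(sa)[:2]
    s1b, s2b = _terms(sb)[:2]
    mu = (s2a + s2b) / m - 2 * s1a * s1b / m ** 2
    return ((sa - sb) ** 2).sum() / m - mu


def pvalue(a, b, nodes):
    """Lower-tail normal p-value of z = cGHD / sigma (needs >= 4 nodes)."""
    sa, sb = _sub(a, b, nodes)
    n = len(nodes)
    m = n * (n - 1)
    s1a, s2a, aa, ba, ca = _terms(sa)
    s1b, s2b, ab, bb, cb = _terms(sb)
    var = 4 / m ** 3 * (2 * s2a * s2b + 4 * ba * bb / (n - 2)
                        + ca * cb / ((n - 2) * (n - 3)) - aa * ab / m)
    return NormalDist().cdf(cghd(a, b, nodes) / math.sqrt(var))


def contributions(a, b):
    """cGHD^(i) = GHD^(i) - mu^(i) of Eq. (4) for every node i."""
    n = a.shape[0]
    m = n * (n - 1)
    ac, bc = _centre(a), _centre(b)
    ghd_i = ((ac - bc) ** 2).sum(axis=1) / m
    ra, rb = a.sum(axis=1), b.sum(axis=1)
    mu_i = (((a ** 2).sum(axis=1) + (b ** 2).sum(axis=1)) / m
            - 2 * (ra * b.sum() + rb * a.sum() - ra * rb) / m ** 2)
    return ghd_i - mu_i


def closed_form(a, b, eps=1e-50, alpha=0.01):
    """Hypothetical sketch of Algorithm 1; returns (V*, unadjusted p-values)."""
    nodes = list(range(a.shape[0]))                   # V_K
    contrib = contributions(a, b)
    order = sorted(nodes, key=lambda i: contrib[i])   # O, ascending
    selected, pvals = [], []
    while len(nodes) > 3:
        p = pvalue(a, b, nodes)
        pvals.append(p)
        if p > eps:
            # delta_{V_K|i}: cGHD without node i, independent for each i
            scores = {i: cghd(a, b, [j for j in nodes if j != i]) for i in nodes}
            n_rm = max(scores, key=scores.get)        # cGHD becomes maximal
        else:
            n_rm = order[0]                           # least contribution
        nodes.remove(n_rm)                            # assumption: also leaves V_K
        order.remove(n_rm)
        if p > alpha:
            selected.append(n_rm)
    return selected, pvals
```

The dictionary comprehension that computes $\delta_{V_K|i}$ is the step the text describes as independent for each $i$; it could be distributed over workers without changing the result.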
Alternative Procedure (Fast Approximation)
We propose an alternative procedure to the Closed-Form approach, namely the Fast Approximation method, in which we first calculate, once, the cGHD value obtained without the $i$-th node, $\forall i \in V$. This estimates the cGHD value after the removal of each node and can be performed in parallel. Our aim is to quickly discard the nodes whose removal makes the cGHD value large, thereby removing the nodes that contribute least to the cGHD value. This helps to reduce the dependence between the two sub-networks by removing nodes with similar topology in graphs $A$ and $B$. Again, the idea is motivated by the premise that only a subset of nodes forms the differential sub-networks in graphs $A$ and $B$.
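The initial leave-one-out pass of the Fast Approximation is embarrassingly parallel. A minimal sketch using Python's `concurrent.futures.ThreadPoolExecutor` follows; the helper names, the use of mean-centered weights, and the thread pool are our assumptions. With CPython threads the speed-up depends on NumPy releasing the GIL; a process pool is an alternative for large $N$, although it needs a module-level task function.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def cghd(a, b):
    """GHD - mu (Eqs. (1) and (3)) using mean-centred off-diagonal weights."""
    n = a.shape[0]
    m = n * (n - 1)
    off = ~np.eye(n, dtype=bool)
    ac = np.where(off, a - a[off].mean(), 0.0)
    bc = np.where(off, b - b[off].mean(), 0.0)
    ghd = ((ac - bc) ** 2).sum() / m
    mu = ((ac ** 2).sum() + (bc ** 2).sum()) / m - 2 * ac.sum() * bc.sum() / m ** 2
    return float(ghd - mu)


def fast_approximation_order(a, b, workers=4):
    """Leave-one-out cGHD for every node, computed in parallel, sorted descending."""
    n = a.shape[0]

    def leave_out(i):
        rest = [j for j in range(n) if j != i]
        idx = np.ix_(rest, rest)
        return cghd(a[idx], b[idx])

    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(leave_out, range(n)))  # one independent task per node
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return order, scores
```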
Algorithm 1: Closed-Form
Data: Graphs $A$ and $B$ with $N$ vertices $V$.
Result: Subset $V^{*}$ of the nodes that comprise the differential sub-network, and p-values for the GHD measure.
$V^{*} = \{\}$ // Empty set for differential sub-network nodes.
$V_K = V$ // Initialize a copy of the set of vertices $V$.
$p = \{\}$ // Empty set for p-values.
Calculate the contribution of each node $i$ to the centralized GHD using Equation (4).
Sort all nodes by their contribution in ascending order and keep them in $O$.
while $N > 3$ do
    $z = \big( \mathrm{GHD}(A(V_K, E_A), B(V_K, E_B)) - \mu_{V_K} \big) / \sigma_{V_K}$.
    Calculate the p-value using $z$ and append it to $p$.
    if p-value $> \epsilon$ then
        $\Delta_{V_K} = \{\}$
        forall $i \in V_K$ do
            $t = \mathrm{GHD}(A(V_K|i, E_A), B(V_K|i, E_B)) - \mu_{V_K|i}$.
            Add $t$ to $\Delta_{V_K}$ // Perform in parallel.
        $n = \arg\max_{i} \Delta_{V_K}$ // Select the node whose removal makes cGHD maximal.
        Remove node $n$ from $V_K$, i.e. $V_K = V_K \setminus n$ and $O = O \setminus n$.
    else if p-value $< \epsilon$ then
        $n = \arg\min_{i}(O)$ // Select the node in the sub-network with the least contribution.
        Remove node $n$ from $O$. // $O$ is sorted, so remove its first node.
    if p-value $> 0.01$ then
        Append $n$ to $V^{*}$.
    $N = N - 1$.
Adjust the p-values for the false discovery rate [34].
In this approach, we iteratively discard the nodes whose removal makes the cGHD value maximal, until the p-value for the remaining sub-network reaches a threshold $\epsilon$. Once the p-value reaches $\epsilon$, we return to the procedure of estimating $\delta_{V_K|i}$ $\forall i \in V_K$ described in the Closed-Form approach. We use the same model selection criterion as in the Closed-Form approach, selecting the node whose removal makes the cGHD value maximal. We then adjust the obtained p-values for multiple testing by controlling the false discovery rate [34]. We refer to this technique as a Fast Approximation to dGHD [28]. The Fast Approximation technique is detailed in Algorithm 2.
In our experiments, the results of the Closed-Form approach and the Fast Approximation technique are identical. In the Closed-Form approach we calculate the closed-form contribution of each node to the cGHD value and remove the node with the least contribution, whereas in the Fast Approximation we select the node whose removal makes the cGHD value maximal; nevertheless, the ordered list $O$ obtained by the two methods is identical. Moreover, the computational complexity of the Fast Approximation technique is the same as that of the Closed-Form approach.
Algorithm 2: Fast Approximation
Data: Graphs $A$ and $B$ with $N$ vertices $V$.
Result: Subset $V^{*}$ of the nodes that comprise the differential sub-network, and p-values for the GHD measure.
$V^{*} = \{\}$ // Empty set for differential sub-network nodes.
$V_K = V$ // Initialize a copy of the set of vertices $V$.
$p = \{\}$ // Empty set for p-values.
$\Delta_{V_K} = \{\}$
forall $i \in V_K$ do
    $t = \mathrm{GHD}(A(V_K|i, E_A), B(V_K|i, E_B)) - \mu_{V_K|i}$. // Estimate the cGHD value after the removal of node $i$.
    Add $t$ to $\Delta_{V_K}$. // Perform in parallel.
Sort $\Delta_{V_K}$ in descending order and keep it in $O$.
while $N > 3$ do
    $z = \big( \mathrm{GHD}(A(V_K, E_A), B(V_K, E_B)) - \mu_{V_K} \big) / \sigma_{V_K}$.
    Calculate the p-value using $z$ and append it to $p$.
    if p-value $> \epsilon$ then
        $\Delta_{V_K} = \{\}$
        forall $i \in V_K$ do
            $t = \mathrm{GHD}(A(V_K|i, E_A), B(V_K|i, E_B)) - \mu_{V_K|i}$.
            Add $t$ to $\Delta_{V_K}$ // Perform in parallel.
        $n = \arg\max_{i} \Delta_{V_K}$ // Select the node whose removal makes cGHD maximal.
        Remove node $n$ from $V_K$ and $O$.
    else if p-value $< \epsilon$ then
        $n = \arg\max_{i}(O)$ // Select the node in the sub-network with the least contribution.
        Remove node $n$ from $O$.
    if p-value $> 0.01$ then
        Append $n$ to $V^{*}$.
    $N = N - 1$.
Adjust the p-values for the false discovery rate.
Inference of the Glioma networks and Master Regulator Analysis
We used the TCGA pan-glioma dataset of 1250 samples (463 IDH-mutant and 653 IDH-wild-type), 583 of which were profiled with Agilent microarrays and 667 with RNA-Seq (Illumina HiSeq) (REF), downloaded from the TCGA portal. The batch effects between the two platforms were corrected using the COMBAT algorithm [35]. The final gene expression data matrix includes 12,985 genes and 1250 samples. We reconstructed two gene regulatory networks belonging to two different glioma subtypes: IDH-mutant and IDH-wild-type. Both networks were reconstructed with a four-step procedure that follows ARACNe [36]: i) computation of the mutual information between gene expression profiles to determine interactions between Transcription Factors (TFs) and target genes [37]; ii) the data processing inequality to filter out indirect relationships [36]; iii) a permutation test with 1,000 re-samplings to keep only statistically significant relationships. We also assembled a global glioma network from all 1250 available transcriptional profiles using the same method. In this last case we also intersected the network with TF binding sites to keep only relationships due to promoter binding, using a set of 457 TF binding sites available in the MotifDB Bioconductor package.
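Step ii) can be illustrated with a simplified data processing inequality filter. This is a sketch under our own assumptions: the input is a symmetric matrix of mutual information values in which zero means "no edge", every fully connected triplet loses its weakest edge (optionally subject to a relative tolerance), and mutual information estimation and the permutation test are omitted. It is not the ARACNe implementation [36].

```python
import itertools

import numpy as np


def dpi_filter(mi, tolerance=0.0):
    """Data processing inequality pruning on a symmetric MI matrix.

    For every fully connected triplet, the weakest edge is flagged when it is
    smaller than both other edges (by more than the relative tolerance);
    flagged edges are removed together at the end, so the result does not
    depend on the order in which triplets are visited.
    """
    n = mi.shape[0]
    keep = mi > 0
    np.fill_diagonal(keep, False)
    drop = np.zeros_like(keep)
    for i, j, k in itertools.combinations(range(n), 3):
        if not (keep[i, j] and keep[j, k] and keep[i, k]):
            continue
        edges = {(i, j): mi[i, j], (j, k): mi[j, k], (i, k): mi[i, k]}
        (u, v), weakest = min(edges.items(), key=lambda e: e[1])
        others = [w for e, w in edges.items() if e != (u, v)]
        if all(weakest < w * (1 - tolerance) for w in others):
            drop[u, v] = drop[v, u] = True
    return np.where(keep & ~drop, mi, 0.0)
```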
The Master Regulator Analysis (MRA) algorithm [38] was applied to the global glioma network to compute the statistical significance of the overlap between the regulon of each TF (i.e. its ARACNe-inferred targets) and the list of genes differentially expressed (Wilcoxon-Mann-Whitney test, FDR ≤ 0.05) between IDH-mutant and IDH-wild-type samples. Given a gene interaction network generated by ARACNe and a gene phenotype signature (e.g. a set of differentially expressed genes), the MRA algorithm computes, for each TF, the enrichment of the phenotype signature in the regulon of that TF. The regulon of a TF is defined as its neighborhood in the gene interaction network. There are two different methods to evaluate the enrichment of the signature in the regulon: one uses Fisher's exact test, while the other uses Gene Set Enrichment Analysis (GSEA). Here we used the latter.
A Master Regulator (MR) gene is a TF whose regulon exhibits a statistically significant enrichment of the given phenotype signature.
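For intuition about the GSEA-based enrichment, here is a weighted running-sum enrichment score in the spirit of GSEA (weight exponent $p = 1$), with a simple gene-set permutation p-value. The names and the two-sided permutation scheme are our assumptions; this is not the statistic computed by the MRA implementation [38].

```python
import numpy as np


def enrichment_score(ranked_genes, scores, regulon, p=1.0):
    """Weighted running-sum enrichment score of a regulon in a ranked signature.

    ranked_genes: genes sorted by decreasing signature statistic.
    scores: the matching statistics, in the same order.
    """
    in_set = np.array([g in regulon for g in ranked_genes])
    weights = np.abs(np.asarray(scores, dtype=float)) ** p
    hit = np.where(in_set, weights, 0.0) / weights[in_set].sum()
    miss = np.where(in_set, 0.0, 1.0) / (~in_set).sum()
    running = np.cumsum(hit - miss)
    return float(running[np.argmax(np.abs(running))])  # maximum deviation from 0


def permutation_pvalue(ranked_genes, scores, regulon, n_perm=1000, seed=0):
    """Two-sided p-value against random gene sets of the same size (assumption)."""
    rng = np.random.default_rng(seed)
    observed = enrichment_score(ranked_genes, scores, regulon)
    size = len(set(regulon) & set(ranked_genes))
    null = [enrichment_score(ranked_genes, scores,
                             set(rng.choice(ranked_genes, size, replace=False)))
            for _ in range(n_perm)]
    return (sum(abs(x) >= abs(observed) for x in null) + 1) / (n_perm + 1)
```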
Validation in the Rembrandt dataset
We used an independent dataset to perform the same network differencing analysis between IDH-mutant and IDH-wild-type gliomas and to check the overlap between the two analyses. Raw gene expression data (Affymetrix U133 Plus 2.0) from the publicly available Repository for Molecular Brain Neoplasia Data (Rembrandt) (https://caintegrator.nci.nih.gov/rembrandt/) included 444 samples: 218 Glioblastomas, 148 Astrocytomas, 67 Oligodendrogliomas and 11 mixed histologies. Expression subtype and IDH status were inferred from gene expression following the procedure in [39], resulting in 153 wild-type and 162 mutant samples. These two sets of expression profiles were used to generate two regulatory networks using the same approach reported above.
Results and Discussion
For all our experiments, we used the Closed-Form approach (since the results obtained with the Closed-Form and Fast Approximation techniques are identical) and compared it with the dGHD method [28].
Cosine similarity and topological overlap
The one-step topological overlap measure used to estimate the edge weights is defined as:
$$a_{ij} = \frac{\sum_{l \neq i,j} A_{il} A_{lj} + A_{ij}}{\min\left( \sum_{l \neq i} A_{il} - A_{ij},\ \sum_{l \neq j} A_{lj} - A_{ij} \right) + 1}. \tag{5}$$
In this work we use the cosine similarity to calculate the edge weights $a_{ij}$. The cosine similarity takes into consideration the one-step neighborhood of nodes $i$ and $j$ when constructing the edge weight and is very efficient to calculate for sparse matrices. The weights $a_{ij}$ are estimated as follows:
$$a_{ij} = \frac{\sum_{l} A_{il} A_{jl}}{\sqrt{\sum_{l} A_{il}^2}\, \sqrt{\sum_{l} A_{jl}^2}}, \tag{6}$$
where $A_{ij}$ denotes the entries of the adjacency matrix.
We performed an experiment to calculate the correlation between the one-step topological overlap measure and the cosine similarity measure. For this experiment, we generated 250 random geometric networks with $N = 250$ and connectivity parameter $d = 0.15$.
Figure 1 shows that the cosine similarity metric is nearly perfectly correlated (Pearson correlation = 0.952) with the topological overlap measure.
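Equations (5) and (6) can be computed with dense matrix products, as in the minimal NumPy sketch below (assuming a symmetric 0/1 adjacency matrix with zero diagonal; function names are ours). Sparse matrices would be preferable for large networks.

```python
import numpy as np


def cosine_weights(adj):
    """Eq. (6): cosine similarity between the rows of an adjacency matrix."""
    norms = np.sqrt((adj ** 2).sum(axis=1))
    norms[norms == 0] = 1.0                # isolated nodes: avoid 0/0
    w = (adj @ adj.T) / np.outer(norms, norms)
    np.fill_diagonal(w, 0.0)               # GHD only uses pairs i != j
    return w


def topological_overlap(adj):
    """Eq. (5): one-step topological overlap for a 0/1 adjacency matrix."""
    shared = adj @ adj                     # sum_l A_il A_lj (zero diagonal)
    k = adj.sum(axis=1)                    # node degrees
    denom = np.minimum.outer(k, k) - adj + 1
    w = (shared + adj) / denom
    np.fill_diagonal(w, 0.0)
    return w
```

To reproduce the kind of comparison shown in Figure 1, one could correlate the two weightings over the upper triangle, e.g. `iu = np.triu_indices(n, 1)` followed by `np.corrcoef(w[iu], to[iu])[0, 1]`.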
Sensitivity to $\epsilon$
In this experiment, we check the sensitivity of the proposed Closed-Form approach with respect to the heuristic $\epsilon$. For this experiment, we first generated 100 random geometric (RG) networks. In an RG network, nodes are generated by uniformly sampling $N$ points in $[0,1]^2$. An edge is then drawn between two points if the Euclidean distance between them is less than a parameter $d$. This parameter $d$ controls the density of the RG network: smaller values of $d$ result in sparse networks, while larger values of $d$ generate dense networks. We conducted experiments using two different settings, $d = 0.15$ and $d = 0.3$. For both experiments we fix $N = 250$. For each value of $d$ and for each generated RG network $A$, we permute the first 50 rows and columns of the network to generate network $B$. Therefore, the first 50 nodes in networks $A$ and $B$ form the gold standard.
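The simulation setup can be sketched as follows: a random geometric network and its partially relabelled copy. The helper names and the use of NumPy's default generator are our assumptions; only the parameter values ($N = 250$, $d = 0.15$, 50 permuted nodes) come from the text.

```python
import numpy as np


def random_geometric(n, d, rng):
    """0/1 adjacency of an RG network: n uniform points in [0,1]^2, edge if distance < d."""
    pts = rng.random((n, 2))
    diff = pts[:, None, :] - pts[None, :, :]
    adj = (np.sqrt((diff ** 2).sum(-1)) < d).astype(float)
    np.fill_diagonal(adj, 0.0)  # no self-loops
    return adj


def permute_first(adj, k, rng):
    """Network B: relabel the first k nodes of A at random (the gold standard)."""
    n = adj.shape[0]
    perm = np.r_[rng.permutation(k), np.arange(k, n)]
    return adj[np.ix_(perm, perm)]


rng = np.random.default_rng(0)
A = random_geometric(250, 0.15, rng)
B = permute_first(A, 50, rng)
gold = np.arange(250) < 50  # True for the 50 permuted nodes
```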
To test the sensitivity of the proposed approach with respect to $\epsilon$, we estimate the fraction of permuted nodes correctly identified by the Closed-Form method for various values of $\epsilon$. We used a grid of values $\epsilon \in \{10^{-50}, \dots, 10^{-300}\}$ in multiplicative steps of $10^{-20}$. The goal of this experiment is to show that, over this grid, the fraction of correctly identified nodes remains nearly constant for smaller values of $\epsilon$. Figure 2 shows the results for RG networks with density parameters $d = 0.15$ and $d = 0.3$. From Figure 2, we observe that the median fraction of permuted nodes identified by the proposed approach increases slowly before converging to a nearly constant value as we decrease the threshold $\epsilon$ (i.e. increase the absolute value of $\log \epsilon$).
From this experiment, we conclude that the fraction of truly differential nodes identified by the proposed methods increases as we decrease the threshold $\epsilon$, before converging for smaller values of $\epsilon$.
We performed further experiments using different values of $\epsilon$ for various values of $N$ and observed that the threshold $\epsilon$ behaves similarly regardless of the value of $N$. We used $\epsilon = 10^{-50}$ as the heuristic cut-off for subsequent experiments.
Predictive performance comparison
Experimental Setup: The next simulation study compares the predictive performance of the proposed approach with that of the dGHD [28] technique. For this experiment, we generate 100 RG networks with $N = 1{,}000$. In the first experiment, we fix the density parameter $d = 0.15$ and permute the first 100 nodes in network $A$ to obtain network $B$. Thus, these first 100 nodes form the differential sub-network for the paired networks $A$ and $B$.
In the second case, we use the density parameter $d = 0.3$ to generate the edges of network $A$. We then generate a small RG network with 100 nodes using density parameter $d' = 0.5$. This small dense sub-network replaces the sub-network formed by the first 100 nodes of the original network $A$ to form network $B$. Thus, in the second experiment, these 100 nodes form the differential sub-network for the paired networks $A$ and $B$. This kind of mechanism can appear in real-life networks: for example, in cancer the transcriptional activity of a set of genes might be enhanced or suppressed in patients, resulting in more or fewer edges in a sub-network of the gene or DNA methylation network. Hence, the networks generated in the first case are much sparser than those generated in the second case.
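The second mechanism (a denser block planted on the first 100 nodes) can be sketched similarly. We assume that only the edges among the first 100 nodes are replaced, while their edges to the remaining nodes are kept from $A$; the helper names are hypothetical.

```python
import numpy as np


def random_geometric(n, d, rng):
    """0/1 adjacency: n uniform points in [0,1]^2, edge if distance < d."""
    pts = rng.random((n, 2))
    diff = pts[:, None, :] - pts[None, :, :]
    adj = (np.sqrt((diff ** 2).sum(-1)) < d).astype(float)
    np.fill_diagonal(adj, 0.0)
    return adj


def plant_dense_block(adj, k, d_dense, rng):
    """Network B: replace the sub-network on the first k nodes with a denser RG graph."""
    b = adj.copy()
    b[:k, :k] = random_geometric(k, d_dense, rng)  # edges to the other nodes untouched
    return b


rng = np.random.default_rng(0)
A = random_geometric(1000, 0.3, rng)
B = plant_dense_block(A, 100, 0.5, rng)
```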
Evaluation Metrics: We define the following terms to be used in our analysis:
• True Positives (TP): nodes that are correctly identified as part of a differential sub-network.
• False Positives (FP): nodes that are incorrectly identified as part of a differential sub-network.
• False Negatives (FN): nodes that are part of the differential sub-network but are not identified as such.
• True Negatives (TN): nodes that are correctly identified as not being part of the differential sub-network of $A$ and $B$.
ROC and PR curve comparisons: We generate two sets of plots: receiver operating characteristic (ROC) curves and precision-recall (PR) curves. To generate the plots shown in Figure 3, we use the 'ROCR' [40] package in R. It generates relatively smooth curves by automatically using different thresholds to estimate the true positive rate, $\frac{n(TP)}{n(TP)+n(FN)}$, and the false positive rate, $\frac{n(FP)}{n(FP)+n(TN)}$, for the ROC plot, and the precision, $\frac{n(TP)}{n(TP)+n(FP)}$, and recall, $\frac{n(TP)}{n(TP)+n(FN)}$, for the PR plot. We use the terms true positive rate (TPR) and recall interchangeably. Here $n(\cdot)$ denotes the number of nodes in each category. To generate smooth curves, we used the adjusted p-value lists obtained from the Closed-Form and dGHD approaches without specifying any threshold.
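The quantities above can also be computed directly. The sketch below gives the counts and rates for a given set of calls, Cohen's kappa (reported in the abstract), and AUC ROC via the rank-based identity (the probability that a differential node scores higher than a non-differential one). It is not ROCR; it assumes scores in which larger values mean "more likely differential", and how adjusted p-values are turned into such scores is left open here.

```python
import numpy as np


def confusion(called, truth):
    """TP, FP, FN, TN counts for boolean node calls against the gold standard."""
    called, truth = np.asarray(called, bool), np.asarray(truth, bool)
    return (int(np.sum(called & truth)), int(np.sum(called & ~truth)),
            int(np.sum(~called & truth)), int(np.sum(~called & ~truth)))


def rates(tp, fp, fn, tn):
    """TPR (= recall), FPR and precision as defined above."""
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    precision = tp / (tp + fp) if tp + fp else float("nan")
    return tpr, fpr, precision


def cohen_kappa(tp, fp, fn, tn):
    """Cohen's kappa between the calls and the gold standard."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    return (observed - expected) / (1 - expected)


def roc_auc(scores, truth):
    """AUC ROC as P(differential node outscores a non-differential one); ties count 1/2."""
    scores, truth = np.asarray(scores, float), np.asarray(truth, bool)
    pos, neg = scores[truth], scores[~truth]
    wins = (pos[:, None] > neg[None, :]).sum() + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return float(wins / (pos.size * neg.size))
```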
Figures 3A and 3C show that the Closed-Form approach achieves better performance both for differential sub-networks formed by permuted nodes and for sub-networks with higher density. One reason for the relatively poor performance of the dGHD approach is that it has a low true positive rate (TPR) and a high false positive rate (FPR) when the network has more edges. This is also reflected by the relatively low Recall and Precision values of the dGHD algorithm in Table 2 when $d = 0.3$ and $d' = 0.5$. From Figure 3C, we observe that the ROC performance of both the dGHD and Closed-Form algorithms improves when the differential sub-network is denser than the rest of the network. However, the gap between the PR curves of the Closed-Form and dGHD methods increases when the differential sub-network is denser.
AUC comparison: For all further simulated experiments, we use a p-value of 0.01 as the cut-off to determine TP, TN, FP and FN. We also evaluated the area under the ROC curve (AUC ROC [41]) and the area under the PR curve (AUC PR [41]) for 100 runs of the Closed-Form and dGHD methods (using a p-value of 0.01 as the cut-off), as shown in Figure 4.
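An AUC can be computed in more than one way. The Python sketch below shows two common ones, with hypothetical function names: the threshold-free ROC AUC obtained from the p-values through the Mann-Whitney statistic, and the area under the two-segment ROC curve that results when predictions are binarised at a single cut-off such as p = 0.01. Treat both as illustrations rather than the exact procedure behind Figure 4.

```python
def roc_auc_from_pvalues(pvalues, truth):
    """Threshold-free ROC AUC: probability that a random differential node has a
    smaller p-value than a random non-differential node (ties count 1/2).
    Brute force over all positive/negative pairs."""
    pos = [p for node, p in pvalues.items() if node in truth]
    neg = [p for node, p in pvalues.items() if node not in truth]
    if not pos or not neg:
        raise ValueError("need at least one differential and one other node")
    wins = 0.0
    for pp in pos:
        for pn in neg:
            if pp < pn:
                wins += 1.0
            elif pp == pn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def single_cutoff_auc(tpr, fpr):
    """Area under the ROC polyline (0,0) -> (fpr, tpr) -> (1,1) produced by a
    single binarising cut-off; equals (1 + TPR - FPR) / 2."""
    return (1.0 + tpr - fpr) / 2.0


pvals = {"a": 0.001, "b": 0.03, "c": 0.02, "d": 0.9}
print(roc_auc_from_pvalues(pvals, {"a", "b"}))  # 0.75
```

The rank-based form needs no threshold, which is why it matches the smooth curves of Figure 3; the single cut-off form only summarises one operating point.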
We observe from Figures 4A and 4B that the dGHD method has lower variance w.r.t. the AUC ROC and AUC PR metrics than the Closed-Form approach in the case of the permuted differential sub-network. However, for the denser differential sub-network, the Closed-Form approach has much smaller variance than the dGHD algorithm w.r.t. the AUC ROC and AUC PR metrics, as depicted in Figures 4C and 4D respectively. This suggests that the Closed-Form technique performs better than the dGHD method when differential sub-networks are formed either using permuted nodes or higher density. To test for significance, we performed Student's t-test under the null hypothesis that the difference between the mean values of the two AUC ROC distributions is zero, i.e. $\mu_{\mathrm{AUC\,ROC}_A} - \mu_{\mathrm{AUC\,ROC}_B} = 0$. At a significance level of 5%, we obtain a p-value of 0.48 for the permuted sub-network, so we fail to reject the null, i.e. the difference between the two distributions is not significant. For paired networks with a denser differential sub-network (i.e. d′ = 0.5), we obtain a p-value of $3.42 \times 10^{-14}$, thereby rejecting the null. Similarly, for the two AUC PR distributions we obtained a p-value of 0.42 for the permuted sub-network and $2.64 \times 10^{-20}$ for the denser differential sub-network.
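For readers who want to reproduce this kind of comparison, the sketch below computes the pooled-variance Student t statistic for two lists of per-run AUC values. Whether the original analysis used a pooled, Welch or paired variant is not specified, so the pooled form is an assumption. To stay dependency-free, the two-sided p-value is approximated with the normal distribution, which is only reasonable for large degrees of freedom (198 for two groups of 100 runs); `scipy.stats.ttest_ind` would give the exact t-distribution p-value. Function names are hypothetical.

```python
import math
import statistics


def student_t_two_sample(a, b):
    """Pooled-variance (Student) two-sample t statistic and degrees of freedom
    for H0: mean(a) - mean(b) = 0. Needs at least two values per group and a
    non-zero pooled variance."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * statistics.variance(a)
              + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se = math.sqrt(pooled * (1.0 / na + 1.0 / nb))
    t = (statistics.mean(a) - statistics.mean(b)) / se
    return t, na + nb - 2


def approx_two_sided_p(t):
    """Normal approximation 2 * (1 - Phi(|t|)) = erfc(|t| / sqrt(2));
    only adequate when the degrees of freedom are large."""
    return math.erfc(abs(t) / math.sqrt(2.0))


t, df = student_t_two_sample([2.0, 3.0, 4.0], [1.0, 2.0, 3.0])  # t ≈ 1.2247, df = 4
```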
Comparison with Community Detection techniques
The task of identifying differential sub-networks can also be rephrased as finding heavy sub-networks in a single network (say C) constructed from the absolute difference in edge weights between the topological graph of network A and that of network B, i.e. $C_{ij} = |a_{ij} - b_{ij}|, \; \forall i, j \in V$. This problem can then be construed as one of identifying dense modules in network C: in terms of the previous experiments, we want to discover as a module either the set of nodes that have been permuted or the denser sub-network forming the differential sub-network.
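The construction of C is an element-wise absolute difference of the two weighted adjacency matrices. A minimal Python sketch, assuming both matrices are stored as lists of lists over the same, identically ordered node set V (the function name is hypothetical):

```python
def difference_network(a, b):
    """Return C with C[i][j] = |A[i][j] - B[i][j]| for two weighted adjacency
    matrices over the same, identically ordered node set V."""
    n = len(a)
    if len(b) != n or any(len(row) != n for row in a + b):
        raise ValueError("A and B must be square matrices of the same size")
    return [[abs(a[i][j] - b[i][j]) for j in range(n)] for i in range(n)]


A = [[0.0, 0.75], [0.75, 0.0]]
B = [[0.0, 0.25], [0.25, 0.0]]
print(difference_network(A, B))  # [[0.0, 0.5], [0.5, 0.0]]
```

Edges that are identical in A and B vanish from C, so only the changed part of the topology contributes weight to any module found in C.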
The task of identifying dense/heavy modules in a network (C) is often referred to as community detection, graph partitioning or graph clustering. There is a plethora of research on community detection, including [42, 43, 44, 45, 46, 47, 48, 49]. Several of these methods, such as jActiveModules [50] and the Spinglass algorithm [45], have also been applied to identify biologically meaningful modules (such as functional modules, protein complexes and disease-associated genes) in biological networks, as shown in [51, 52]. For our task of identifying dense modules in network C, we applied 3 different community detection methods, namely Louvain [43], Infomap [44] and Spinglass [45], to obtain a comprehensive comparison with the proposed Closed-Form approach. We used the implementations of these methods available in the 'igraph' package in R and ran each of them with its default settings.
We used the same set of RG networks as in the previous experiments for the comparison with the community detection techniques. Since network C encodes only the difference between the topologies of networks A and B, all similarity between the two networks is removed, and the module with the maximum internal volume (i.e. total weight of edges within the community) is the one capturing the maximum difference between the topologies of A and B. Hence, we consider the densest inferred module to be the differential sub-network and label all nodes belonging to this cluster as differential, while all the other modules are considered non-differential. Using this labelling of the inferred communities, we compare the results of the 3 community detection techniques w.r.t. the gold standard (i.e. the actual set of labeled nodes that belong either to the permuted sub-network or to the denser sub-network) in a binary classification framework [53, 54]. These results are integrated in Table 2 along with the results of the dGHD technique and the proposed Closed-Form (CF) approach. We assess the 3 community detection methods w.r.t. several quality metrics commonly used for binary classification, including Precision, Recall, Kappa, Accuracy, Specificity, AUC ROC and computational time. From Table 2, we observe that the Louvain method clearly outperforms the Infomap and Spinglass techniques in correctly identifying the differential sub-network as a module with respect to the various evaluation metrics.
Simulated Result Analysis
Finally, the summary Table 2 highlights the computational efficiency and better predictive capabilities of the proposed technique in comparison to the dGHD algorithm. For this comparison, we report the results obtained on 100 random runs of RG networks with N = 1000 and d = 0.15 and d = 0.3 respectively, where the first 100 nodes are permuted. We also report results when the first 100 nodes form the denser differential sub-network, i.e. experiments with d = 0.15 use d′ = 0.3 to form the denser sub-network and those with d = 0.3 use d′ = 0.5. We also conducted experiments on undirected Power Law (PL) graphs with N = 1000 and E = 10,000, using power law exponents 2 and 3 respectively. We permuted the first 100 nodes of each PL network (B) to form the permuted network (A). We performed 100 random runs and report the mean values of the various evaluation metrics.
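One plausible reading of "permuting the first 100 nodes" is a random relabelling of those nodes among themselves, leaving the rest of the graph untouched. The sketch below implements that reading on a list-of-lists adjacency matrix; the function name is hypothetical and the exact permutation scheme used in the experiments is an assumption.

```python
import random


def permute_first_k(b, k, rng=None):
    """Return (A, perm) where A is B with nodes 0..k-1 randomly relabelled among
    themselves; A[i][j] = B[perm[i]][perm[j]].

    Nodes k..N-1 keep their positions, so every difference between A and B
    involves an edge touching one of the first k nodes."""
    if rng is None:
        rng = random.Random()
    n = len(b)
    if not 0 <= k <= n:
        raise ValueError("k must be between 0 and the number of nodes")
    perm = list(range(n))
    head = perm[:k]
    rng.shuffle(head)
    perm[:k] = head
    a = [[b[perm[i]][perm[j]] for j in range(n)] for i in range(n)]
    return a, perm


b = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
a, perm = permute_first_k(b, 2, random.Random(0))
```

Because only labels change, the degree sequence and edge count of B are preserved, so A and B differ purely in which nodes sit where, which is the kind of localised re-wiring the method is meant to detect.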
Table 2 compares the Closed-Form, Louvain, Infomap, Spinglass and dGHD techniques w.r.t. standard evaluation metrics such as AUC, Precision, Recall, Accuracy, Specificity, the Kappa statistic and computational Time for all the simulation experiments. Higher values of these evaluation metrics represent better quality results, except for Time, where lower is better. The time required by the dGHD algorithm is normalized to 1 and the times of the other algorithms are scaled by the same normalization factor.
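Accuracy, Specificity and the Kappa statistic follow from the same confusion counts used for the ROC and PR curves, and the Time column is a ratio to the dGHD runtime. A minimal Python sketch, assuming Cohen's kappa is the Kappa statistic meant here (function names hypothetical):

```python
def classification_summary(tp, fp, fn, tn):
    """Accuracy, specificity and Cohen's kappa from node-level confusion counts."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    specificity = tn / (tn + fp) if tn + fp else 0.0
    # chance agreement: both say "differential" plus both say "non-differential"
    p_yes = ((tp + fp) / n) * ((tp + fn) / n)
    p_no = ((fn + tn) / n) * ((fp + tn) / n)
    p_chance = p_yes + p_no
    # kappa is undefined when chance agreement is 1; report 0.0 in that case
    kappa = (accuracy - p_chance) / (1.0 - p_chance) if p_chance < 1.0 else 0.0
    return {"accuracy": accuracy, "specificity": specificity, "kappa": kappa}


def normalise_times(times, reference="dGHD"):
    """Express each method's runtime as a fraction of the reference runtime."""
    ref = times[reference]
    return {method: t / ref for method, t in times.items()}
```

Kappa discounts the agreement expected by chance, which matters here because non-differential nodes vastly outnumber differential ones and plain Accuracy is high even for weak classifiers.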
We observe from Table 2 that the Closed-Form approach performs exceptionally well in the experiments on denser RG networks (d = 0.3) and PL graphs, emerging as the best method on these networks for various evaluation metrics. For this configuration, for both the permuted and the denser differential sub-networks, the mean AUC ROC of the Closed-Form approach is at least 10% higher than that of the dGHD algorithm. This is also reflected in the higher Precision (0.714 and 0.771) and Recall (0.789 and 0.930) of the Closed-Form approach compared with the lower Precision (0.645 and 0.7) and Recall (0.577 and 0.731) of the dGHD algorithm in these experiments.
However, for sparse networks, where it is relatively easier to identify differential sub-networks ([28]), the Closed-Form and dGHD methods have similar predictive performance. For sparse networks, the Louvain method outperforms nearly all other methods in identifying the differential sub-network as a module. From Table 2, we observe that the 3 community detection techniques have nearly perfect Recall scores but usually relatively low Precision values. This indicates that these methods correctly identify all the nodes forming the differential sub-network but also include a large number of false positives in the densest module, thereby reducing Precision. The Louvain and Infomap methods are extremely fast and, interestingly, the Louvain method has the highest Precision (0.887), which is at least 10% higher than that of the dGHD algorithm and 5% higher than that of the Closed-Form approach, when identifying the dense differential sub-network in a sparse network (d = 0.15, d′ = 0.3), as shown in Table 2.
We observe that among the community detection techniques, the Louvain method is the most efficient and is highly competitive with the dGHD algorithm, but it cannot outperform the Closed-Form approach on denser networks and Power Law graphs.
Case study in Glioma
As a case study, we performed the differential sub-network analysis of two gene regulatory networks reconstructed from the glioma dataset available from TCGA. It is well known that the majority of gliomas are divided into two main macro-categories according to the mutation status of the gene IDH1 [17, 15, 55]. Therefore, an important biological question that motivated the development of the reported methodology was to identify the sub-networks of transcription factors (TFs) having a different regulatory program in these two major conditions. We reconstructed two gene regulatory networks belonging to two different glioma subtypes, IDH-mutant and IDH-wild-type, as reported in the Methods section.
In our final networks we have 457 TFs and 4,085 targets. These networks consist of 13,683 unique TF-TF and TF-target connections for IDH-mutant and 14,158 for IDH-wild-type. Using these networks, we construct two unipartite topological graphs for the 457 TFs, as described in the Methods section. We then perform the proposed differential sub-network analysis to identify the TFs that are part of differential sub-networks in these topological graphs.
Figure 5 shows the significant differential sub-networks and Table 3 reports the topmost TFs that are part of differential sub-networks as detected by our algorithm. In this table, GHD and µ represent the generalized Hamming Distance and its asymptotic mean between the subgraphs after removing the specific transcription factor in each row of the table. Supplementary Table S1 reports the results for all 457 considered transcription factors.
To highlight the difference between the Closed-Form approach and other standard network analysis methods, we also assembled a global glioma network from all the available transcriptional profiles using the same method described above, and performed a master regulator analysis (MRA) [38] with respect to the molecular phenotype under investigation, i.e. the genes differentially expressed between IDH-mutant and IDH-wild-type tumors. Master regulator analysis is extensively adopted to identify TFs that act as principal regulators driving the phenotype from one condition to another.
Interestingly, among the topmost TFs (out of 457) forming the differential sub-networks, we found several genes known to have a central role in controlling specific glioma subtypes, as well as novel candidates that deserve further biological validation. In particular, the differential network analysis reveals that the sub-network of STAT3 is one of the most different between the IDH-mutant and IDH-wild-type networks, and STAT3 is a particularly significant Master Regulator of the wild-type phenotype. Members of our group have previously shown that STAT3, together with C/EBPβ, is a key regulator of mesenchymal differentiation and predicts poor clinical outcome in IDH-wild-type gliomas [38]. Another key regulator of IDH-wild-type gliomas, recently reported using an integrative functional copy number analysis, is the set of HOXA genes [17]. Moreover, another key network hub that the algorithm detects as different is SOX10, which appears to be an active master regulator of the IDH-mutant phenotype. We recently reported that the GCIMP-low subgroup in the IDH-mutant cohort can be mediated by loss of CpG methylation and binding of SOX factors [17]. Furthermore, our algorithm identifies methyl-CpG-binding domain protein 2 (MBD2) as a differential network hub. In particular, MBD2 has no links in the IDH-wild-type network, whereas it is highly connected in the IDH-mutant network, which is characterized by the CpG island methylator phenotype (GCIMP) [56]. Further investigation is needed to support such a hypothesis, as MBD2 is also known as a mediator of epigenetic gene regulation, and its role in Glioblastoma is being studied since its over-expression may drive tumor growth by suppressing the anti-angiogenic activity of key tumor suppressors [57].
The differential network method highlights several other TFs as hubs of differential sub-networks that are not detected by standard MRA. For example, ETV1 and ETV4 are over-expressed in gliomas of the Codel subtype carrying the mutation of the CIC gene [58]. Another differential sub-network hub not detected by standard MRA is the tumor suppressor RFX1, which has been identified as an important target/regulator of the malignancy of Glioblastoma [59]. Moreover, cell cycle regulators such as E2F1, which play a role in the progression of IDH-mutant glioma, are also detected by the Closed-Form algorithm [60].
An important caveat is the presence of potential confounding effects due to the adopted dataset, which was obtained by merging expression profiles from two different platforms, with the additional difficulty that the distribution of IDH-wild-type and IDH-mutated tumors is unequal between the two platforms (92% of the microarray data are wild-type). We adopted this integrated dataset to build the two IDH networks and the global glioma network. The main computation in this case is the estimation of the mutual information between pairs of gene profiles (variables) across a set of observations (patients), and each individual pair of values is always extracted from the same platform. We used a robust k-nearest neighbor estimator proposed in [61], available in the PARMIGENE R package [62]. This estimator is not based on binning of values and is non-parametric, working on the geometry of the scatterplot of each pair of gene expression values. Therefore, each observation (sample) can be seen as additional evidence of dependency (or independence) between the variables regardless of the platform. Although we found this merged dataset useful for estimating dependencies between genes, its use for deriving conclusions about sample groups and for pathway analysis should be made with caution.
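To make the idea of a binning-free, geometry-based estimator concrete, here is a brute-force Python sketch of the k-nearest-neighbour mutual information estimator of Kraskov, Stögbauer and Grassberger [61] (their first algorithm, max-norm, result in nats). It is O(N²) per gene pair and meant only as an illustration; it is not the PARMIGENE code, and the function names are hypothetical.

```python
# Sketch of the KSG k-nearest-neighbour mutual information estimator
# ("algorithm 1"). Illustrative only, not the PARMIGENE implementation.

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma at a positive integer: psi(n) = -gamma + sum_{i=1}^{n-1} 1/i."""
    return -EULER_GAMMA + sum(1.0 / i for i in range(1, n))


def ksg_mutual_information(x, y, k=3):
    """Estimate I(X;Y) in nats from paired samples x[i], y[i]."""
    n = len(x)
    if n != len(y) or n <= k:
        raise ValueError("x and y need equal length greater than k")
    acc = 0.0
    for i in range(n):
        # distance to the k-th nearest neighbour of point i in the joint
        # space, using the max-norm max(|dx|, |dy|)
        joint = sorted(
            max(abs(x[i] - x[j]), abs(y[i] - y[j])) for j in range(n) if j != i
        )
        eps = joint[k - 1]
        # marginal neighbour counts strictly inside eps (point i excluded)
        nx = sum(1 for j in range(n) if j != i and abs(x[i] - x[j]) < eps)
        ny = sum(1 for j in range(n) if j != i and abs(y[i] - y[j]) < eps)
        acc += digamma_int(nx + 1) + digamma_int(ny + 1)
    return digamma_int(k) + digamma_int(n) - acc / n
```

Only distances between points of the same pair are used, so no histogram bins are chosen. As a sanity check, for a perfectly dependent pair (y = x, no ties) every point has exactly k − 1 marginal neighbours inside ε, so the estimate reduces to ψ(N) − ψ(k); for independent samples it fluctuates around zero.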
As a further independent experiment, we performed the same analysis on the REMBRANDT dataset, with the differential network analysis on the two networks independently built with ARACNe and the Master Regulator Analysis on the global network. Table 4 reports the results for the most different TF sub-networks detected by the Closed-Form algorithm on this dataset. Interestingly, of the top nine differential nodes obtained on the TCGA dataset, five (FOXJ3, NFIA, CREB1, SOX10, KLF13) are also detected as significant in the REMBRANDT dataset, suggesting that these TFs have a very different regulatory program in the glioma subtypes. Moreover, unlike in the TCGA experiment, we observe a significant overlap between the results of the Closed-Form approach and those of the MRA. In particular, 70 of the 75 nodes forming the differential sub-networks are also enriched in the MRA (Fisher's exact test p-value: $3.38 \times 10^{-9}$). However, in this case the number of significant master regulators is considerably higher than that obtained in the TCGA case (297 vs. 144).
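Assuming the one-sided (enrichment) form, the test quoted above is an upper-tail hypergeometric probability. A minimal Python sketch with a hypothetical function name; the background universe used for the reported p-value is not stated in this section, so the 457-TF universe in the example call is an assumption and the result need not match the reported value.

```python
from math import comb


def fisher_exact_greater(overlap, size_a, size_b, universe):
    """One-sided Fisher exact p-value P(X >= overlap), where X is the overlap of
    a random size_b-subset with a fixed size_a-subset of `universe` items
    (hypergeometric upper tail)."""
    total = comb(universe, size_b)
    hits = sum(
        comb(size_a, x) * comb(universe - size_a, size_b - x)
        for x in range(overlap, min(size_a, size_b) + 1)
    )
    return hits / total


# Illustrative call with the REMBRANDT counts quoted above; the 457-TF
# universe is an assumption.
p = fisher_exact_greater(overlap=70, size_a=297, size_b=75, universe=457)
```

Exact integer binomial coefficients avoid the underflow that a naive floating-point product would suffer for tails this small.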
Conclusion
The comparison of gene expression profiles across different phenotypes is enabling the discovery of novel biomarkers for prognosis or diagnosis, which hold the key to identifying novel targets for therapeutic intervention. In this paper, we proposed an improvement to the state-of-the-art for comparing two labeled/unlabeled graphs that are representative of two conditions (e.g. the macro-categories defined by the mutation of the gene IDH1 in our case study) and identifying statistically significant differences in their topology. We used the centralized GHD (cGHD) metric [28] to calculate the distance between the two labeled networks. We proposed a Closed-Form approach, an improvement to the dGHD algorithm, to detect localized topological differences between paired networks. The Closed-Form approach calculates the closed-form contribution of each node to the cGHD metric and efficiently removes the nodes with the smaller contributions to the cGHD value. From our experiments on scale free random geometric networks, we found that the Closed-Form approach was 10-15x faster than dGHD in terms of computational time. For differential sub-network analysis in very sparse paired graphs, both the Closed-Form and dGHD methods had good predictive performance, reaching mean AUC values of 0.935 and 0.926 respectively over 100 random runs of the simulation experiments. However, for relatively denser networks, the Closed-Form approach outperformed dGHD: the proposed method achieved a mean AUC of 0.877, while the dGHD technique reached a mean AUC of 0.724. The Closed-Form approach also achieved much higher Precision, Recall and Kappa values than the dGHD method for relatively denser networks.
We applied our algorithm to detect the main differences between the networks of IDH-mutant and IDH-wild-type glioma tumors and showed that it correctly selects sub-networks centered on important key regulators of these two different subtypes. The adopted dataset is the result of merging two different profiling platforms and, as reported in the Results section, its use for other purposes should be made with caution. We also report the results on the same data using standard Master Regulator Analysis on a global network and show the overlap between the experiments. Indeed, it is known that MRA tends to have many false positives due to correlations between TF profiles, which could eventually be attenuated with synergy and shadow analysis. In contrast, the Closed-Form algorithm for network differencing tends to be more conservative, as also suggested by the fact that only the significantly different sub-networks are detected in both datasets.
List of abbreviations
AUC: Area Under the Curve
GHD: Generalized Hamming Distance
FP: False Positive
FPR: False Positive Rate
GSEA: Gene Set Enrichment Analysis
MAD: Mean Absolute Difference
MR: Master Regulator
MRA: Master Regulator Analysis
QAP: Quadratic Assignment Procedure
RG: Random Geometric
ROC: Receiver Operating Characteristic
TF: Transcription Factor
TO: Topological Overlap
TP: True Positive
TPR: True Positive Rate
Declarations
Ethics and consent to participate
Not applicable
Consent to publish
Not applicable
Competing interests
The authors declare that they have no competing interests.
Funding
This work was funded by Qatar Foundation.
Authors' contributions
RM conceived the methodology, developed the algorithms and drafted the manuscript. LC generated the data on
glioma and helped to draft the manuscript. HB performed the statistical analysis. AI participated in the design of
the study and in the critical analysis of the results. MC conceived of the study, participated in its design and
co-ordination and helped to draft the manuscript.
Acknowledgements
None
Availability of data and materials
The scripts implementing the proposed algorithms are available in R at
https://sites.google.com/site/raghvendramallmlresearcher/codes. The gene expression data used in this paper were
downloaded from the TCGA data portal https://cancergenome.nih.gov/ and from the caintegrator portal
https://caintergator.nci.nih.gov/rembrandt.
Supplementary files
File name: Table S1.xlsx
Title of data: Table S1
Description of data: GHD and MRA Results for all the 457 considered transcription factors on the TCGA
and Rembrandt datasets.
Author details
1QCRI - Qatar Computing Research Institute, HBKU, Doha, Qatar. 2Department of Science and Technology,
University of Sannio, Benevento, Italy. 3BioGeM, Institute of Genetic Research “Gaetano Salvatore”, Ariano Irpino
(AV), Italy. 4Department of Neurology, Department of Pathology, Institute for Cancer Genetics, Columbia
University Medical Center, New York, USA.
References
1. Jin L, Chen Y, Wang T, Hui P, Vasilakos AV. Understanding user behavior in online social networks: a survey.
Communications Magazine, IEEE. 2013 September;51(9):144–150.
2. Mislove A, Marcon M, Gummadi KP, Druschel P, Bhattacharjee B. Measurement and Analysis of Online Social
Networks. In: Proceedings of the 7th ACM SIGCOMM Conference on Internet Measurement. IMC ’07. ACM;
2007. p. 29–42.
3. Broder A, Kumar R, Maghoul F, Raghavan P, Rajagopalan S, Stata R, et al. Graph Structure in the Web.
Comput Netw. 2000;33(1-6):309–320.
4. Erath A, Löchl M, Axhausen K. Graph-Theoretical Analysis of the Swiss Road and Railway Networks Over Time.
Networks and Spatial Economics. 2009;9(3):379–400.
5. Kesidis G. An Introduction to Communication Network Analysis. Hoboken, NJ: Wiley; 2007.
6. Boginski V, Butenko S, Pardalos PM. Statistical analysis of financial networks. Computational Statistics and
Data Analysis. 2005;48(2):431–443.
7. Ideker T, Ozier O, Schwikowski B, Siegel AF. Discovering regulatory and signaling circuits in molecular
interaction networks. Bioinformatics. 2002;18.
8. Keller A, Backes C, Gerasch A, Kaufmann M, Kohlbacher O, Meese E, et al. A novel algorithm for detecting
differentially regulated paths based on gene enrichment analysis. Bioinformatics. 2009;25(21):2787–2794.
9. Nacu S, Critchley-Thorne R, Lee R, Holmes S. Gene expression network analysis and applications to
immunology. Bioinformatics. 2007;23(7):850–858.
10. Dehmer M, Emmert-Streib F. Analysis of Microarray Data: a network-based approach. Weinheim: John Wiley
& Sons; 2008.
11. D’haeseleer P, Liang S, Somogyi R. Genetic network inference: From co-expression clustering to reverse
engineering. Bioinformatics. 2000;16(8):707–726.
12. Wallace TA, Martin DN, Ambs S. Interaction among genes, tumor biology and the environment in cancer health
disparities: examining the evidence on a national and global scale. Carcinogenesis. 2011;32(8):1107–1121.
13. Ahern TP, Horvath-Puho E, Spindler KLG, Sorensen HT, Ording AG, Erichsen R. Colorectal cancer,
comorbidity, and risk of venous thromboembolism: assessment of biological interactions in a Danish nationwide
cohort. British Journal of Cancer. 2016;114(1):96–102.
14. Ceccarelli M, Cerulo L, Santore A. De novo reconstruction of gene regulatory networks from time series data,
an approach based on formal methods. Methods. 2014 Oct;69(3):298–305.
15. Turcan S, Rohle D, Goenka A, Walsh LA, Fang F, Yilmaz E, et al. IDH1 mutation is sufficient to establish the
glioma hypermethylator phenotype. Nature. 2012;483(7390):479–483.
16. Network CGAR, et al. Comprehensive, integrative genomic analysis of diffuse lower-grade gliomas. N Engl J
Med. 2015;2015(372):2481–2498.
17. Ceccarelli M, Barthel FP, Malta TM, Sabedot TS, Salama SR, Murray BA, et al. Molecular Profiling Reveals
Biologically Discrete Subsets and Pathways of Progression in Diffuse Glioma. Cell. 2016 Feb;164(3):550–563.
18. Brandes U, Erlebach T. Network Analysis: Methodological Foundations. Springer. 2005;3418.
19. Lena PD, Wu G, Martelli P, Casadio R, Nardini MC. An efficient tool for molecular interaction maps overlap.
BMC Bioinforma. 2013;14(1):159.
20. Yang Q, Sze S. Path matching and graph matching in biological networks. Journal of Computational Biology.
2007;14(1):56–67.
21. Ramana MV, Scheinerman ER, Ullman D. Fractional isomorphism of graphs. Discrete Mathematics.
1994;132(1):247–265.
22. Shervashidze N, Schweitzer P, van Leeuwen EJ, Mehlhorn K, Borgwardt KM. Weisfeiler-Lehman Graph
Kernels. Journal of Machine Learning Research. 2011;12:2539–2561.
23. Hamming RW. The unreasonable effectiveness of mathematics. American Mathematical Monthly.
1980;87(2):81–90.
24. Butts C, Carley KM. Canonical labeling to facilitate graph comparison; 1998.
25. Gill R, Datta S, Datta S. A statistical framework for differential network analysis from microarray data. BMC
Bioinformatics. 2010;11(1):95.
26. Mantel N. The detection of disease clustering and a generalized regression approach. Cancer Research.
1967;27(2):209.
27. Hubert LJ. Assignment methods in combinatorial data analysis. Marcel Dekker. 1987;1.
28. Ruan D, Young A, Montana G. Differential analysis of biological networks. BMC Bioinformatics. 2015;16:327.
29. Fuller TF, Ghazalpour A, Aten JE, Drake TA, Lusis AJ, Horvath S. Weighted Gene Co-expression Network
Analysis Strategies Applied to Mouse Weight. Mammalian Genome. 2007;18(6):463–472.
30. Ha MJ, Baladandayuthapani V, Do KA. DINGO: differential network analysis in genomics. Bioinformatics.
2015;31(21):3413–20.
31. Zhang B, Horvath S. A general framework for weighted gene co-expression network analysis. Stat Appl Genet
Mol Biol. 2005;4(1):1128.
32. Allen JD, Xie Y, Chen M, Girad L, Xao GH. Comparing statistical methods for constructing large scale gene
networks. PLoS ONE. 2012;7(1):e29348.
33. Deshpande R, Vandersluis B, Myers CL. Comparison of profile similarity measures for genetic interaction
networks. PLoS ONE. 2013;8(7):e68664.
34. Benjamini Y, Yekutieli D. The control of false discovery rate in multiple testing under dependency. Annals of
Statistics. 2001;29:1165–1188.
35. Johnson WE, Li C, Rabinovic A. Adjusting batch effects in microarray expression data using empirical Bayes
methods. Biostatistics. 2007;8(1):118–127.
36. Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Favera RD, et al. ARACNE: An Algorithm for
the Reconstruction of Gene Regulatory Networks in a Mammalian Cellular Context. BMC Bioinformatics.
2006;7(S-1).
37. Sales G, Romualdi C. parmigene - a parallel R package for mutual information estimation and gene network
reconstruction. Bioinformatics [ISMB/ECCB]. 2011;27(13):1876–1877. Available from:
http://dblp.uni-trier.de/db/journals/bioinformatics/bioinformatics27.html#SalesR11.
38. Carro MS, Lim WK, Alvarez MJ, Bollo RJ, Zhao X, Snyder EY, et al. The transcriptional network for
mesenchymal transformation of brain tumours. Nature. 2010;463(7279):318–325.
39. Guan X, Vengoechea J, Zheng S, Sloan AE, Chen Y, Brat DJ, et al. Molecular subtypes of glioblastoma are
relevant to lower grade glioma. PLoS One. 2014;9(3):e91216.
40. Sing T, Sander O, Beerenwinkel N, Lengauer T. ROCR: visualizing classifier performance in R. Bioinformatics.
2005;21(20):3940–3941.
41. Mankiewicz R. The Story of Mathematics. Princeton, NJ: Princeton University Press; 2004.
42. Girvan M, Newman ME. Community structure in social and biological networks. Proceedings of the national
academy of sciences. 2002;99(12):7821–7826.
43. Blondel VD, Guillaume JL, Lambiotte R, Lefebvre E. Fast unfolding of communities in large networks. Journal
of statistical mechanics: theory and experiment. 2008;2008(10):P10008.
44. Rosvall M, Bergstrom CT. Multilevel compression of random walks on networks reveals hierarchical
organization in large integrated systems. PloS one. 2011;6(4):e18209.
45. Reichardt J, Bornholdt S. Statistical mechanics of community detection. Physical Review E.
2006;74(1):016110.
46. Orman GK, Labatut V. A comparison of community detection algorithms on artificial networks. In:
International Conference on Discovery Science. Springer; 2009. p. 242–256.
47. Mall R, Langone R, Suykens JA. Multilevel hierarchical kernel spectral clustering for real-life large scale
complex networks. PloS one. 2014;9(6):e99966.
48. Mall R, Langone R, Suykens JA. Kernel spectral clustering for big data networks. Entropy.
2013;15(5):1567–1586.
49. Mall R, Langone R, Suykens JA. Self-tuned kernel spectral clustering for large scale networks. In: Big Data,
2013 IEEE International Conference on. IEEE; 2013. p. 385–393.
50. Dittrich MT, Klau GW, Rosenwald A, Dandekar T, Müller T. Identifying functional modules in protein–protein
interaction networks: an integrated exact approach. Bioinformatics. 2008;24(13):i223–i231.
51. West J, Beck S, Wang X, Teschendorff AE. An integrative network algorithm identifies age-associated
differential methylation interactome hotspots targeting stem-cell differentiation pathways. Scientific reports.
2013;3:1630.
52. Jiao Y, Widschwendter M, Teschendorff AE. A systems-level integrative framework for genome-wide DNA
methylation and gene expression data identifies differential gene expression modules under epigenetic control.
Bioinformatics. 2014;30(16):2360–2366.
53. Steinwart I, Hush D, Scovel C. A classification framework for anomaly detection. Journal of Machine Learning
Research. 2005;6(Feb):211–232.
54. Kumar A, Niculescu-Mizil A, Kavukcuoglu K, Daume III H. A binary classification framework for two-stage
multiple kernel learning. arXiv preprint arXiv:1206.6428. 2012.
55. Eckel-Passow JE, Lachance DH, Molinaro AM, Walsh KM, Decker PA, Sicotte H, et al. Glioma groups based
on 1p/19q, IDH, and TERT promoter mutations in tumors. New England Journal of Medicine.
2015;372(26):2499–2508.
56. Noushmehr H, Weisenberger DJ, Diefes K, Phillips HS, Pujara K, Berman BP, et al. Identification of a CpG
island methylator phenotype that defines a distinct subgroup of glioma. Cancer cell. 2010;17(5):510–522.
57. Zhu D, Hunter SB, Vertino PM, Van Meir EG. Overexpression of MBD2 in glioblastoma maintains epigenetic
silencing and inhibits the antiangiogenic function of the tumor suppressor gene BAI1. Cancer research.
2011;71(17):5859–5870.
58. Gleize V, Alentorn A, Connen de Kérillis L, Labussière M, Nadaradjane AA, Mundwiller E, et al. CIC
inactivating mutations identify aggressive subset of 1p19q codeleted gliomas. Annals of neurology.
2015;78(3):355–374.
59. Feng C, Zhang Y, Yin J, Li J, Abounader R, Zuo Z. Regulatory factor X1 is a new tumor suppressive
transcription factor that acts via direct downregulation of CD44 in glioblastoma. Neuro-oncology.
2014;16(8):1078–85.
60. Bai H, Harmancı AS, Erson-Omay EZ, Li J, Coşkun S, Simon M, et al. Integrated genomic characterization of
IDH1-mutant glioma malignant progression. Nature genetics. 2016;48(1):59–66.
61. Kraskov A, Stögbauer H, Grassberger P. Estimating mutual information. Physical review E. 2004;69(6):066138.
62. Sales G, Romualdi C. parmigene - a parallel R package for mutual information estimation and gene network
reconstruction. Bioinformatics. 2011;27(13):1876–1877.
Figures
Figure 1: Correlation between topological overlap and cosine
similarity on 250 random networks.
Figure 2: Sensitivity analysis of the parameter. The boxplots represent
the distribution of True Positive Rate (TPR) identified by Closed-Form
approach for 100 random runs of the experiment.
Figure 3: Comparison of proposed Closed-Form approach with dGHD
algorithm. Figures A and B correspond to the ROC and PR plot for
permuted sub-network (d=0.15) respectively. Figures C and D represent the
ROC and PR plots corresponding to the denser sub-network (d=0.3 and d′=0.5)
respectively. Clearly, the Closed-Form technique has better performance than
the dGHD algorithm.
Figure 4: Comparison of proposed Closed-Form approach with
dGHD method w.r.t. AUCROC and AUCPR for 100 random runs
of the experiment. These metrics are calculated using p-value 0.01 as cut-off.
Figures A and B correspond to the AUCROC and AUCPR for the permuted
sub-network (d=0.15) respectively. Figures C and D represent the AUCROC
and AUCPR corresponding to the denser sub-network (d=0.3 and d′=0.5)
respectively.
Figure 5: Differential sub-networks between IDH-mutant and IDH-wild-type detected by the closed form approach. Connections present only in the IDH-mutant sub-network are shown in red, those present only in the IDH-wild-type sub-network in green, and common connections in black.
Tables
Table 1: Time complexity comparison. Here K represents the number of nodes
whose p-value is greater than the threshold, and generally K ≪ N. Importantly,
the cGHD calculation after the removal of each node can be performed independently,
in parallel; with T processors, the complexity of the proposed approach therefore
decreases linearly with T.
Method         Time complexity
dGHD           O(N²|E|)
Closed-Form    O(N|E| + N log(N) + K²|E|)
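The parallelisation remark in Table 1 can be illustrated with a leave-one-node-out loop whose iterations are independent of one another. In this sketch `ghd_like` is a hypothetical, simplified GHD-style distance (mean squared difference of mean-centred off-diagonal entries) and not the paper's cGHD, and a Python thread pool stands in for the T processors.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def ghd_like(a: np.ndarray, b: np.ndarray) -> float:
    """Hypothetical GHD-style distance (NOT the paper's cGHD): mean squared
    difference of the mean-centred off-diagonal entries of two matrices."""
    off = ~np.eye(a.shape[0], dtype=bool)
    a_c = a[off] - a[off].mean()
    b_c = b[off] - b[off].mean()
    return float(np.mean((a_c - b_c) ** 2))


def drop_node(m: np.ndarray, k: int) -> np.ndarray:
    """Return m with row and column k removed."""
    keep = np.arange(m.shape[0]) != k
    return m[np.ix_(keep, keep)]


def leave_one_out(a: np.ndarray, b: np.ndarray, workers: int = 4) -> list[float]:
    """Distance between the two networks after removing each node in turn.
    Each node's computation is independent, so the map spreads over workers."""
    def distance_without(k: int) -> float:
        return ghd_like(drop_node(a, k), drop_node(b, k))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(distance_without, range(a.shape[0])))


rng = np.random.default_rng(0)
w = rng.random((30, 30))
a = np.triu(w, k=1) + np.triu(w, k=1).T
b = a.copy()
b[0, 1:] = b[1:, 0] = rng.random(29)      # re-wire node 0 only, keeping symmetry
scores = leave_one_out(a, b)
print(int(np.argmin(scores)), scores[0])
```

Removing the only re-wired node (node 0) leaves two identical matrices, so its score is exactly zero, while every other removal keeps the difference in place.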
Table 2: Comparison of the proposed Closed-Form (CF) approach with the dGHD
algorithm. We compared the proposed Closed-Form approach with the dGHD, Louvain,
Infomap and Spinglass techniques w.r.t. various evaluation metrics for random
geometric (RG) and power-law (PL) networks.
(Metric columns give mean ± sd; the Time column gives the mean.)
Parameters      Method     AUC ROC        Precision      Recall         Accuracy       Specificity    Kappa          Time
d=0.15 (RG)     CF         0.935 ± 0.051  0.849 ± 0.037  0.846 ± 0.102  0.969 ± 0.011  0.983 ± 0.004  0.828 ± 0.068  0.078
d=0.15 (RG)     dGHD       0.926 ± 0.018  0.793 ± 0.021  0.878 ± 0.036  0.965 ± 0.005  0.974 ± 0.003  0.813 ± 0.026  1.0
d=0.15 (RG)     Louvain    0.980 ± 0.016  0.767 ± 0.052  1.0 ± 0.0      0.965 ± 0.028  0.960 ± 0.031  0.841 ± 0.113  0.012
d=0.15 (RG)     Infomap    0.843 ± 0.012  0.262 ± 0.015  1.0 ± 0.0      0.718 ± 0.022  0.685 ± 0.024  0.304 ± 0.024  0.018
d=0.15 (RG)     Spinglass  0.832 ± 0.011  0.249 ± 0.012  1.0 ± 0.0      0.699 ± 0.018  0.665 ± 0.021  0.285 ± 0.020  0.85
d=0.15, d0=0.3  CF         0.927 ± 0.048  0.839 ± 0.031  0.862 ± 0.098  0.969 ± 0.008  0.982 ± 0.005  0.825 ± 0.054  0.081
d=0.15, d0=0.3  dGHD       0.922 ± 0.022  0.806 ± 0.027  0.868 ± 0.045  0.966 ± 0.006  0.977 ± 0.004  0.816 ± 0.032  1.0
d=0.15, d0=0.3  Louvain    0.978 ± 0.018  0.887 ± 0.137  0.974 ± 0.042  0.982 ± 0.018  0.982 ± 0.023  0.916 ± 0.083  0.013
d=0.15, d0=0.3  Infomap    0.849 ± 0.008  0.269 ± 0.009  1.0 ± 0.0      0.728 ± 0.015  0.698 ± 0.016  0.316 ± 0.016  0.020
d=0.15, d0=0.3  Spinglass  0.859 ± 0.009  0.284 ± 0.013  1.0 ± 0.0      0.747 ± 0.016  0.719 ± 0.017  0.339 ± 0.019  0.92
d=0.3 (RG)      CF         0.877 ± 0.067  0.714 ± 0.075  0.789 ± 0.135  0.947 ± 0.016  0.975 ± 0.011  0.716 ± 0.099  0.083
d=0.3 (RG)      dGHD       0.724 ± 0.029  0.645 ± 0.049  0.577 ± 0.059  0.921 ± 0.007  0.971 ± 0.006  0.504 ± 0.051  1.0
d=0.3 (RG)      Louvain    0.866 ± 0.019  0.406 ± 0.061  1.0 ± 0.0      0.850 ± 0.034  0.833 ± 0.038  0.505 ± 0.072  0.013
d=0.3 (RG)      Infomap    0.677 ± 0.011  0.147 ± 0.004  1.0 ± 0.0      0.419 ± 0.019  0.354 ± 0.022  0.100 ± 0.008  0.021
d=0.3 (RG)      Spinglass  0.678 ± 0.011  0.148 ± 0.004  1.0 ± 0.0      0.420 ± 0.018  0.355 ± 0.021  0.100 ± 0.008  0.90
d=0.3, d0=0.5   CF         0.979 ± 0.005  0.771 ± 0.061  0.930 ± 0.082  0.965 ± 0.012  0.969 ± 0.011  0.821 ± 0.062  0.09
d=0.3, d0=0.5   dGHD       0.848 ± 0.071  0.700 ± 0.038  0.731 ± 0.148  0.941 ± 0.010  0.964 ± 0.009  0.672 ± 0.078  1.0
d=0.3, d0=0.5   Louvain    0.932 ± 0.029  0.478 ± 0.118  1.0 ± 0.0      0.879 ± 0.054  0.866 ± 0.059  0.582 ± 0.128  0.014
d=0.3, d0=0.5   Infomap    0.674 ± 0.010  0.145 ± 0.004  1.0 ± 0.0      0.413 ± 0.018  0.348 ± 0.020  0.097 ± 0.008  0.023
d=0.3, d0=0.5   Spinglass  0.711 ± 0.007  0.162 ± 0.003  1.0 ± 0.0      0.481 ± 0.013  0.423 ± 0.014  0.128 ± 0.006  0.94
=2 (PL)         CF         0.797 ± 0.046  0.307 ± 0.307  0.792 ± 0.099  0.801 ± 0.018  0.349 ± 0.051  0.802 ± 0.022  0.09
=2 (PL)         dGHD       0.797 ± 0.013  0.294 ± 0.009  0.809 ± 0.027  0.787 ± 0.008  0.333 ± 0.015  0.784 ± 0.009  1.0
=2 (PL)         Louvain    0.780 ± 0.014  0.212 ± 0.010  1.0 ± 0.0      0.703 ± 0.018  0.272 ± 0.016  0.690 ± 0.011  0.015
=2 (PL)         Infomap    0.665 ± 0.013  0.141 ± 0.004  1.0 ± 0.0      0.603 ± 0.018  0.162 ± 0.012  0.484 ± 0.019  0.026
=2 (PL)         Spinglass  0.687 ± 0.014  0.153 ± 0.006  1.0 ± 0.0      0.645 ± 0.021  0.194 ± 0.011  0.527 ± 0.016  0.90
=3 (PL)         CF         0.825 ± 0.019  0.345 ± 0.015  0.825 ± 0.035  0.826 ± 0.007  0.402 ± 0.024  0.826 ± 0.004  0.085
=3 (PL)         dGHD       0.808 ± 0.027  0.327 ± 0.018  0.799 ± 0.050  0.816 ± 0.008  0.375 ± 0.031  0.817 ± 0.004  1.0
=3 (PL)         Louvain    0.774 ± 0.015  0.233 ± 0.011  1.0 ± 0.0      0.736 ± 0.019  0.301 ± 0.009  0.732 ± 0.019  0.015
=3 (PL)         Infomap    0.670 ± 0.014  0.168 ± 0.005  1.0 ± 0.0      0.635 ± 0.017  0.210 ± 0.014  0.532 ± 0.014  0.027
=3 (PL)         Spinglass  0.694 ± 0.013  0.179 ± 0.007  1.0 ± 0.0      0.670 ± 0.023  0.232 ± 0.012  0.571 ± 0.017  0.94
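The threshold-based columns of Table 2 (Precision, Recall, Accuracy, Specificity, Kappa) follow from a confusion matrix of binary calls. This sketch assumes that a node is called changed when its p-value falls below a cut-off (0.01, the value quoted for Figure 4) and that Kappa is Cohen's kappa; the function name `threshold_metrics` and the toy inputs are invented for illustration.

```python
import numpy as np


def threshold_metrics(p_values, truth, cutoff: float = 0.01) -> dict:
    """Confusion-matrix metrics when a node is called 'changed' if p < cutoff.

    Kappa is Cohen's kappa between the calls and the ground truth (assumed).
    """
    called = np.asarray(p_values, dtype=float) < cutoff
    truth = np.asarray(truth, dtype=bool)
    tp = int(np.sum(called & truth))
    fp = int(np.sum(called & ~truth))
    fn = int(np.sum(~called & truth))
    tn = int(np.sum(~called & ~truth))
    n = tp + fp + fn + tn

    def ratio(num: float, den: float) -> float:
        return num / den if den else 0.0  # guard against empty classes

    accuracy = (tp + tn) / n
    # Agreement expected by chance, from the row and column totals.
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    return {
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "accuracy": accuracy,
        "specificity": ratio(tn, tn + fp),
        "kappa": ratio(accuracy - p_e, 1.0 - p_e),
    }


metrics = threshold_metrics(
    p_values=[0.001, 0.02, 0.5, 0.003, 0.9, 0.004],
    truth=[1, 1, 0, 0, 0, 1],
)
print(metrics)
```

Kappa corrects raw accuracy for chance agreement, which is why methods that call almost every node (Recall 1.0 in Table 2) can combine high recall with a low Kappa.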
TF Z-score GHD µ FDR(MRA)
FOXD3 0.000 1.000 1.000 1.000E+00
FOXJ3 0.000 1.000 1.000 8.442E-03
MLX 0.000 1.000 1.000 8.075E-01
NFIA 0.000 1.000 1.000 4.502E-01
ETV1 0.062 0.058 0.058 1.000E+00
E2F1 0.085 0.058 0.058 1.007E-01
CREB1 0.208 0.058 0.058 8.580E-01
SOX10 0.234 0.058 0.058 8.442E-03
KLF13 0.338 1.000 0.278 1.240E-02
STAT3 0.354 0.058 0.058 8.442E-03
RUNX3 0.387 0.058 0.059 1.671E-02
IRF3 0.406 0.840 0.455 8.442E-03
ZNF354C 0.498 0.058 0.057 1.000E+00
HOXD13 0.540 0.059 0.059 2.492E-01
ZIC1 0.622 0.058 0.058 5.787E-02
HOXA2 0.700 0.059 0.059 1.405E-01
FOXO1 0.743 0.058 0.058 8.183E-02
MAFG 0.817 0.862 0.467 6.857E-01
RFX1 0.865 0.059 0.059 3.131E-01
NR1H2 0.871 0.058 0.058 8.176E-01
PAX6 1.003 0.058 0.057 4.147E-01
GLIS2 1.035 0.058 0.059 8.442E-03
NR4A2 1.118 0.058 0.058 1.000E+00
STAT4 1.137 0.848 0.486 9.615E-01
DLX6 1.208 0.058 0.059 1.000E+00
SIX4 1.232 0.058 0.058 1.000E+00
MEF2D 1.379 0.058 0.059 8.442E-03
MTF1 1.388 0.058 0.057 1.000E+00
MBD2 1.480 0.820 0.495 1.969E-01
OTP 1.493 0.058 0.057 2.970E-01
ETV4 1.529 0.059 0.059 2.122E-01
ZBTB12 1.566 0.194 0.189 4.255E-02
HOXB4 1.595 0.058 0.057 3.019E-01
PLAG1 1.622 0.195 0.190 3.434E-01
E2F6 1.668 0.197 0.192 8.442E-03
CREM 1.674 0.765 0.506 2.122E-01
IRF9 1.700 0.058 0.057 5.950E-02
KLF6 1.709 0.059 0.059 8.442E-03
TFE3 1.716 0.199 0.193 1.049E-01
HSF2 1.759 0.201 0.195 1.671E-02
NR2C1 1.800 0.058 0.058 2.122E-01
ONECUT2 1.804 0.202 0.196 3.657E-02
HOXD3 1.847 0.204 0.198 1.000E+00
BACH1 1.888 0.058 0.059 2.897E-01
GSX1 1.895 0.207 0.200 1.000E+00
HOXA13 1.930 0.058 0.057 1.000E+00
VAX2 1.937 0.208 0.201 1.609E-01
Table 3: The most different transcription factors detected between
IDH-mutant and IDH-wild-type in the TCGA dataset. The columns report the
differential measures: the Z-score of the proposed differencing test
(equation (2)), the GHD computed between the two networks, and the mean (µ) of
the null GHD distribution. The last column reports the False Discovery Rate (FDR)
of the GSEA enrichment obtained with a Master Regulator Analysis (MRA).
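Tables 3 and 4 report an observed GHD, the mean µ of a null GHD distribution and a Z-score. Equation (2) is not reproduced in this section, so the sketch below only illustrates the generic idea of standardising an observed network distance against a permutation null. The distance (mean squared difference of upper-triangle weights), the node-relabelling null and the name `permutation_z` are assumptions for illustration, not the paper's test.

```python
import numpy as np


def permutation_z(a: np.ndarray, b: np.ndarray, n_perm: int = 200, seed: int = 0):
    """Standardise an observed network distance against a relabelling null.

    Distance and null are illustrative assumptions, not equation (2).
    """
    rng = np.random.default_rng(seed)
    n = a.shape[0]
    iu = np.triu_indices(n, k=1)

    def distance(x: np.ndarray, y: np.ndarray) -> float:
        return float(np.mean((x[iu] - y[iu]) ** 2))

    observed = distance(a, b)
    null = np.empty(n_perm)
    for t in range(n_perm):
        perm = rng.permutation(n)
        null[t] = distance(a, b[np.ix_(perm, perm)])  # shuffle node labels of b
    mu = float(null.mean())
    z = (observed - mu) / float(null.std(ddof=1))
    # Empirical p-value: how often the null is at least as small as observed.
    p_emp = (1 + int(np.sum(null <= observed))) / (n_perm + 1)
    return observed, mu, z, p_emp


rng = np.random.default_rng(1)
w = np.triu(rng.random((20, 20)), k=1)
a = w + w.T
observed, mu, z, p_emp = permutation_z(a, a.copy())
print(f"observed={observed:.3f}  null mean={mu:.3f}  z={z:.2f}  p={p_emp:.4f}")
```

With two identical networks the observed distance is zero and lies far below the null mean, giving a negative Z-score and the smallest attainable empirical p-value, 1/(n_perm + 1).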
TF Z-score GHD µ FDR(MRA)
MGA 0.000 1.000 1.000 2.166E-03
TEAD1 0.000 1.000 1.000 8.017E-04
FOS 0.000 1.000 1.000 5.137E-04
JUNB 0.000 1.000 1.000 5.137E-04
MEF2C 0.015 0.015 0.015 8.001E-04
LEF1 0.058 0.014 0.014 5.137E-04
NEUROD2 0.096 0.016 0.016 1.221E-03
EGR2 0.110 0.013 0.013 6.263E-03
JUN 0.123 0.333 0.500 5.137E-04
ARX 0.144 0.012 0.012 9.301E-02
BBX 0.173 0.012 0.012 7.333E-04
TCF3 0.198 0.011 0.011 5.137E-04
LHX6 0.205 0.017 0.017 8.492E-04
EGR1 0.211 0.011 0.011 9.696E-03
BCL6B 0.214 0.011 0.011 5.137E-04
E2F2 0.217 0.011 0.011 7.786E-04
E2F7 0.220 0.012 0.012 5.137E-04
E2F8 0.223 0.012 0.012 5.137E-04
ELF4 0.226 0.012 0.012 5.137E-04
ETV5 0.229 0.013 0.012 5.137E-04
FLI1 0.232 0.013 0.013 5.137E-04
FOXG1 0.235 0.013 0.013 1.000E+00
HOXD9 0.239 0.014 0.014 9.728E-04
ID4 0.242 0.014 0.014 7.786E-04
IRF8 0.246 0.014 0.014 2.393E-02
MYBL2 0.250 0.015 0.015 5.137E-04
NFIA 0.254 0.015 0.015 4.085E-03
NFIB 0.258 0.016 0.016 7.796E-04
KLF13 0.258 0.360 0.515 8.001E-04
OLIG2 0.262 0.016 0.016 7.893E-02
PROX1 0.266 0.017 0.017 1.020E-02
SOX2 0.270 0.017 0.017 2.995E-03
TEF 0.275 0.018 0.018 8.221E-04
ZBTB7A 0.280 0.019 0.018 7.700E-04
ZIC1 0.284 0.019 0.019 7.700E-01
SOX13 0.295 0.021 0.020 8.086E-04
TCF7L2 0.300 0.021 0.021 7.487E-04
BCL6 0.305 0.022 0.022 5.137E-04
MAF 0.317 0.024 0.024 5.137E-04
CEBPB 0.330 0.024 0.024 5.137E-04
CEBPD 0.337 0.025 0.025 5.137E-04
HLF 0.344 0.018 0.018 3.029E-03
ELK1 0.349 0.025 0.025 8.017E-04
FOXJ3 0.369 0.027 0.026 5.137E-04
MTF1 0.377 0.028 0.027 5.137E-04
TP53 0.388 0.028 0.028 5.137E-04
GABPA 0.407 0.030 0.029 5.137E-04
CDC5L 0.417 0.031 0.031 7.899E-04
RORA 0.422 0.329 0.467 7.796E-04
IRF9 0.426 0.031 0.031 3.062E-03
STAT1 0.437 0.033 0.032 5.137E-04
CREB1 0.456 0.035 0.034 5.137E-04
SOX10 0.462 0.036 0.035 8.250E-04
HOXD1 0.475 0.038 0.037 5.137E-04
SOX8 0.479 0.038 0.037 1.760E-03
HOXD11 0.480 0.047 0.046 2.975E-02
NR2F2 0.490 0.042 0.041 5.186E-04
DLX1 0.491 0.046 0.045 7.700E-04
TCF12 0.493 0.040 0.040 9.117E-04
THRB 0.495 0.051 0.050 9.850E-04
DLX2 0.496 0.045 0.044 8.492E-04
HOXD10 0.498 0.050 0.049 5.137E-04
ATF5 0.505 0.057 0.055 5.137E-04
STAT4 0.515 0.055 0.054 9.220E-04
TBR1 0.519 0.020 0.020 9.272E-04
MESP1 0.521 0.092 0.087 8.746E-04
POU3F2 0.523 0.063 0.061 5.137E-04
TFEC 0.530 0.082 0.079 5.137E-04
TCF4 0.533 0.071 0.069 7.487E-04
ETS2 0.543 0.176 0.163 9.728E-04
CREM 0.558 0.110 0.104 5.140E-04
TP63 0.561 0.105 0.099 9.220E-04
STAT6 0.563 0.091 0.087 5.137E-04
NPAS2 0.575 0.136 0.127 1.889E-01
GLI3 0.601 0.313 0.455 4.663E-02
Table 4: The most different transcription factors detected between
IDH-mutant and IDH-wild-type in the REMBRANDT dataset. The columns report
the differential measures: the Z-score of the proposed differencing test
(equation (2)), the GHD computed between the two networks, and the mean (µ) of
the null GHD distribution. The last column reports the False Discovery Rate (FDR)
of the GSEA enrichment obtained with a Master Regulator Analysis (MRA).
[Figure 1 plot; axes: Cosine Measure, TO Measure]
[Figure 2 plot: True positive rate against the absolute log of thresholds θ, for the Dense Permuted Subnet and Sparse Permuted Subnet experiments]
[Figure 4 plot, panels A to D: AUC_ROC and AUC_PR for ClosedForm and dGHD; reported p-values: p = 0.48, p = 0.42, p = 2.64 × 10^-20, p = 3.42 × 10^-14]
[Figure 3 plot, panels A to D: ROC curves (True positive rate against False positive rate) and PR curves (Precision against Recall) for ClosedForm and dGHD]
[Figure 5 network diagram: nodes labelled with gene and transcription factor symbols]