Content uploaded by Raghvendra Mall

Author content

All content in this area was uploaded by Raghvendra Mall on Feb 22, 2017

Content may be subject to copyright.

All content in this area was uploaded by Raghvendra Mall on Dec 25, 2016

Mall et al.

METHODOLOGY

Detection of statistically significant network changes in complex biological networks

Raghvendra Mall (1; rmall@qf.org.qa), Luigi Cerulo (2,3; lcerulo@unisannio.it), Halima Bensmail (1; hbensmail@qf.org.qa), Antonio Iavarone (4; ai2102@cumc.columbia.edu) and Michele Ceccarelli (1*; mceccarelli@gmail.com)

*Correspondence: mceccarelli@gmail.com

1 QCRI - Qatar Computing Research Institute, HBKU, Doha, Qatar

Full list of author information is available at the end of the article

Abstract

Background: Biological networks help to unveil the complex structure of molecular interactions and to discover driver genes, especially in the context of cancer. Gene mutations, for example during cancer progression, can cause the gene expression network to undergo some amount of localized re-wiring. The ability to detect statistically relevant changes in the interaction patterns induced by the progression of the disease can lead to the discovery of novel relevant signatures. Several procedures have recently been proposed to detect sub-network differences in pairwise labeled weighted networks.

Results: In this paper, we propose an improvement over the state-of-the-art based on the Generalized Hamming Distance, which is adopted for evaluating the topological difference between two networks and estimating its statistical significance. The proposed procedure exploits a more effective model selection criterion to generate p-values for statistical significance and is more efficient in terms of computational time and prediction accuracy than methods from the literature. Moreover, the structure of the proposed algorithm allows for a faster parallelized implementation. In the case of dense random geometric networks, the proposed approach is 10-15x faster and achieves 5-10% higher AUC, Precision/Recall, and Kappa values than the state-of-the-art. We also report the application of the method to dissect the difference between the regulatory networks of IDH-mutant versus IDH-wild-type glioma. In this case our method is able to identify some recently reported master regulators as well as novel important candidates.

Conclusions: We show that our network differencing procedure can effectively and efficiently detect statistically significant network re-wirings in different conditions. When applied to detect the main differences between the networks of IDH-mutant and IDH-wild-type glioma tumors, it correctly selects sub-networks centered on important key regulators of these two different subtypes. In addition, its application highlights several novel candidates that cannot be detected by standard single network-based approaches.

Keywords: Differential Networks; Gene Regulatory Network Inference; Master Regulators

Background

Complex networks are ubiquitous across a wide variety of domains, including social networks [1, 2], web graphs [3], road graphs [4], communication networks [5], financial networks [6] and biological networks [7, 8, 9]. Although we focus on biological networks, many aspects of the method proposed in this paper can also be applied to networks in other contexts. In cancer research, gene regulatory networks, protein interaction networks, and DNA methylation networks are compared to detect differences between two conditions, such as healthy and diseased [10, 11]. This can lead to the discovery of biological pathways related to the disease condition and, in the case of cancer, of the gene regulatory changes that occur as the disease progresses [12, 13, 14].

A central problem in cell biology is to model, from high-throughput data, the functional networks underlying interactions between molecular entities. One of the main questions is how the cell globally changes its behavior in response to external stimuli, or what the effect is of alterations such as driver somatic mutations and changes in copy number. Signatures of differentially expressed and/or methylated genes are the downstream effect of global cell de-regulation in different conditions such as cancer subtypes. Therefore, it is argued that driver mutations activate functional pathways described by different global re-wiring of the underlying gene regulatory network.

The identification of significant changes induced by the presence or the progression of disease can help to discover novel molecular diagnostic and prognostic signatures. For example, it is known that, according to the mutation status of the gene IDH [15, 16], the majority of malignant brain tumors can be divided into two main macro-categories, which can be further divided into seven molecularly and clinically distinct subtypes [17]. These two macro-groups are characterized by highly different global expression and epigenomic profiles. Hence, one of the main questions in understanding the molecular basis of diseases is how to identify significant changes in the regulatory structure under different conditions.

Various techniques have been developed to compare two graphs, including graph matching and graph similarity algorithms [18, 19, 20]. However, the problem addressed in this paper is different from popular graph theory problems such as graph isomorphism [21] and sub-graph matching [22]. Here the goal is to identify statistically significant differences between two weighted networks (with or without labels).

One common statistic used to distinguish one graph, $A$, from another, $B$, having the same number of nodes $N$, is the Mean Absolute Difference (MAD) metric, defined as:

\[ d(A, B) = \frac{1}{N(N-1)} \sum_{i \neq j} |a_{ij} - b_{ij}|, \]

where $a_{ij}$ and $b_{ij}$ are edge weights corresponding to the topology of networks $A$ and $B$. This distance measure is equivalent to the Hamming distance [23] and has been used extensively in the literature to compare networks [24, 25]. Another statistic used to test association between networks is the Quadratic Assignment Procedure (QAP), defined as:

\[ Q(A, B) = \frac{1}{N(N-1)} \sum_{i=1}^{N} \sum_{j=1}^{N} a_{ij} b_{ij}. \]

The QAP metric is used in a permutation-based procedure to differentiate two networks [26, 27]. Ruan et al. showed that these metrics are not always sensitive to subtle topological variations [28].
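The two statistics above are simple enough to compute directly. The following is a minimal sketch in plain Python, assuming each network is given as an N x N nested list of edge weights with a zero diagonal; the function names `mad` and `qap` are illustrative and not taken from the paper.

```python
def mad(a, b):
    """Mean Absolute Difference over all ordered pairs (i, j) with i != j."""
    n = len(a)
    total = sum(abs(a[i][j] - b[i][j])
                for i in range(n) for j in range(n) if i != j)
    return total / (n * (n - 1))


def qap(a, b):
    """QAP statistic: normalized sum of element-wise products of the weights."""
    n = len(a)
    total = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n))
    return total / (n * (n - 1))


# Toy example: two 3-node networks that share a single edge (0-1).
a = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
b = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
print(mad(a, b), qap(a, b))
```

Both functions treat the networks as labeled, so node i in the first matrix is compared with node i in the second.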

Our aim is to detect statistically significant differences between two networks under the premise that any true topological difference between them would involve only a small set of edges compared with all the edges in the network. Recently, a Generalized Hamming Distance (GHD) based method was introduced to measure the distance between two labeled graphs [28], where it was shown that the GHD statistic is more robust than the MAD and QAP metrics for identifying subtle variations in the topology of paired networks. In particular, the authors showed that the GHD permutation distribution follows a normal distribution with closed-form expressions for the first two moments under the null hypothesis that networks $A$ and $B$ are independent. Using these moments, the corresponding p-values were obtained in closed form. They also proposed a differential sub-network identification technique, namely dGHD. The advantage of this technique is that, unlike previous differential network analysis techniques [25, 29, 30], it provides a closed-form solution for the p-values of the differential sub-network left after iterative removal of the least differential nodes. We propose an improvement over dGHD, namely the Closed-Form approach, which exploits the conditions for asymptotic normality, is computationally cheaper, and attains better prediction performance than the dGHD algorithm. Computational efficiency and prediction accuracy are crucial in cancer contexts, where networks have a large number of nodes and the topological difference is associated with a few driver genes.

Methods

Preliminaries on Generalized Hamming Distance

The Generalized Hamming Distance is a way to estimate the distance between two weighted graphs [28]. Let $A = (V, E_A)$ and $B = (V, E_B)$ be two graphs with the same set of nodes $V = \{1, \ldots, N\}$ and different sets of edges, $E_A$ and $E_B$. The Generalized Hamming Distance (GHD) is defined as:

\[ \mathrm{GHD}(A, B) = \frac{1}{N(N-1)} \sum_{i,j,\, i \neq j} (a'_{ij} - b'_{ij})^2, \tag{1} \]

where $a'_{ij}$ and $b'_{ij}$ are mean-centered edge weights defined as:

\[ a'_{ij} = a_{ij} - \frac{1}{N(N-1)} \sum_{i,j,\, i \neq j} a_{ij}, \qquad b'_{ij} = b_{ij} - \frac{1}{N(N-1)} \sum_{i,j,\, i \neq j} b_{ij}. \]
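Equation (1) can be sketched in a few lines. The example below is a minimal plain-Python version, assuming the weights are supplied as N x N nested lists whose diagonal is ignored; the names `center` and `ghd` are illustrative.

```python
def center(w):
    """Subtract the mean off-diagonal weight from every off-diagonal entry."""
    n = len(w)
    mean = sum(w[i][j] for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    return [[w[i][j] - mean if i != j else 0.0 for j in range(n)]
            for i in range(n)]


def ghd(a, b):
    """Generalized Hamming Distance of Eq. (1) between two weight matrices."""
    n = len(a)
    ac, bc = center(a), center(b)
    total = sum((ac[i][j] - bc[i][j]) ** 2
                for i in range(n) for j in range(n) if i != j)
    return total / (n * (n - 1))


a = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
b = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
print(ghd(a, b))
```

Because of the mean-centering, adding the same constant to every off-diagonal weight of one network leaves the distance unchanged.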

The edge weights, $a_{ij}$ and $b_{ij}$, depend on the topology of the network and provide a measure of connectivity between every pair of nodes $i$ and $j$ in $A$ and $B$. Different metrics have been adopted to measure the connectivity between pairs of nodes, including topological overlap (TO) [31, 32], cosine similarity and Pearson correlation [33]. In our experiments, we used the cosine similarity to capture first-order interactions between the nodes in the network. Cosine similarity computation scales well for large sparse networks and can be used in place of TO, as the two measures are very highly correlated.

Given two networks $A$ and $B$, a permutation $\pi$ of the labels of the vertices of $A$ (keeping the edges unchanged) generates a permuted network $A_\pi$. The quantity $\mathrm{GHD}_\pi(A_\pi, B)$ represents the test statistic of an inferential problem having as null hypothesis $H_0$: graphs $A$ and $B$ are independent [28]. The distribution of $\mathrm{GHD}_\pi$ can be obtained through an exhaustive calculation, which can be approximated by a Monte Carlo approach. The authors of [28] simplified this calculation by showing that, under the null hypothesis, the distribution can be approximated well by a normal distribution with moments that can be obtained analytically.

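This can be written as:

\[ \frac{\mathrm{GHD}(A_\pi, B) - \mu_\pi}{\sigma_\pi} \sim \mathcal{N}(0, 1), \tag{2} \]

where $\mu_\pi$ is the asymptotic value of the mean of the GHD statistic and $\sigma_\pi$ is the asymptotic value of the standard deviation of the GHD statistic computed between $A_\pi$ and $B$. In order to calculate the $\mu_\pi$ and $\sigma_\pi$ values we define:

\[ S^t_a = \sum_{i=1}^{N} \sum_{j=1,\, j \neq i}^{N} a^t_{ij}, \quad t = 1, 2, \qquad T_a = \sum_{i=1}^{N} \Big( \sum_{j=1,\, j \neq i}^{N} a_{ij} \Big)^2, \]

\[ S^t_b = \sum_{i=1}^{N} \sum_{j=1,\, j \neq i}^{N} b^t_{ij}, \quad t = 1, 2, \qquad T_b = \sum_{i=1}^{N} \Big( \sum_{j=1,\, j \neq i}^{N} b_{ij} \Big)^2. \]

Here $a^t_{ij}$ and $b^t_{ij}$ are the edge weights raised to the power $t$. Furthermore, we require the following terms:

\[ A_a = (S^1_a)^2, \qquad B_a = T_a - S^2_a, \qquad C_a = A_a + 2 S^2_a - 4 T_a, \]

\[ A_b = (S^1_b)^2, \qquad B_b = T_b - S^2_b, \qquad C_b = A_b + 2 S^2_b - 4 T_b. \]

Using these definitions, the closed-form expressions for the mean $\mu_\pi$ and the variance $\sigma^2_\pi$ are:

\[
\begin{aligned}
\mu_\pi &= \frac{S^2_a + S^2_b}{N(N-1)} - \frac{2\, S^1_a S^1_b}{N^2 (N-1)^2}, \\
\sigma^2_\pi &= \frac{4}{N^3 (N-1)^3} \left[ 2\, S^2_a S^2_b + \frac{4\, B_a B_b}{N-2} + \frac{C_a C_b}{(N-2)(N-3)} - \frac{A_a A_b}{N(N-1)} \right].
\end{aligned}
\tag{3}
\]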

Given a significance threshold $\alpha$ (e.g. 0.01), p-values $> \alpha$ indicate that there is insufficient evidence to reject the null hypothesis ($H_0$) that graphs $A$ and $B$ are independent. Hence, the higher the p-value, the more compatible the observed GHD is with the two graphs being independent.
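The moments in Equation (3) and the standardization in Equation (2) can be sketched as follows. This is a minimal plain-Python illustration, assuming symmetric N x N nested lists with a zero diagonal; `ghd_moments` and `normal_p_value` are illustrative names, and the two-sided tail used for the p-value is an assumption, since the text does not state which tail is used.

```python
import math


def _sums(w):
    """Return S^1, S^2 and T for one weight matrix (off-diagonal entries only)."""
    n = len(w)
    rows = [sum(w[i][j] for j in range(n) if j != i) for i in range(n)]
    s1 = sum(rows)
    s2 = sum(w[i][j] ** 2 for i in range(n) for j in range(n) if j != i)
    t = sum(r * r for r in rows)
    return s1, s2, t


def ghd_moments(a, b):
    """Mean and standard deviation of the permutation distribution, Eq. (3)."""
    n = len(a)
    if n <= 3:
        raise ValueError("Eq. (3) requires N > 3")
    s1a, s2a, ta = _sums(a)
    s1b, s2b, tb = _sums(b)
    aa, ab = s1a ** 2, s1b ** 2
    ba, bb = ta - s2a, tb - s2b
    ca, cb = aa + 2 * s2a - 4 * ta, ab + 2 * s2b - 4 * tb
    m = n * (n - 1)
    mu = (s2a + s2b) / m - 2 * s1a * s1b / m ** 2
    var = 4 / m ** 3 * (2 * s2a * s2b
                        + 4 * ba * bb / (n - 2)
                        + ca * cb / ((n - 2) * (n - 3))
                        - aa * ab / m)
    return mu, math.sqrt(max(var, 0.0))


def normal_p_value(stat, mu, sigma):
    """z-score of Eq. (2) and a two-sided normal p-value (tail choice assumed)."""
    z = (stat - mu) / sigma
    return z, math.erfc(abs(z) / math.sqrt(2))


# Toy example: the same 4-node path graph (0-1-2, node 3 isolated) twice.
path = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 0]]
print(ghd_moments(path, path))
```

Equation (3) is written in terms of the raw weights. For small N the output can be compared against an exhaustive enumeration of all label permutations of the raw mean squared difference; the variance is unaffected by the mean-centering in Equation (1), whereas the mean of the centered statistic differs from $\mu_\pi$ by the constant $(S^1_a - S^1_b)^2 / (N^2 (N-1)^2)$.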

Differential sub-network detection with GHD

The GHD distance tells us to what extent two graphs differ, but it cannot identify which parts of the graphs are similar and which are different. In this work, we are interested in detecting which parts of the graphs make the two graphs different. We call such sub-graphs differential sub-networks.

The notion of differential sub-networks is based on the idea that, when comparing two networks, only a subset of edges would present altered interactions. The goal is to identify the set of nodes, namely $V^*$, associated with such a subset of edges, and the p-values $p^*$ corresponding to the nodes in $V^*$. This goal, formulated as a statistical test, requires that for such a subset $V^*$ there is insufficient evidence to reject the null hypothesis that the corresponding sub-networks $A^*(V^*, E_{A^*})$ and $B^*(V^*, E_{B^*})$ are statistically independent.

The idea here is to adopt an iterative technique to identify the set of nodes $V^*$ that contributes most to the difference. We start from the dGHD algorithm proposed in [28]. The algorithm measures edge connectivity with the topological overlap metric and benefits from the closed-form solution for the p-value (Equation (3)). In the dGHD algorithm, an iterative procedure is followed where, at each iteration, the change in the centralized GHD (cGHD), i.e. cGHD $= \mathrm{GHD}(A, B) - \mu_\pi$, is estimated after the removal of one node. The node for which the change in cGHD (i.e. the difference in cGHD before and after removal of the node) is maximum is removed. The GHD statistic is computed for the remaining sub-networks and the p-value is estimated. This process is repeated until a user-specified minimal set size is reached or it is no longer possible to have a closed-form representation for the p-values, which happens for $N \leq 3$, as shown in Equation (3). The p-values are then adjusted for multiple testing by controlling the false discovery rate [34].

The dGHD algorithm suffers from the following limitations: a) during the $i$th iteration, the GHD measure is calculated $N - i$ times on different sub-graphs, with an overall time complexity of $\sim O(N^2 \times |E|)$, where $E = E_A \cup E_B$; b) the algorithm is prone to discovering more false positives, since it uses the change in cGHD ($\Delta$cGHD) as a model selection criterion. We overcome these limitations by proposing the following improvements:

1 Remove nodes by exploiting the Closed-Form. We use the idea that nodes which have similar topology in networks $A$ and $B$ will contribute the least to cGHD. So, we first calculate the closed-form contribution of each node to cGHD once, using Equation (4), and then iteratively remove the nodes with the least contributions. This process continues until the p-value of the remaining sub-network becomes greater than a threshold $\theta$.

2 Using a different model selection criterion. Once the p-value reaches $\theta$, we follow a procedure similar to the dGHD algorithm but use the more intuitive criterion of selecting the node whose removal makes the cGHD value maximum, rather than using the change in the cGHD value (before and after removal of a node) as a model selection criterion. With this model selection criterion, we iteratively identify and remove the node whose contribution to the cGHD is least.

The advantage of the Closed-Form approach is that we significantly reduce the computational complexity and improve the predictive performance. A simple alternative to the Closed-Form approach would be to sort all the nodes by their contribution to cGHD, and thus rank all the nodes by their capability to differentiate the two networks, with complexity $O(N \log N)$. However, we would then not be able to identify statistically different sub-networks between the two graphs, as indicated in [28].

Closed-Form Approach

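We propose a fast approach to perform differential sub-network analysis that takes into consideration the contribution of each node to GHD and $\mu_\pi$. Using Equations (1) and (3), this can be represented mathematically as:

\[
\begin{aligned}
\mathrm{GHD}(A, B)(i) &= \frac{1}{N(N-1)} \Big( \sum_{j=1,\, j \neq i}^{N} (a'_{ij})^2 + \sum_{j=1,\, j \neq i}^{N} (b'_{ij})^2 - \sum_{j=1,\, j \neq i}^{N} (2\, a'_{ij} \times b'_{ij}) \Big), \\
\mu_\pi(i) &= \frac{\sum_{j=1,\, j \neq i}^{N} (a_{ij})^2 + \sum_{j=1,\, j \neq i}^{N} (b_{ij})^2}{N(N-1)} - \frac{2 \big( \sum_{j=1,\, j \neq i}^{N} a_{ij} \big) (S^1_b)}{N^2 (N-1)^2} \\
&\quad - \frac{2 \big( \sum_{j=1,\, j \neq i}^{N} b_{ij} \big) (S^1_a)}{N^2 (N-1)^2} + \frac{2 \big( \sum_{j=1,\, j \neq i}^{N} a_{ij} \big) \big( \sum_{k=1,\, k \neq i}^{N} b_{ik} \big)}{N^2 (N-1)^2}.
\end{aligned}
\tag{4}
\]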

We observe that if we sum $\mathrm{GHD}(A, B)(i)$ and $\mu_\pi(i)$ $\forall i \in V$, we obtain $\mathrm{GHD}(A, B)$ and $\mu_\pi$. We use the idea that nodes which have similar topology in networks $A$ and $B$ will contribute the least to the centralized GHD, i.e. $\mathrm{GHD}(A, B) - \mu_\pi$. We calculate the Closed-Form contribution of each node to the centralized GHD (cGHD) once, using Equation (4), and sort all the nodes by this contribution. We then iteratively remove the nodes with the least contribution to the cGHD, i.e. the nodes having similar topology in graphs $A$ and $B$. This process continues until the p-value of the remaining sub-network becomes greater than a threshold $\theta$. Once the p-value reaches $\theta$, we estimate $\Delta V_K = \mathrm{GHD}(A(V_K, E_A), B(V_K, E_B)) - \mu_{V_K}$, where $\mu_{V_K}$ is the mean of the permutation distribution for the nodes ($V_K$) of the remaining sub-network. Furthermore, we define $\Delta V_{K|i}$ as the value of cGHD after removal of node $i$. We adopt a different model selection criterion from that proposed in [28] to remove non-differential nodes. We use the intuitive criterion of selecting the node after removal of which the cGHD value becomes maximum, i.e. the node which was most similar in terms of topology in the paired graphs. Finally, the obtained p-values are adjusted for multiple testing by controlling the false discovery rate [34]. Given the paired graphs $A$ and $B$, the calculation of $\Delta V_{K|i}$ can be done independently for each $i$. Details of the Closed-Form method are provided in Algorithm 1. The sensitivity of the Closed-Form approach to the parameter $\theta$ is demonstrated in the Experimental Results section. Table 1 summarizes the improvements with respect to the dGHD algorithm in terms of time complexity.
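The per-node quantities of Equation (4) can be computed in a single pass over the weight matrices. The sketch below is a term-by-term transcription of Equation (4) in plain Python, assuming N x N nested lists with a zero diagonal; the names `node_contributions` and `rank_nodes` are illustrative, and this is not the authors' implementation.

```python
def node_contributions(a, b):
    """Return a list of (GHD(A,B)(i), mu_pi(i)) pairs, following Eq. (4)."""
    n = len(a)
    m = n * (n - 1)
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    s1a = sum(a[i][j] for i, j in pairs)
    s1b = sum(b[i][j] for i, j in pairs)
    mean_a, mean_b = s1a / m, s1b / m
    out = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        # (a')^2 + (b')^2 - 2 a' b' equals (a' - b')^2 for each pair (i, j).
        ghd_i = sum(((a[i][j] - mean_a) - (b[i][j] - mean_b)) ** 2
                    for j in others) / m
        row_a = sum(a[i][j] for j in others)
        row_b = sum(b[i][j] for j in others)
        sq = sum(a[i][j] ** 2 + b[i][j] ** 2 for j in others)
        mu_i = (sq / m
                - 2 * row_a * s1b / m ** 2
                - 2 * row_b * s1a / m ** 2
                + 2 * row_a * row_b / m ** 2)
        out.append((ghd_i, mu_i))
    return out


def rank_nodes(a, b):
    """Node indices sorted by ascending contribution to the centralized GHD."""
    contrib = node_contributions(a, b)
    return sorted(range(len(a)), key=lambda i: contrib[i][0] - contrib[i][1])


a = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
b = [[0, 1, 1, 0], [1, 0, 0, 0], [1, 0, 0, 1], [0, 0, 1, 0]]
print(rank_nodes(a, b))
```

The nodes at the front of the returned ordering are the candidates for early removal, since they contribute least to the cGHD.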

Alternative Procedure (Fast Approximation)

We propose an alternative procedure to the Closed-Form approach, namely the Fast Approximation method, where we first calculate once the cGHD value without including the $i$th node, $\forall i \in V$. This estimates the cGHD value after removal of the $i$th node and can be performed in parallel. Our aim is to quickly discard those nodes after removal of which the cGHD value becomes large, thereby removing the nodes which were contributing least to the cGHD value. This helps to reduce the dependence between the two sub-networks by removing nodes which have similar topology in graphs $A$ and $B$. Again, the idea is motivated by the premise that only a subset of nodes will form the differential sub-networks in graphs $A$ and $B$.
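The leave-one-out step described above can be sketched as follows. This is a minimal plain-Python illustration, not the authors' implementation: it assumes that a sub-network is obtained by restricting the full weight matrices to the retained nodes (the text does not say whether the weights are recomputed), and the names `cghd`, `leave_one_out` and `node_to_remove` are illustrative.

```python
def cghd(a, b, nodes):
    """Centralized GHD, i.e. GHD (Eq. 1) minus mu_pi (Eq. 3), on a node subset."""
    nodes = list(nodes)
    n = len(nodes)
    m = n * (n - 1)
    pairs = [(i, j) for i in nodes for j in nodes if i != j]
    s1a = sum(a[i][j] for i, j in pairs)
    s1b = sum(b[i][j] for i, j in pairs)
    s2a = sum(a[i][j] ** 2 for i, j in pairs)
    s2b = sum(b[i][j] ** 2 for i, j in pairs)
    ghd = sum(((a[i][j] - s1a / m) - (b[i][j] - s1b / m)) ** 2
              for i, j in pairs) / m
    mu = (s2a + s2b) / m - 2 * s1a * s1b / m ** 2
    return ghd - mu


def leave_one_out(a, b, nodes):
    """cGHD of the sub-network obtained by dropping each node in turn."""
    nodes = list(nodes)
    # Each entry is independent of the others, so this loop could run in parallel.
    return {i: cghd(a, b, [v for v in nodes if v != i]) for i in nodes}


def node_to_remove(a, b, nodes):
    """Node after removal of which the cGHD value becomes maximum."""
    scores = leave_one_out(a, b, nodes)
    return max(scores, key=scores.get)


a = [[0, 1, 0, 0, 1], [1, 0, 1, 0, 0], [0, 1, 0, 1, 0],
     [0, 0, 1, 0, 1], [1, 0, 0, 1, 0]]
b = [[0, 0, 1, 0, 1], [0, 0, 1, 1, 0], [1, 1, 0, 1, 0],
     [0, 1, 1, 0, 1], [1, 0, 0, 1, 0]]
print(node_to_remove(a, b, range(5)))
```

For large networks the dictionary comprehension in `leave_one_out` is the natural place to distribute work, for example over a process pool.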

Algorithm 1: Closed-Form

Data: Graphs A and B with N vertices V.
Result: Subset V* representing the set of nodes which comprise the differential sub-network & p-values for the GHD measure.

V* = {}    // Empty set for differential sub-network nodes.
V_K = V    // Initialize a copy of the set of vertices V.
p* = {}    // Empty set for p-values.
Calculate the contribution of each node i to the centralized GHD using equation (4).
Sort all nodes by their contribution in ascending order and keep them in O.
while N > 3 do
    z = (GHD(A(V_K, E_A), B(V_K, E_B)) − μ_{V_K}) / σ_{V_K}
    Calculate the p-value using z and append the p-value to p*.
    if p-value > θ then
        ΔV_K = {}
        for all i ∈ V_K do    // Perform in parallel.
            t = GHD(A(V_K|i, E_A), B(V_K|i, E_B)) − μ_{V_K|i}
            Add t to ΔV_K.
        n* = argmax_i ΔV_K    // Select the node after removal of which cGHD becomes maximum.
        Remove node n* from V_K, i.e. V_K = V_K \ n*, and O = O \ n*.
    else if p-value < θ then
        n* = min_i(O)    // Select the node in the sub-network with the least contribution.
        Remove node n* from O.    // O is sorted, so remove the 1st node.
    if p-value > 0.01 then
        Append n* to V*.
    N = N − 1.
Adjust the p-values for the false discovery rate [34].

In this approach, we iteratively discard those nodes after removal of which the cGHD value becomes maximal, until the p-value for the remaining sub-network reaches a threshold $\theta$. Once the p-value reaches $\theta$, we return to the procedure of estimating $\Delta V_{K|i}$ $\forall i \in V_K$, as described in the Closed-Form approach. We use the same model selection criterion as in the Closed-Form approach, i.e. selecting the node after removal of which the cGHD value becomes maximum. We then adjust the obtained p-values for multiple testing by controlling the false discovery rate [34]. We refer to this technique as a Fast Approximation to the dGHD [28]. The Fast Approximation technique is explained in detail in Algorithm 2.

From our experiments, we observe that the results of the Closed-Form approach and the Fast Approximation technique are identical. In the Closed-Form approach we calculate the closed-form contribution of each node to the cGHD value and remove the node with the least contribution, whereas in the Fast Approximation we select the node after removal of which the cGHD value becomes maximum; nevertheless, the ordered list $O$ obtained for both methods is identical. Moreover, the computational complexity of the Fast Approximation technique is the same as that of the Closed-Form approach.

Algorithm 2: Fast Approximation

Data: Graphs A and B with N vertices V.
Result: Subset V* representing the set of nodes which comprise the differential sub-network & p-values for the GHD measure.

V* = {}    // Empty set for differential sub-network nodes.
V_K = V    // Initialize a copy of the set of vertices V.
p* = {}    // Empty set for p-values.
ΔV_K = {}
for all i ∈ V_K do    // Perform in parallel.
    t = GHD(A(V_K|i, E_A), B(V_K|i, E_B)) − μ_{V_K|i}    // Estimate the cGHD value after removal of node i.
    Add t to ΔV_K.
Sort ΔV_K in descending order and keep it in O.
while N > 3 do
    z = (GHD(A(V_K, E_A), B(V_K, E_B)) − μ_{V_K}) / σ_{V_K}
    Calculate the p-value using z and append the p-value to p*.
    if p-value > θ then
        ΔV_K = {}
        for all i ∈ V_K do    // Perform in parallel.
            t = GHD(A(V_K|i, E_A), B(V_K|i, E_B)) − μ_{V_K|i}
            Add t to ΔV_K.
        n* = argmax_i ΔV_K    // Select the node after removal of which cGHD becomes maximum.
        Remove node n* from V_K and O.
    else if p-value < θ then
        n* = max_i(O)    // Select the node in the sub-network with the least contribution.
        Remove node n* from O.
    if p-value > 0.01 then
        Append n* to V*.
    N = N − 1.
Adjust the p-values for the false discovery rate.

Inference of the Glioma networks and Master Regulator Analysis

We used the TCGA pan-glioma dataset comprising 1250 samples (463 IDH-mutant and 653 IDH-wild-type), 583 of which were profiled with Agilent microarrays and 667 with Illumina HiSeq RNA-Seq (REF), downloaded from the TCGA portal. The batch effects between the two platforms were corrected using the COMBAT algorithm [35]. The final gene expression data matrix includes 12,985 genes and 1250 samples. We reconstructed two gene regulatory networks belonging to two different glioma subtypes: IDH-mutant and IDH-wild-type. Both networks were reconstructed with a four-step procedure that follows ARACNe [36]: i) computation of mutual information between gene expression profiles to determine interactions between Transcription Factors (TFs) and target genes [37]; ii) data processing inequality to filter out indirect relationships [36]; iii) permutation test with 1,000 re-samplings to keep only statistically significant relationships. We also assembled a global glioma network from all the available 1250 transcriptional profiles using the aforementioned method. In this last case we also used the intersection with transcription factor (TF) binding sites to keep only relationships due to promoter binding. We used a set of 457 TF binding sites available in the MotifDB Bioconductor package.

The Master Regulator Analysis (MRA) algorithm [38] was applied to the global glioma network in order to compute the statistical significance of the overlap between the regulon of each TF (i.e. its ARACNe-inferred targets) and the list of genes differentially expressed between IDH-mutant and IDH-wild-type samples (Wilcoxon-Mann-Whitney test, FDR $\leq$ 0.05). Given a gene interaction network generated by ARACNe and a gene phenotype signature (e.g. a set of differentially expressed genes), the MRA algorithm computes for each TF the enrichment of the phenotype signature in the regulon of that TF. The regulon of a TF is defined as its neighborhood in the gene interaction network. There are two different methods to evaluate the enrichment of the signature in the regulon. One method uses Fisher's exact test, while the other uses Gene Set Enrichment Analysis (GSEA). Here we used the latter.

A Master Regulator (MR) gene is a TF whose regulon exhibits a statistically significant enrichment of the given phenotype signature.

Validation in the Rembrandt dataset

We used an independent dataset to perform the same analysis of network differencing between IDH-mutant and IDH-wild-type gliomas and to check the overlap between the two analyses. Raw gene expression data (Affymetrix U133 Plus 2.0) from the publicly available Repository for Molecular Brain Neoplasia Data (Rembrandt) (https://caintegrator.nci.nih.gov/rembrandt/) included 444 samples, divided into 218 Glioblastomas, 148 Astrocytomas, 67 Oligodendrogliomas and 11 mixed histologies. Expression subtype and IDH status were inferred from gene expression following the procedure in [39], resulting in 153 wild-type and 162 mutant samples. These two sets of expression profiles were used to generate two regulatory networks using the same approach reported above.

Results and Discussion

For all our experiments, we used the Closed-Form approach (since the results obtained from the Closed-Form and Fast Approximation techniques are identical) and compared it with the dGHD method [28].

Cosine similarity and topological overlap

The one-step topological overlap measure used to estimate the edge weights is defined as:

\[ a_{ij} = \frac{\sum_{l \neq i,j} A_{il} A_{lj} + A_{ij}}{\min\big( \sum_{l \neq i} A_{il} - A_{ij},\ \sum_{l \neq j} A_{lj} - A_{ij} \big) + 1} \tag{5} \]

In this work we use the cosine similarity to calculate the edge weights $a_{ij}$. The cosine similarity takes into consideration the one-step neighborhood of nodes $i$ and $j$ while constructing the edge weight and is very efficient to calculate for sparse matrices. The weights $a_{ij}$ are estimated as follows:

\[ a_{ij} = \frac{\sum_{l} A_{il} A_{jl}}{\sqrt{\sum_{l} A_{il}^2}\, \sqrt{\sum_{l} A_{jl}^2}} \tag{6} \]

where $A_{ij}$ denotes the entries of the adjacency matrix.
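Equations (5) and (6) can be compared side by side on small graphs. The following plain-Python sketch assumes a symmetric adjacency matrix with a zero diagonal, given as nested lists; the function names are illustrative, and setting the weight to zero when a node has no neighbors is an assumption made to avoid division by zero.

```python
import math


def cosine_weights(adj):
    """Edge weights from Eq. (6): cosine similarity between adjacency rows."""
    n = len(adj)
    norms = [math.sqrt(sum(x * x for x in row)) for row in adj]
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and norms[i] > 0 and norms[j] > 0:
                dot = sum(adj[i][l] * adj[j][l] for l in range(n))
                w[i][j] = dot / (norms[i] * norms[j])
    return w


def topological_overlap(adj):
    """Edge weights from Eq. (5): one-step topological overlap."""
    n = len(adj)
    deg = [sum(adj[i][l] for l in range(n) if l != i) for i in range(n)]
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            shared = sum(adj[i][l] * adj[l][j]
                         for l in range(n) if l != i and l != j)
            denom = min(deg[i] - adj[i][j], deg[j] - adj[i][j]) + 1
            w[i][j] = (shared + adj[i][j]) / denom
    return w


triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
print(cosine_weights(triangle)[0][1], topological_overlap(triangle)[0][1])
```

For sparse networks, the row dot products in `cosine_weights` only involve the shared neighbors of the two nodes, which is what makes the measure cheap to compute with a sparse matrix representation.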

We performed an experiment to calculate the correlation between the one-step topological overlap measure and the cosine similarity measure. For this experiment, we generated 250 random geometric networks using $N = 250$ and the connectivity parameter $d = 0.15$.

Figure 1 shows that the cosine similarity metric is very strongly correlated (Pearson correlation = 0.952) with the topological overlap measure.

Sensitivity to $\theta$

In this experiment, we check the sensitivity of the proposed Closed-Form approach w.r.t. the heuristic $\theta$. For this experiment, we first generated 100 random geometric (RG) networks. In an RG network, nodes are generated by uniformly sampling $N$ points on $[0, 1]^2$. An edge is then drawn between two points if the Euclidean distance between them is less than a parameter $d$. This parameter $d$ controls the density of the RG network: smaller values of $d$ result in sparse networks, while larger values of $d$ generate dense networks. We conducted experiments using two different settings. In the first setting we use $d = 0.15$, while in the second we use $d = 0.3$. For both experiments we fix $N = 250$. For each value of $d$ and for each generated RG network $A$, we permute the first 50 rows and columns of the network to generate network $B$. Therefore, the first 50 nodes in networks $A$ and $B$ form the gold standard.
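The simulation setup described above can be sketched as follows. This is a minimal plain-Python illustration under stated assumptions: the same random permutation is applied to the rows and columns of the first k nodes, and the function names and the use of a seeded `random.Random` generator are illustrative choices rather than details from the paper.

```python
import random


def random_geometric_graph(n, d, rng):
    """Sample n points uniformly on [0, 1]^2 and link pairs closer than d."""
    pts = [(rng.random(), rng.random()) for _ in range(n)]
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dx = pts[i][0] - pts[j][0]
            dy = pts[i][1] - pts[j][1]
            if dx * dx + dy * dy < d * d:
                adj[i][j] = adj[j][i] = 1
    return adj


def permute_first_k(adj, k, rng):
    """Relabel the first k nodes at random, leaving the remaining labels fixed."""
    n = len(adj)
    perm = list(range(n))
    head = perm[:k]
    rng.shuffle(head)
    perm[:k] = head
    return [[adj[perm[i]][perm[j]] for j in range(n)] for i in range(n)]


rng = random.Random(0)
A = random_geometric_graph(250, 0.15, rng)
B = permute_first_k(A, 50, rng)  # the first 50 nodes form the gold standard
```

The sub-matrix of `B` restricted to the untouched nodes is identical to that of `A`, so any detected difference should involve the permuted nodes.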

In order to test the sensitivity of the proposed approach w.r.t. $\theta$, we estimate the fraction of permuted nodes correctly identified by the Closed-Form method for various values of $\theta$. We used a grid of $\theta$ values $\Theta = \{10^{-50}, \ldots, 10^{-300}\}$ in multiplicative steps of $10^{-20}$. The goal of this experiment is to show that the fraction of correctly identified nodes w.r.t. various $\theta \in \Theta$ remains nearly constant for smaller values of $\theta$. Figure 2 shows the result for RG networks with density parameter $d = 0.15$ and $d = 0.3$. From Figure 2, we observe that the median fraction of permuted nodes identified by the proposed approach increases slowly before it converges to a nearly constant value as we decrease the threshold $\theta$ (i.e. increase the absolute value of the logarithm of $\theta$).

From this experiment, we conclude that the fraction of truly differential nodes identified by the proposed method increases as we decrease the threshold $\theta$, before it starts to converge for smaller values of $\theta$.

We performed further experiments using different $\theta$ for various values of $N$ and observed that the threshold $\theta$ behaves similarly independent of the value of $N$. We used $\theta = 10^{-50}$ as a heuristic cut-off for the subsequent experiments.

Predictive performance comparison

Experimental Setup: The next simulation study compares the predictive performance of the proposed approach with that of the dGHD technique [28]. For this experiment, we generate 100 RG networks with $N = 1{,}000$. For the first experiment we fix the density parameter $d = 0.15$ and permute the first 100 nodes in network $A$ to obtain network $B$. Thus, these first 100 nodes form the differential sub-network for the paired networks $A$ and $B$.

In the second case, we use the density parameter $d = 0.3$ to generate the edges for network $A$. We then generate a small RG network with 100 nodes using density parameter $d' = 0.5$. This small dense sub-network is then used to replace the network formed by the first 100 nodes in the original network $A$ to form network $B$. Thus, in the second experiment, these 100 nodes form the differential sub-network for the paired networks $A$ and $B$. This kind of mechanism can appear in real-life networks: for example, in cancer the transcription activity of some set of genes might be enhanced or suppressed in patients, resulting in more or fewer edges in a sub-network of the gene or DNA methylation network. Hence, the networks generated in the first case are much sparser in comparison to the networks generated in the second case.

Evaluation Metrics: We define the following terms to be used in our analysis:

• True Positives (TP) - Refers to the nodes that are correctly identified as part of a differential network.

• False Positives (FP) - Refers to the nodes that are incorrectly identified as part of a differential network.

• False Negatives (FN) - Refers to the nodes that are part of the differential sub-network but are not identified as part of the sub-network.

• True Negatives (TN) - Refers to the nodes that are correctly identified as nodes which are not part of the differential sub-networks $A^*$ and $B^*$.

ROC and PR curve comparisons: We generate two sets of plots: the receiver operating characteristic (ROC) curves and the precision-recall (PR) curves. To generate the plots shown in Figure 3, we use the 'ROCR' package [40] in R. It generates relatively smooth curves by automatically using different thresholds to estimate, for the ROC plot, the true positive rate, i.e. $\frac{n(TP)}{n(TP) + n(FN)}$, and the false positive rate, i.e. $\frac{n(FP)}{n(FP) + n(TN)}$, and, for the PR plot, the precision, i.e. $\frac{n(TP)}{n(TP) + n(FP)}$, and the recall, i.e. $\frac{n(TP)}{n(TP) + n(FN)}$. We use the terms true positive rate (TPR) and Recall interchangeably. Here $n(\cdot)$ denotes the number of nodes in the corresponding category. To generate smooth curves, we used the adjusted p-value lists obtained from the Closed-Form and dGHD approaches without specifying any threshold.
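For a single cut-off, the four counts and the derived rates can be computed as in the following sketch. It is a minimal plain-Python illustration that takes the predicted and true differential node sets as inputs, so it makes no assumption about how the p-values are thresholded; the function names are illustrative, and returning 0.0 for an empty denominator is an assumption.

```python
def confusion_counts(predicted, truth, all_nodes):
    """Return (TP, FP, FN, TN) for predicted vs. true differential nodes."""
    predicted, truth, all_nodes = set(predicted), set(truth), set(all_nodes)
    tp = len(predicted & truth)
    fp = len(predicted - truth)
    fn = len(truth - predicted)
    tn = len(all_nodes - predicted - truth)
    return tp, fp, fn, tn


def _ratio(num, den):
    # Assumption: report 0.0 when the denominator is empty.
    return num / den if den else 0.0


def rates(tp, fp, fn, tn):
    """True/false positive rates, precision and recall from the four counts."""
    return {
        "tpr": _ratio(tp, tp + fn),
        "fpr": _ratio(fp, fp + tn),
        "precision": _ratio(tp, tp + fp),
        "recall": _ratio(tp, tp + fn),
    }


counts = confusion_counts(predicted={0, 1, 4}, truth={0, 1, 2, 3},
                          all_nodes=range(10))
print(counts, rates(*counts))
```

Sweeping the cut-off over the sorted adjusted p-values and recomputing these rates at each step yields the points of the ROC and PR curves.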

Figures 3A and 3C show that the Closed-Form approach achieves better performance both for differential sub-networks formed by permuted nodes and for sub-networks with higher density. One of the reasons for the relatively poor performance of the dGHD approach is that it has a low true positive rate (TPR) and a high false positive rate (FPR) when the network has more edges. This is also reflected in the relatively low Recall and Precision values for the dGHD algorithm in Table 2 when $d = 0.3$ and $d' = 0.5$. From Figure 3C, we can observe that the performance of both the dGHD and Closed-Form algorithms improves w.r.t. ROC when the differential sub-network is denser than the remaining network. However, the gap between the PR curves of the Closed-Form and dGHD methods increases when the differential sub-network is denser.

AUC comparison: For all further simulated experiments, we use a p-value of 0.01 as the cut-off to determine TP, TN, FP and FN. We also evaluated the area under the ROC curve (AUC ROC [41]) and the area under the PR curve (AUC PR [41]) for 100 runs of the Closed-Form and dGHD methods (using a p-value of 0.01 as the cut-off), as shown in Figure 4.

We observe from Figures 4A and 4B that the dGHD method has lower variance in the AUC ROC and AUC PR metrics than the Closed-Form approach in the case of the permuted differential sub-network. However, for the denser differential sub-network, the Closed-Form approach has much smaller variance than the dGHD algorithm in both AUC ROC and AUC PR, as depicted in Figure 4C and Figure 4D respectively. This suggests that the Closed-Form technique performs better than the dGHD method when differential sub-networks are formed using either permuted nodes or higher density. To test for significance, we performed Student's t-test under the null hypothesis that the difference between the mean values of the two ROC distributions is zero, i.e. $\mu_{\mathrm{AUC\,ROC}_A} - \mu_{\mathrm{AUC\,ROC}_B} = 0$. At a significance level of 5%, we obtain a p-value of 0.48 for the permuted sub-network and therefore fail to reject the null, i.e. the difference between the two distributions is not significant. For paired networks with a denser differential sub-network (i.e. d′=0.5), we obtain a p-value of $3.42 \times 10^{-14}$ for Student's t-test, thereby rejecting the null. Similarly, for the two PR distributions we obtained a p-value of 0.42 for the permuted sub-network and a p-value of $2.64 \times 10^{-20}$ for the denser differential sub-network.
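
The test statistic behind such a comparison can be sketched in a few lines of Python. This is a minimal illustration that assumes an unpaired, equal-variance (pooled) Student's t-test; the source does not state which variant was used, and the function name `pooled_t_statistic` and the sample values are hypothetical. Converting the statistic to a p-value requires the CDF of the t distribution, which is left to a statistics library.

```python
import math
import statistics


def pooled_t_statistic(a, b):
    """Two-sample Student's t statistic with pooled variance.

    Returns (t, degrees_of_freedom). Assumes both samples have at
    least two values and equal population variances.
    """
    na, nb = len(a), len(b)
    var_a = statistics.variance(a)  # unbiased sample variance
    var_b = statistics.variance(b)
    dof = na + nb - 2
    pooled_var = ((na - 1) * var_a + (nb - 1) * var_b) / dof
    std_err = math.sqrt(pooled_var * (1.0 / na + 1.0 / nb))
    t = (statistics.mean(a) - statistics.mean(b)) / std_err
    return t, dof


# Hypothetical AUC values from repeated runs of two methods.
auc_method_a = [0.93, 0.95, 0.91, 0.94]
auc_method_b = [0.92, 0.93, 0.92, 0.93]
t_stat, dof = pooled_t_statistic(auc_method_a, auc_method_b)
```

A large absolute value of `t_stat` relative to the t distribution with `dof` degrees of freedom corresponds to a small p-value, i.e. evidence against equal means.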

Comparison with Community Detection techniques

The task of identifying differential sub-networks can also be rephrased as one of finding heavy sub-networks on a single network (say C) constructed from the absolute difference in edge weights between the topological graph of network A and the topological graph of network B, i.e. $C_{ij} = \lVert a_{ij} - b_{ij} \rVert, \; \forall i, j \in V$. This problem can then be construed as one of identifying dense modules in the network C: from the previous experiments, we want to discover a module corresponding to the set of nodes that have been permuted, or to identify the denser sub-network forming the differential sub-network as a module.
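
The construction of the difference network C can be sketched as follows. This is a minimal Python illustration that assumes both topological graphs are given as dense square weight matrices over the same ordered node set; the function name `difference_network` and the toy matrices are hypothetical.

```python
def difference_network(a, b):
    """Return C with C[i][j] = |a[i][j] - b[i][j]| for all node pairs.

    `a` and `b` are square weight matrices (lists of lists) indexed by
    the same node ordering.
    """
    n = len(a)
    if len(b) != n or any(len(row) != n for row in a) or any(len(row) != n for row in b):
        raise ValueError("a and b must be square matrices of the same size")
    return [[abs(a[i][j] - b[i][j]) for j in range(n)] for i in range(n)]


# Toy example: the edge (0, 1) differs between the two graphs,
# while the edge (1, 2) is identical and therefore vanishes in C.
graph_a = [[0.0, 0.75, 0.0],
           [0.75, 0.0, 0.5],
           [0.0, 0.5, 0.0]]
graph_b = [[0.0, 0.25, 0.0],
           [0.25, 0.0, 0.5],
           [0.0, 0.5, 0.0]]
c = difference_network(graph_a, graph_b)
```

Edges shared by both graphs cancel out, so only the re-wired part of the topology carries weight in C.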

The task of identifying dense/heavy modules in a network (C) is often referred to as community detection, graph partitioning or graph clustering. There is a plethora of research on the problem of community detection, including [42, 43, 44, 45, 46, 47, 48, 49]. Several of these methods, such as jActiveModules [50] and the Spinglass algorithm [45], have also been applied to identify biologically meaningful modules (such as functional modules, protein complexes and disease-associated genes) in biological networks, as shown in [51, 52]. For our task of identifying dense modules in network C, we applied 3 different community detection methods, namely the Louvain [43], Infomap [44] and Spinglass [45] techniques, to obtain a comprehensive comparison with the proposed Closed-Form approach. We used the implementations of these methods available in the 'igraph' package in R and ran each method at its default settings.

We used the same set of RG networks as in the previous experiments for the comparison with the community detection techniques. Since network C captures only the difference in topology between networks A and B, all similarity between the two networks is removed, and the module with the maximum internal volume (i.e. total weight of edges within the community) is the one capturing the maximum difference between the topologies of networks A and B. Hence, we consider the densest inferred module as the one comprising the differential sub-network and label all nodes belonging to this cluster as differential, while the nodes in all other modules are considered non-differential. Using this notion to label the inferred communities, we compare the results obtained for the 3 community detection techniques against the gold standard (i.e. the actual set of labeled nodes that belong either to the permuted sub-network or to the denser sub-network) in a binary classification framework [53, 54]. These results are integrated in Table 2 along with the results of the dGHD technique and the proposed Closed-Form (CF) approach. We assess the results of the 3 community detection methods using several quality metrics commonly used for binary classification, including Precision, Recall, Kappa, Accuracy, Specificity, AUC ROC and computational time. From Table 2, we observe that the Louvain method clearly outperforms the Infomap and Spinglass techniques in correctly identifying the differential sub-network as a module with respect to the various evaluation metrics.
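
The labeling rule described above (the module with the largest internal volume is called differential) can be sketched in Python. This is an illustration under stated assumptions: the community assignment is taken as given (for example, from any community detection routine), the function name `densest_module_labels` is hypothetical, and each undirected edge is counted once.

```python
from collections import defaultdict


def densest_module_labels(c, membership):
    """Label nodes of the module with maximum internal volume as differential.

    c          : symmetric weight matrix of the difference network C
    membership : membership[i] is the community id assigned to node i
    Returns a list of booleans, True for nodes in the densest module.
    """
    n = len(c)
    volume = defaultdict(float)
    for i in range(n):
        for j in range(i + 1, n):           # each undirected edge once
            if membership[i] == membership[j]:
                volume[membership[i]] += c[i][j]
    if not volume:                           # no intra-community edges at all
        return [False] * n
    best = max(volume, key=volume.get)
    return [m == best for m in membership]


# Toy difference network: community 0 = {0, 1}, community 1 = {2, 3}.
c = [[0.0, 0.9, 0.1, 0.0],
     [0.9, 0.0, 0.0, 0.1],
     [0.1, 0.0, 0.0, 0.25],
     [0.0, 0.1, 0.25, 0.0]]
labels = densest_module_labels(c, [0, 0, 1, 1])
```

The resulting boolean labels can then be compared against the gold standard node labels in the same way as the thresholded p-values of the other methods.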

Simulated Result Analysis

Finally, the summary in Table 2 highlights the computational efficiency and better predictive capabilities of the proposed technique in comparison to the dGHD algorithm. For this comparison, we report the results obtained on 100 random runs of RG networks with N = 1000, d=0.15 and d=0.3 respectively, where the first 100 nodes are permuted. We also report results when the first 100 nodes form the denser differential sub-network, i.e. in experiments where d=0.15 we use d′=0.3 to form the denser sub-network, and where d=0.3 we use d′=0.5 to form the denser sub-network. We also conducted experiments on undirected Power Law (PL) graphs using N = 1000 and E = 10,000 with power law exponents $\alpha = \{2, 3\}$ respectively. We permuted the first 100 nodes of each PL network (B) to form the permuted network (A). We performed 100 random runs and report the mean values of the various evaluation metrics.

Table 2 compares the Closed-Form, Louvain, Infomap, Spinglass and dGHD techniques with respect to standard evaluation metrics, namely AUC, Precision, Recall, Accuracy, Specificity, the Kappa statistic and computational Time, for all the simulation experiments. Higher values of these metrics, with the exception of Time, indicate better results. The time required by the dGHD algorithm is normalized to 1, and the time required by the other algorithms is scaled by the same normalization factor.
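
For reference, the threshold-based metrics listed above can be computed from predicted and gold standard labels as in the Python sketch below. The function name `binary_metrics` and the toy labels are hypothetical, and the Kappa statistic is assumed here to be Cohen's kappa, which the source does not state explicitly.

```python
def binary_metrics(predicted, actual):
    """Precision, Recall, Specificity, Accuracy and Cohen's kappa.

    `predicted` and `actual` are equal-length sequences of booleans,
    True meaning the node is (called) differential.
    """
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    tn = sum(1 for p, a in pairs if not p and not a)
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    # Agreement expected by chance, from the marginal label frequencies.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    return {
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        "accuracy": accuracy,
        "kappa": (accuracy - expected) / (1 - expected) if expected != 1 else 0.0,
    }


# Toy example: adjusted p-values thresholded at the 0.01 cut-off.
p_values = [0.001, 0.004, 0.2, 0.5, 0.03]
predicted = [p <= 0.01 for p in p_values]
actual = [True, False, False, False, True]
metrics = binary_metrics(predicted, actual)
```

Kappa corrects accuracy for chance agreement, which matters here because only a small fraction of the nodes (100 out of 1000 in the simulations) is differential.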

We observe from Table 2 that the Closed-Form approach performs exceedingly well in experiments on denser RG networks (d=0.3) and PL graphs. It emerges as the best method on these networks for various evaluation metrics. For this configuration, for both permuted and denser differential sub-networks, the mean AUC ROC of the Closed-Form approach is at least 10% higher than that of the dGHD algorithm. This is also reflected in the higher values of Precision (0.714 and 0.771) and Recall (0.789 and 0.930) for the Closed-Form approach, compared with the lower values of Precision (0.645 and 0.7) and Recall (0.577 and 0.731) for the dGHD algorithm in these experiments.

However, in sparse networks, where it is relatively easier to identify differential sub-networks [28], the Closed-Form and dGHD methods have similar predictive performance. For sparse networks, the Louvain method outperforms nearly all other methods at identifying the differential sub-network as a module. From Table 2, we observe that the 3 community detection techniques have nearly perfect Recall scores but usually have relatively low Precision values. This indicates that these methods correctly identify all the nodes forming the differential sub-network but also include a large number of false positives in the densest module, thereby reducing Precision. The Louvain and Infomap methods are extremely fast and, interestingly, the Louvain method has the highest Precision (0.887), which is at least 10% higher than that of the dGHD algorithm and 5% higher than that of the Closed-Form approach, when identifying the dense differential sub-network in a sparse network (d=0.15, d′=0.3), as shown in Table 2.

We observe that, among the community detection techniques, the Louvain method is the most efficient and is highly competitive with the dGHD algorithm, but it cannot outperform the Closed-Form approach on denser networks and Power Law graphs.

Case study in Glioma

As a case study, we performed the differential sub-network analysis of two gene regulatory networks reconstructed from the glioma dataset available from TCGA. It is well known that the majority of gliomas are divided into two main macro-categories according to the mutation status of the gene IDH1 [17, 15, 55]. Therefore, an important biological question, which motivated the development of the reported methodology, was to identify the sub-networks of transcription factors (TFs) that have a different regulatory program in these two major conditions. We reconstructed two gene regulatory networks belonging to two different glioma subtypes, IDH-mutant and IDH-wild-type, as reported in the Methods section.

In our final networks we have 457 TFs and 4,085 targets. These networks consist of 13,683 unique TF-TF and TF-target connections for IDH-mutant and 14,158 for IDH-wild-type. Using these networks, we construct two unipartite topological graphs for the 457 TFs, as described in the Methods section. We then perform the proposed differential sub-network analysis to identify the TFs that are part of differential sub-networks in these topological graphs.

Figure 5 shows the significant differential sub-networks and Table 3 reports the topmost TFs that are part of differential sub-networks as detected by our algorithm. In this table, GHD and $\mu_\pi$ represent the generalized Hamming distance and its asymptotic mean between the subgraphs after removing the specific transcription factor in each row of the table. Supplementary Table S1 reports the results for all 457 transcription factors considered.

To highlight the difference between the Closed-Form approach and other standard network analysis methods, we also assembled a global glioma network from all the available transcriptional profiles, using the same method described above, and performed a master regulator analysis [38] with respect to the molecular phenotype under investigation, i.e. genes differentially expressed between IDH-mutant and IDH-wild-type. Master regulator analysis is extensively adopted to identify TFs that act as principal regulators in driving the phenotype from one condition to another.

Interestingly, among the topmost TFs (out of 457) forming the differential sub-networks, we found several genes known to have a central role in controlling specific glioma subtypes, as well as novel candidates that deserve further biological validation. In particular, differential network analysis reveals that the sub-network of STAT3 is one of the most different between the IDH-mutant and IDH-wild-type networks, and that STAT3 is a particularly significant Master Regulator of the wild-type phenotype. Members of our group have previously shown that STAT3, together with C/EBPβ, is a key regulator of mesenchymal differentiation and predicts the poor clinical outcome of IDH-wild-type gliomas [38]. Another key regulator of IDH-wild-type gliomas, recently reported using an integrative functional copy number analysis, is the set of HOXA genes [17]. Moreover, another key network hub that the algorithm detects as different is SOX10, which appears to be an active master regulator of the IDH-mutant phenotype. We recently reported that the GCIMP-low subgroup in the IDH-mutant cohort can be mediated by loss of CpG methylation and binding of SOX factors [17]. Furthermore, our algorithm identifies methyl-CpG-binding domain protein 2 (MBD2) as a differential network hub. In particular, MBD2 has no links in the IDH-wild-type network, whereas it is highly connected in the IDH-mutant network, which is characterized by the CpG island methylator phenotype (GCIMP) [56]. Further investigation is needed to support such a hypothesis, as MBD2 is also known as a mediator of epigenetic gene regulation, and its role in Glioblastoma is being studied because its over-expression may drive tumor growth by suppressing the anti-angiogenic activity of key tumor suppressors [57].

The differential network method highlights several other TFs as hubs of differential sub-networks that are not detected with standard MRA. Examples are ETV1 and ETV4, which are over-expressed in gliomas of the Codel subtype carrying the mutation of the CIC gene [58]. Another differential sub-network hub not detected by standard MRA is the tumor suppressor RFX1, which has been identified as an important target/regulator of the malignancy of Glioblastoma [59], whereas cell cycle regulators such as E2F1 and E2F1, which play a role in the progression of IDH-mutant glioma, are also detected by the Closed-Form algorithm [60].

An important caveat is the presence of potential confounding effects in the adopted dataset, which was obtained by merging expression profiles from two different platforms. An additional difficulty is that the distribution of IDH-wild-type and IDH-mutated tumors is unequal between the two platforms (92% of the microarray data are wild-type). We adopted this integrated dataset to build the two IDH networks and the global glioma network. The main computation in this case is the estimation of the mutual information between pairs of gene profiles (variables) in a set of observations (patients), and each individual pair of values is always extracted from the same platform. We used the robust k-nearest neighbor estimator proposed in [61], available in the PARMIGENE R package [62]. This estimator is non-parametric and is not based on binning of values; it works on the geometry of the scatterplot of each pair of gene expression values. Therefore, each observation (sample) can be seen as further evidence of dependence (or independence) between the variables, regardless of the platform. Although we found this merged dataset useful for estimating dependencies between genes, its adoption for deriving conclusions in terms of sample groups and pathway analysis should be made with caution.

As a further independent experiment, we performed the same analysis on the REMBRANDT dataset, applying the differential network analysis to the two networks independently built with ARACNe and the Master Regulator Analysis to the global network. Table 4 reports the results for the most different TF sub-networks detected by the Closed-Form algorithm on this dataset. Interestingly, of the top nine differential nodes obtained in the TCGA dataset, five (FOXJ3, NFIA, CREB1, SOX10, KLF13) are also detected as significant in the REMBRANDT dataset, suggesting that these TFs have a very different regulatory program in glioma subtypes. Moreover, unlike in the TCGA experiment, we observe a significant overlap between the results of Closed-Form and those of the MRA. In particular, 70 of the 75 nodes forming the differential sub-networks are also enriched in the MRA (p-value of the Fisher exact test: $3.38 \times 10^{-9}$). However, in this case the number of significant master regulators is considerably higher than that obtained in the TCGA case (297 vs. 144).
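
An overlap test of this kind can be sketched as a one-sided Fisher exact test, i.e. the upper tail of a hypergeometric distribution. The Python example below is an illustration only: the function name `fisher_one_sided` is hypothetical, and the numbers are toy values, because the size of the TF universe used for the REMBRANDT comparison is not stated in this section.

```python
from math import comb


def fisher_one_sided(overlap, size_a, size_b, universe):
    """P(X >= overlap) for X ~ Hypergeometric(universe, size_a, size_b).

    size_a   : number of items in the first set (e.g. differential TFs)
    size_b   : number of items in the second set (e.g. master regulators)
    universe : total number of items both sets are drawn from
    """
    if not (0 <= size_a <= universe and 0 <= size_b <= universe):
        raise ValueError("set sizes must lie between 0 and the universe size")
    upper = min(size_a, size_b)
    total = comb(universe, size_b)
    tail = sum(
        comb(size_a, k) * comb(universe - size_a, size_b - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Toy example: two sets of 5 items drawn from 10, overlapping completely.
p_toy = fisher_one_sided(overlap=5, size_a=5, size_b=5, universe=10)
```

A small tail probability indicates that the two sets share more elements than expected if they were drawn independently from the same universe.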

Conclusion

The comparison of gene expression profiles across different phenotypes is enabling the discovery of novel biomarkers for prognosis or diagnosis. They hold the key to identifying novel targets for therapeutic intervention. In this paper, we proposed an improvement to the state-of-the-art for comparing two labeled/unlabeled graphs that are representative of two conditions (e.g. the macro-categories according to the mutation of the gene IDH1 in our case study) and identifying statistically significant differences in their topology. We used the centralized GHD (cGHD) metric [28] to calculate the distance between the two labeled networks. We proposed a Closed-Form approach, an improvement on the dGHD algorithm, to detect localized topological differences between paired networks. The Closed-Form approach calculates the closed-form contribution of each node to the cGHD metric and efficiently removes the nodes with the smaller contributions to the cGHD value. From our experiments on scale free random geometric networks, we found that the Closed-Form approach was 10-15x faster than dGHD in terms of computational time. For differential sub-network analysis in very sparse paired graphs, both the Closed-Form and dGHD methods had good predictive performance, reaching mean AUC values of $\approx 0.935$ and $\approx 0.926$ respectively over 100 random runs of simulation experiments. However, for relatively denser networks, the Closed-Form approach outperformed dGHD: the proposed method achieved a mean AUC of $\approx 0.877$, while the dGHD technique reached a mean AUC of $\approx 0.724$. The Closed-Form approach also achieved much higher Precision, Recall and Kappa values than the dGHD method for relatively denser networks.

We applied our algorithm to detect the main differences between the networks of IDH-mutant and IDH-wild-type glioma tumors and showed that it correctly selects sub-networks centered on important key regulators of these two subtypes. The adopted dataset results from merging two different profiling platforms and, as reported in the Results section, its use for other purposes should be made with caution. We also report the results on the same data using standard Master Regulator Analysis on a global network, and show the overlap between the experiments. Indeed, it is known that MRA tends to produce many false positives due to correlations between TF profiles, which could eventually be attenuated with synergy and shadow analysis. In contrast, the Closed-Form algorithm for network differencing tends to be more conservative, as also suggested by the fact that only the significantly different sub-networks are detected in both datasets.

List of abbreviations

• AUC: Area Under the Curve
• GHD: Generalized Hamming Distance
• FP: False Positive
• FPR: False Positive Rate
• GSEA: Gene Set Enrichment Analysis
• MAD: Mean Absolute Difference
• MR: Master Regulator
• MRA: Master Regulator Analysis
• QAP: Quadratic Assignment Procedure
• RG: Random Geometric
• ROC: Receiver Operating Characteristic
• TF: Transcription Factor
• TO: Topological Overlap
• TP: True Positive
• TPR: True Positive Rate

Declarations

Ethics and consent to participate

Not applicable

Consent to publish

Not applicable

Competing interests

The authors declare that they have no competing interests.

Funding

This work was funded by Qatar Foundation.

Authors' contributions

RM conceived the methodology, developed the algorithms and drafted the manuscript. LC generated the data on glioma and helped to draft the manuscript. HB performed the statistical analysis. AI participated in the design of the study and in the critical analysis of the results. MC conceived of the study, participated in its design and co-ordination and helped to draft the manuscript.

Acknowledgements

None

Availability of data and materials

The scripts implementing the proposed algorithms are available in R at https://sites.google.com/site/raghvendramallmlresearcher/codes. The gene expression data used in this paper were downloaded from the TCGA data portal https://cancergenome.nih.gov/ and from the caintegrator portal https://caintergator.nci.nih.gov/rembrandt.

Supplementary files

• File name: Table S1.xlsx
• Title of data: Table S1
• Description of data: GHD and MRA Results for all the 457 considered transcription factors on the TCGA and Rembrandt datasets.

Author details

1QCRI - Qatar Computing Research Institute, HBKU, Doha, Qatar. 2Department of Science and Technology,

University of Sannio, Benevento, Italy. 3BioGeM, Institute of Genetic Research “Gaetano Salvatore”, Ariano Irpino

(AV), Italy. 4Department of Neurology, Department of Pathology, Institute for Cancer Genetics, Columbia

University Medical Center, New York, USA.

References

1. Jin L, Chen Y, Wang T, Hui P, Vasilakos AV. Understanding user behavior in online social networks: a survey. Communications Magazine, IEEE. 2013 September;51(9):144–150.

2. Mislove A, Marcon M, Gummadi KP, Druschel P, Bhattacharjee B. Measurement and Analysis of Online Social

Networks. In: Proceedings of the 7th ACM SIGCOMM Conference on Internet Measurement. IMC ’07. ACM;

2007. p. 29–42.

3. Broder A, Kumar R, Maghoul F, Raghavan P, Rajagopalan S, Stata R, et al. Graph Structure in the Web.

Comput Netw. 2000;33(1-6):309–320.

4. Erath A, Löchl M, Axhausen K. Graph-Theoretical Analysis of the Swiss Road and Railway Networks Over Time. Networks and Spatial Economics. 2009;9(3):379–400.

5. Kesidis G. An Introduction to Communication Network Analysis. Hoboken, NJ: Wiley; 2007.

6. Boginski V, Butenko S, Pardalos PM. Statistical analysis of financial networks. Computational Statistics and Data Analysis. 2005;48(2):431–443.

7. Ideker T, Ozier O, Schwikowski B, Siegel AF. Discovering regulatory and signalling circuits in molecular interaction networks. Bioinformatics. 2002;18.

8. Keller A, Backes C, Gerasch A, Kaufmann M, Kohlbacher O, Meese E, et al. A novel algorithm for detecting differentially regulated paths based on gene enrichment analysis. Bioinformatics. 2009;25(21):2787–2794.

9. Nacu S, Critchley-Thorne R, Lee R, Holmes S. Gene expression network analysis and applications to immunology. Bioinformatics. 2007;23(7):850–858.

10. Dehmer M, Emmert-Streib F. Analysis of Microarray Data: a network-based approach. Weinheim: John Wiley & Sons; 2008.

11. D’haeseleer P, Liang S, Somogyi R. Genetic network inference: From co-expression clustering to reverse

engineering. Bioinformatics. 2000;16(8):707–726.

12. Wallace TA, Martin DN, Ambs S. Interaction among genes, tumor biology and the environment in cancer health

disparities: examining the evidence on a national and global scale. Carcinogenesis. 2011;32(8):1107–1121.

13. Ahern TP, Horvath-Puho E, Spindler KLG, Sorensen HT, Ording AG, Erichsen R. Colorectal cancer,

comorbidity, and risk of venous thromboembolism: assessment of biological interactions in a Danish nationwide

cohort. British Journal of Cancer. 2016;114(1):96–102.

14. Ceccarelli M, Cerulo L, Santore A. De novo reconstruction of gene regulatory networks from time series data,

an approach based on formal methods. Methods. 2014 Oct;69(3):298–305.

15. Turcan S, Rohle D, Goenka A, Walsh LA, Fang F, Yilmaz E, et al. IDH1 mutation is suﬃcient to establish the

glioma hypermethylator phenotype. Nature. 2012;483(7390):479–483.

16. Cancer Genome Atlas Research Network. Comprehensive, integrative genomic analysis of diffuse lower-grade gliomas. N Engl J Med. 2015;372(26):2481–2498.

17. Ceccarelli M, Barthel FP, Malta TM, Sabedot TS, Salama SR, Murray BA, et al. Molecular Profiling Reveals Biologically Discrete Subsets and Pathways of Progression in Diffuse Glioma. Cell. 2016 Feb;164(3):550–563.

18. Brandes U, Erlebach T. Network Analysis: Methodological Foundations. Springer. 2005;3418.

19. Lena PD, Wu G, Martelli P, Casadio R, Nardini MC. An eﬃcient tool for molecular interaction maps overlap.

BMC Bioinforma. 2013;14(1):159.

20. Yang Q, Sze S. Path matching and graph matching in biological networks. Journal of Computational Biology.

2007;14(1):56–67.

21. Ramana MV, Scheinerman ER, Ullman D. Fractional isomorphism of graphs. Discrete Mathematics.

1994;132(1):247–265.

22. Shervashidze N, Schweitzer P, van Leeuwen EJ, Mehlhorn K, Borgwardt KM. Weisfeiler-Lehman Graph

Kernels. Journal of Machine Learning Research. 2011;12:2539–2561.

23. Hamming RW. The unreasonable effectiveness of mathematics. American Mathematical Monthly. 1980;87(2):81–90.

24. Butts C, Carley KM. Canonical labeling to facilitate graph comparison; 1998.

25. Gill R, Datta S, Datta S. A statistical framework for differential network analysis from microarray data. BMC Bioinformatics. 2010;11(1):95.

26. Mantel N. The detection of disease clustering and a generalized regression approach. Cancer Research.

1967;27(2):209.

27. Hubert LJ. Assignment methods in combinatorial data analysis. Marcel Dekker. 1987;1.

28. Ruan D, Young A, Montana G. Differential analysis of biological networks. BMC Bioinformatics. 2015;16:327.

29. Fuller TF, Ghazalpour A, Aten JE, Drake TA, Lusis AJ, Horvath S. Weighted Gene Co-expression Network Analysis Strategies Applied to Mouse Weight. Mammalian Genome. 2007;18(6):463–472.

30. Ha MJ, Baladandayuthapani V, Do KA. DINGO: differential network analysis in genomics. Bioinformatics. 2015;31(21):3413–20.

31. Zhang B, Horvath S. A general framework for weighted gene co-expression network analysis. Stat Appl Genet

Mol Biol. 2005;4(1):1128.

32. Allen JD, Xie Y, Chen M, Girard L, Xiao GH. Comparing statistical methods for constructing large scale gene networks. PLoS ONE. 2012;7(1):e29348.

33. Deshpande R, Vandersluis B, Myers CL. Comparison of profile similarity measures for genetic interaction networks. PLoS ONE. 2013;8(7):e68664.

34. Benjamini Y, Yekutieli D. The control of false discovery rate in multiple testing under dependency. Annals of

Statistics. 2001;29:1165–1188.

35. Johnson WE, Li C, Rabinovic A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics. 2007;8(1):118–127.

36. Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Favera RD, et al. ARACNE: An Algorithm for

the Reconstruction of Gene Regulatory Networks in a Mammalian Cellular Context. BMC Bioinformatics.

2006;7(S-1).

37. Sales G, Romualdi C. parmigene - a parallel R package for mutual information estimation and gene network

reconstruction. Bioinformatics [ISMB/ECCB]. 2011;27(13):1876–1877. Available from:

http://dblp.uni-trier.de/db/journals/bioinformatics/bioinformatics27.html#SalesR11.

38. Carro MS, Lim WK, Alvarez MJ, Bollo RJ, Zhao X, Snyder EY, et al. The transcriptional network for

mesenchymal transformation of brain tumours. Nature. 2010;463(7279):318–325.

39. Guan X, Vengoechea J, Zheng S, Sloan AE, Chen Y, Brat DJ, et al. Molecular subtypes of glioblastoma are

relevant to lower grade glioma. PLoS One. 2014;9(3):e91216.

40. Sing T, Sander O, Beerenwinkel N, Lengauer T. ROCR: visualizing classifier performance in R. Bioinformatics. 2005;21(20):3940–3941.

41. Mankiewicz R. The Story of Mathematics. Princeton, NJ: Princeton University Press; 2004.

42. Girvan M, Newman ME. Community structure in social and biological networks. Proceedings of the national

academy of sciences. 2002;99(12):7821–7826.

43. Blondel VD, Guillaume JL, Lambiotte R, Lefebvre E. Fast unfolding of communities in large networks. Journal

of statistical mechanics: theory and experiment. 2008;2008(10):P10008.

44. Rosvall M, Bergstrom CT. Multilevel compression of random walks on networks reveals hierarchical

organization in large integrated systems. PloS one. 2011;6(4):e18209.

45. Reichardt J, Bornholdt S. Statistical mechanics of community detection. Physical Review E.

2006;74(1):016110.

46. Orman GK, Labatut V. A comparison of community detection algorithms on artiﬁcial networks. In:

International Conference on Discovery Science. Springer; 2009. p. 242–256.

47. Mall R, Langone R, Suykens JA. Multilevel hierarchical kernel spectral clustering for real-life large scale

complex networks. PloS one. 2014;9(6):e99966.

48. Mall R, Langone R, Suykens JA. Kernel spectral clustering for big data networks. Entropy. 2013;15(5):1567–1586.

49. Mall R, Langone R, Suykens JA. Self-tuned kernel spectral clustering for large scale networks. In: Big Data,

2013 IEEE International Conference on. IEEE; 2013. p. 385–393.

50. Dittrich MT, Klau GW, Rosenwald A, Dandekar T, Müller T. Identifying functional modules in protein–protein interaction networks: an integrated exact approach. Bioinformatics. 2008;24(13):i223–i231.

51. West J, Beck S, Wang X, Teschendorff AE. An integrative network algorithm identifies age-associated differential methylation interactome hotspots targeting stem-cell differentiation pathways. Scientific reports. 2013;3:1630.

52. Jiao Y, Widschwendter M, Teschendorff AE. A systems-level integrative framework for genome-wide DNA methylation and gene expression data identifies differential gene expression modules under epigenetic control. Bioinformatics. 2014;30(16):2360–2366.

53. Steinwart I, Hush D, Scovel C. A classification framework for anomaly detection. Journal of Machine Learning Research. 2005;6(Feb):211–232.

54. Kumar A, Niculescu-Mizil A, Kavukcuoglu K, Daume III H. A binary classification framework for two-stage multiple kernel learning. arXiv preprint arXiv:1206.6428. 2012.

55. Eckel-Passow JE, Lachance DH, Molinaro AM, Walsh KM, Decker PA, Sicotte H, et al. Glioma groups based

on 1p/19q, IDH, and TERT promoter mutations in tumors. New England Journal of Medicine.

2015;372(26):2499–2508.

56. Noushmehr H, Weisenberger DJ, Diefes K, Phillips HS, Pujara K, Berman BP, et al. Identification of a CpG island methylator phenotype that defines a distinct subgroup of glioma. Cancer cell. 2010;17(5):510–522.

57. Zhu D, Hunter SB, Vertino PM, Van Meir EG. Overexpression of MBD2 in glioblastoma maintains epigenetic

silencing and inhibits the antiangiogenic function of the tumor suppressor gene BAI1. Cancer research.

2011;71(17):5859–5870.

58. Gleize V, Alentorn A, Connen de Kérillis L, Labussière M, Nadaradjane AA, Mundwiller E, et al. CIC inactivating mutations identify aggressive subset of 1p19q codeleted gliomas. Annals of neurology. 2015;78(3):355–374.

59. Feng C, Zhang Y, Yin J, Li J, Abounader R, Zuo Z. Regulatory factor X1 is a new tumor suppressive

transcription factor that acts via direct downregulation of CD44 in glioblastoma. Neuro-oncology.

2014;16(8):1078–85.

60. Bai H, Harmancı AS, Erson-Omay EZ, Li J, Coşkun S, Simon M, et al. Integrated genomic characterization of IDH1-mutant glioma malignant progression. Nature genetics. 2016;48(1):59–66.

61. Kraskov A, Stögbauer H, Grassberger P. Estimating mutual information. Physical review E. 2004;69(6):066138.

62. Sales G, Romualdi C. parmigene - a parallel R package for mutual information estimation and gene network reconstruction. Bioinformatics. 2011;27(13):1876–1877.

Figures

Figure 1: Correlation between topological overlap and cosine

similarity on 250 random networks.

Figure 2: Sensitivity Analysis of Parameter $\theta$. The boxplots represent the distribution of the True Positive Rate (TPR) identified by the Closed-Form approach for 100 random runs of the experiment.

Figure 3: Comparison of proposed Closed-Form approach with dGHD algorithm. Figures A and B correspond to the ROC and PR plots for the permuted sub-network (d=0.15) respectively. Figures C and D represent the ROC and PR plots for the denser sub-network (d=0.3 and d′=0.5) respectively. Clearly, the Closed-Form technique has better performance than the dGHD algorithm.

Figure 4: Comparison of proposed Closed-Form approach with dGHD method w.r.t. AUC ROC and AUC PR for 100 random runs of the experiment. These metrics are calculated using a p-value of 0.01 as the cut-off. Figures A and B correspond to the AUC ROC and AUC PR for the permuted sub-network (d=0.15) respectively. Figures C and D represent the AUC ROC and AUC PR for the denser sub-network (d=0.3 and d′=0.5) respectively.

Figure 5: Differential sub-networks between IDH-mutant and IDH-wild-type detected by the Closed-Form approach. Connections present only in the IDH-mutant sub-network are shown in red, and those present only in the IDH-wild-type sub-network in green. Common connections are shown in black.

Tables

Table 1: Time complexity comparison. Here $K$ represents the number of nodes for which the p-value is greater than $\theta$, and generally $K \ll N$. An important remark is that the cGHD calculation after the removal of each node can be done independently in parallel. So, if we have $T$ processors, the complexity of the proposed approach will reduce approximately linearly w.r.t. $T$.

dGHD             Closed-Form
$O(N^2|E|)$      $O(N|E| + N\log(N) + K^2|E|)$

Table 2: Comparison of proposed Closed-Form (CF) approach with dGHD algorithm. We compared the proposed Closed-Form approach with dGHD, Louvain, Infomap and Spinglass techniques w.r.t. various evaluation metrics for random geometric (RG) and power law (PL) networks. Bold represents the best results. All metrics are reported as mean ± SD; Time is reported as the mean.

Parameters      Method     AUC ROC        Precision      Recall         Accuracy       Specificity    Kappa          Time
d=0.15 (RG)     CF         0.935 ± 0.051  0.849 ± 0.037  0.846 ± 0.102  0.969 ± 0.011  0.983 ± 0.004  0.828 ± 0.068  0.078
d=0.15 (RG)     dGHD       0.926 ± 0.018  0.793 ± 0.021  0.878 ± 0.036  0.965 ± 0.005  0.974 ± 0.003  0.813 ± 0.026  1.0
d=0.15 (RG)     Louvain    0.980 ± 0.016  0.767 ± 0.052  1.0 ± 0.0      0.965 ± 0.028  0.960 ± 0.031  0.841 ± 0.113  0.012
d=0.15 (RG)     Infomap    0.843 ± 0.012  0.262 ± 0.015  1.0 ± 0.0      0.718 ± 0.022  0.685 ± 0.024  0.304 ± 0.024  0.018
d=0.15 (RG)     Spinglass  0.832 ± 0.011  0.249 ± 0.012  1.0 ± 0.0      0.699 ± 0.018  0.665 ± 0.021  0.285 ± 0.020  0.85
d=0.15, d′=0.3  CF         0.927 ± 0.048  0.839 ± 0.031  0.862 ± 0.098  0.969 ± 0.008  0.982 ± 0.005  0.825 ± 0.054  0.081
d=0.15, d′=0.3  dGHD       0.922 ± 0.022  0.806 ± 0.027  0.868 ± 0.045  0.966 ± 0.006  0.977 ± 0.004  0.816 ± 0.032  1.0
d=0.15, d′=0.3  Louvain    0.978 ± 0.018  0.887 ± 0.137  0.974 ± 0.042  0.982 ± 0.018  0.982 ± 0.023  0.916 ± 0.083  0.013
d=0.15, d′=0.3  Infomap    0.849 ± 0.008  0.269 ± 0.009  1.0 ± 0.0      0.728 ± 0.015  0.698 ± 0.016  0.316 ± 0.016  0.020
d=0.15, d′=0.3  Spinglass  0.859 ± 0.009  0.284 ± 0.013  1.0 ± 0.0      0.747 ± 0.016  0.719 ± 0.017  0.339 ± 0.019  0.92
d=0.3 (RG)      CF         0.877 ± 0.067  0.714 ± 0.075  0.789 ± 0.135  0.947 ± 0.016  0.975 ± 0.011  0.716 ± 0.099  0.083
d=0.3 (RG)      dGHD       0.724 ± 0.029  0.645 ± 0.049  0.577 ± 0.059  0.921 ± 0.007  0.971 ± 0.006  0.504 ± 0.051  1.0
d=0.3 (RG)      Louvain    0.866 ± 0.019  0.406 ± 0.061  1.0 ± 0.0      0.850 ± 0.034  0.833 ± 0.038  0.505 ± 0.072  0.013
d=0.3 (RG)      Infomap    0.677 ± 0.011  0.147 ± 0.004  1.0 ± 0.0      0.419 ± 0.019  0.354 ± 0.022  0.100 ± 0.008  0.021
d=0.3 (RG)      Spinglass  0.678 ± 0.011  0.148 ± 0.004  1.0 ± 0.0      0.420 ± 0.018  0.355 ± 0.021  0.100 ± 0.008  0.90
d=0.3, d′=0.5   CF         0.979 ± 0.005  0.771 ± 0.061  0.930 ± 0.082  0.965 ± 0.012  0.969 ± 0.011  0.821 ± 0.062  0.09
d=0.3, d′=0.5   dGHD       0.848 ± 0.071  0.700 ± 0.038  0.731 ± 0.148  0.941 ± 0.010  0.964 ± 0.009  0.672 ± 0.078  1.0
d=0.3, d′=0.5   Louvain    0.932 ± 0.029  0.478 ± 0.118  1.0 ± 0.0      0.879 ± 0.054  0.866 ± 0.059  0.582 ± 0.128  0.014
d=0.3, d′=0.5   Infomap    0.674 ± 0.010  0.145 ± 0.004  1.0 ± 0.0      0.413 ± 0.018  0.348 ± 0.020  0.097 ± 0.008  0.023
d=0.3, d′=0.5   Spinglass  0.711 ± 0.007  0.162 ± 0.003  1.0 ± 0.0      0.481 ± 0.013  0.423 ± 0.014  0.128 ± 0.006  0.94
α=2 (PL)        CF         0.797 ± 0.046  0.307 ± 0.307  0.792 ± 0.099  0.801 ± 0.018  0.349 ± 0.051  0.802 ± 0.022  0.09
α=2 (PL)        dGHD       0.797 ± 0.013  0.294 ± 0.009  0.809 ± 0.027  0.787 ± 0.008  0.333 ± 0.015  0.784 ± 0.009  1.0
α=2 (PL)        Louvain    0.780 ± 0.014  0.212 ± 0.010  1.0 ± 0.0      0.703 ± 0.018  0.272 ± 0.016  0.690 ± 0.011  0.015
α=2 (PL)        Infomap    0.665 ± 0.013  0.141 ± 0.004  1.0 ± 0.0      0.603 ± 0.018  0.162 ± 0.012  0.484 ± 0.019  0.026
α=2 (PL)        Spinglass  0.687 ± 0.014  0.153 ± 0.006  1.0 ± 0.0      0.645 ± 0.021  0.194 ± 0.011  0.527 ± 0.016  0.90
α=3 (PL)        CF         0.825 ± 0.019  0.345 ± 0.015  0.825 ± 0.035  0.826 ± 0.007  0.402 ± 0.024  0.826 ± 0.004  0.085
α=3 (PL)        dGHD       0.808 ± 0.027  0.327 ± 0.018  0.799 ± 0.050  0.816 ± 0.008  0.375 ± 0.031  0.817 ± 0.004  1.0
α=3 (PL)        Louvain    0.774 ± 0.015  0.233 ± 0.011  1.0 ± 0.0      0.736 ± 0.019  0.301 ± 0.009  0.732 ± 0.019  0.015
α=3 (PL)        Infomap    0.670 ± 0.014  0.168 ± 0.005  1.0 ± 0.0      0.635 ± 0.017  0.210 ± 0.014  0.532 ± 0.014  0.027
α=3 (PL)        Spinglass  0.694 ± 0.013  0.179 ± 0.007  1.0 ± 0.0      0.670 ± 0.023  0.232 ± 0.012  0.571 ± 0.017  0.94
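The threshold-based metrics reported in Table 2 can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' evaluation code: it assumes a node is called differential when its p-value falls below the cut-off (0.01 is the cut-off quoted for Figure 4), that Kappa denotes Cohen's kappa, and the function names `confusion_counts` and `summary_metrics` are hypothetical.

```python
def confusion_counts(p_values, truth, cutoff=0.01):
    """Count TP, FP, TN, FN when nodes with p < cutoff are called
    differential (assumed direction of the test)."""
    tp = fp = tn = fn = 0
    for p, is_differential in zip(p_values, truth):
        called = p < cutoff
        if called and is_differential:
            tp += 1
        elif called and not is_differential:
            fp += 1
        elif not called and is_differential:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def summary_metrics(tp, fp, tn, fn):
    """Precision, recall, specificity, accuracy and Cohen's kappa."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    accuracy = (tp + tn) / total
    # Agreement expected by chance, from the marginal totals.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total ** 2
    kappa = (accuracy - expected) / (1 - expected) if expected != 1 else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "accuracy": accuracy,
        "kappa": kappa,
    }
```

A call such as `summary_metrics(*confusion_counts(p_values, truth))` would give one row of metrics for a single run; the table averages such values over repeated runs.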


TF  Z-score  GHD  µ  MRA FDR

FOXD3 0.000 1.000 1.000 1.000E+00

FOXJ3 0.000 1.000 1.000 8.442E-03

MLX 0.000 1.000 1.000 8.075E-01

NFIA 0.000 1.000 1.000 4.502E-01

ETV1 0.062 0.058 0.058 1.000E+00

E2F1 0.085 0.058 0.058 1.007E-01

CREB1 0.208 0.058 0.058 8.580E-01

SOX10 0.234 0.058 0.058 8.442E-03

KLF13 0.338 1.000 0.278 1.240E-02

STAT3 0.354 0.058 0.058 8.442E-03

RUNX3 0.387 0.058 0.059 1.671E-02

IRF3 0.406 0.840 0.455 8.442E-03

ZNF354C 0.498 0.058 0.057 1.000E+00

HOXD13 0.540 0.059 0.059 2.492E-01

ZIC1 0.622 0.058 0.058 5.787E-02

HOXA2 0.700 0.059 0.059 1.405E-01

FOXO1 0.743 0.058 0.058 8.183E-02

MAFG 0.817 0.862 0.467 6.857E-01

RFX1 0.865 0.059 0.059 3.131E-01

NR1H2 0.871 0.058 0.058 8.176E-01

PAX6 1.003 0.058 0.057 4.147E-01

GLIS2 1.035 0.058 0.059 8.442E-03

NR4A2 1.118 0.058 0.058 1.000E+00

STAT4 1.137 0.848 0.486 9.615E-01

DLX6 1.208 0.058 0.059 1.000E+00

SIX4 1.232 0.058 0.058 1.000E+00

MEF2D 1.379 0.058 0.059 8.442E-03

MTF1 1.388 0.058 0.057 1.000E+00

MBD2 1.480 0.820 0.495 1.969E-01

OTP 1.493 0.058 0.057 2.970E-01

ETV4 1.529 0.059 0.059 2.122E-01

ZBTB12 1.566 0.194 0.189 4.255E-02

HOXB4 1.595 0.058 0.057 3.019E-01

PLAG1 1.622 0.195 0.190 3.434E-01

E2F6 1.668 0.197 0.192 8.442E-03

CREM 1.674 0.765 0.506 2.122E-01

IRF9 1.700 0.058 0.057 5.950E-02

KLF6 1.709 0.059 0.059 8.442E-03

TFE3 1.716 0.199 0.193 1.049E-01

HSF2 1.759 0.201 0.195 1.671E-02

NR2C1 1.800 0.058 0.058 2.122E-01

ONECUT2 1.804 0.202 0.196 3.657E-02

HOXD3 1.847 0.204 0.198 1.000E+00

BACH1 1.888 0.058 0.059 2.897E-01

GSX1 1.895 0.207 0.200 1.000E+00

HOXA13 1.930 0.058 0.057 1.000E+00

VAX2 1.937 0.208 0.201 1.609E-01

Table 3: The most different transcription factors detected between IDH-mutant and IDH-wildtype in the TCGA dataset. The columns report the differential measures in terms of the Z-score of the proposed differencing test (equation (2)), the GHD computed between the two networks, and the mean (µ) of the null GHD distribution. The last column reports the False Discovery Rate of the GSEA enrichment obtained with a Master Regulator Analysis.


TF  Z-score  GHD  µ  MRA FDR

MGA 0.000 1.000 1.000 2.166E-03

TEAD1 0.000 1.000 1.000 8.017E-04

FOS 0.000 1.000 1.000 5.137E-04

JUNB 0.000 1.000 1.000 5.137E-04

MEF2C 0.015 0.015 0.015 8.001E-04

LEF1 0.058 0.014 0.014 5.137E-04

NEUROD2 0.096 0.016 0.016 1.221E-03

EGR2 0.110 0.013 0.013 6.263E-03

JUN 0.123 0.333 0.500 5.137E-04

ARX 0.144 0.012 0.012 9.301E-02

BBX 0.173 0.012 0.012 7.333E-04

TCF3 0.198 0.011 0.011 5.137E-04

LHX6 0.205 0.017 0.017 8.492E-04

EGR1 0.211 0.011 0.011 9.696E-03

BCL6B 0.214 0.011 0.011 5.137E-04

E2F2 0.217 0.011 0.011 7.786E-04

E2F7 0.220 0.012 0.012 5.137E-04

E2F8 0.223 0.012 0.012 5.137E-04

ELF4 0.226 0.012 0.012 5.137E-04

ETV5 0.229 0.013 0.012 5.137E-04

FLI1 0.232 0.013 0.013 5.137E-04

FOXG1 0.235 0.013 0.013 1.000E+00

HOXD9 0.239 0.014 0.014 9.728E-04

ID4 0.242 0.014 0.014 7.786E-04

IRF8 0.246 0.014 0.014 2.393E-02

MYBL2 0.250 0.015 0.015 5.137E-04

NFIA 0.254 0.015 0.015 4.085E-03

NFIB 0.258 0.016 0.016 7.796E-04

KLF13 0.258 0.360 0.515 8.001E-04

OLIG2 0.262 0.016 0.016 7.893E-02

PROX1 0.266 0.017 0.017 1.020E-02

SOX2 0.270 0.017 0.017 2.995E-03

TEF 0.275 0.018 0.018 8.221E-04

ZBTB7A 0.280 0.019 0.018 7.700E-04

ZIC1 0.284 0.019 0.019 7.700E-01

SOX13 0.295 0.021 0.020 8.086E-04

TCF7L2 0.300 0.021 0.021 7.487E-04

BCL6 0.305 0.022 0.022 5.137E-04

MAF 0.317 0.024 0.024 5.137E-04

CEBPB 0.330 0.024 0.024 5.137E-04

CEBPD 0.337 0.025 0.025 5.137E-04

HLF 0.344 0.018 0.018 3.029E-03

ELK1 0.349 0.025 0.025 8.017E-04

FOXJ3 0.369 0.027 0.026 5.137E-04

MTF1 0.377 0.028 0.027 5.137E-04

TP53 0.388 0.028 0.028 5.137E-04

GABPA 0.407 0.030 0.029 5.137E-04

CDC5L 0.417 0.031 0.031 7.899E-04

RORA 0.422 0.329 0.467 7.796E-04

IRF9 0.426 0.031 0.031 3.062E-03

STAT1 0.437 0.033 0.032 5.137E-04

CREB1 0.456 0.035 0.034 5.137E-04

SOX10 0.462 0.036 0.035 8.250E-04

HOXD1 0.475 0.038 0.037 5.137E-04

SOX8 0.479 0.038 0.037 1.760E-03

HOXD11 0.480 0.047 0.046 2.975E-02

NR2F2 0.490 0.042 0.041 5.186E-04

DLX1 0.491 0.046 0.045 7.700E-04

TCF12 0.493 0.040 0.040 9.117E-04

THRB 0.495 0.051 0.050 9.850E-04

DLX2 0.496 0.045 0.044 8.492E-04

HOXD10 0.498 0.050 0.049 5.137E-04

ATF5 0.505 0.057 0.055 5.137E-04

STAT4 0.515 0.055 0.054 9.220E-04

TBR1 0.519 0.020 0.020 9.272E-04

MESP1 0.521 0.092 0.087 8.746E-04

POU3F2 0.523 0.063 0.061 5.137E-04

TFEC 0.530 0.082 0.079 5.137E-04

TCF4 0.533 0.071 0.069 7.487E-04

ETS2 0.543 0.176 0.163 9.728E-04

CREM 0.558 0.110 0.104 5.140E-04

TP63 0.561 0.105 0.099 9.220E-04

STAT6 0.563 0.091 0.087 5.137E-04

NPAS2 0.575 0.136 0.127 1.889E-01

GLI3 0.601 0.313 0.455 4.663E-02

Table 4: The most different transcription factors detected between IDH-mutant and IDH-wildtype in the REMBRANDT dataset. The columns report the differential measures in terms of the Z-score of the proposed differencing test (equation (2)), the GHD computed between the two networks, and the mean (µ) of the null GHD distribution. The last column reports the False Discovery Rate of the GSEA enrichment obtained with a Master Regulator Analysis.

[Figure 1: scatter plot; axes: Cosine Measure, TO Measure.]

[Figure 2: boxplots; x-axis: Absolute log of thresholds θ; y-axis: True positive rate; legend (Experiment): Dense Permuted Subnet, Sparse Permuted Subnet.]

[Figure 4: boxplots, panels A-D, comparing ClosedForm and dGHD on AUC_ROC and AUC_PR; annotated p-values: p = 0.48, p = 0.42, p = 2.64 x 10^-20, p = 3.42 x 10^-14.]

[Figure 3: panels A-D; ROC curves (True positive rate vs. False positive rate) and PR curves (Precision vs. Recall) for ClosedForm and dGHD.]

[Figure 5: network diagram; nodes are labelled with gene symbols.]