About
153
Publications
25,753
Reads
4,714
Citations
Introduction
I work on various applications that involve graphs. Recently I have been working on modeling large-scale graphs, such as those of social networks and cyber networks. I have also been looking at power systems, with a focus on vulnerability analysis and resilient network design. My early work was on traditional applications of combinatorial scientific computing, such as sparse matrix computations and parallel computing.
Current institution
Additional affiliations
October 2008 - April 2017
October 2008 - November 2016
October 2008 - September 2015
Publications (153)
Network intrusion detection systems (NIDS) are commonly used to detect malware communications, including command-and-control (C2) traffic from botnets. NIDS performance assessments have been studied for decades, but mathematical modeling has rarely been used to explore NIDS performance. This paper details a mathematical model that describes a NIDS...
Finding $k$-cores in graphs is a valuable and effective strategy for extracting dense regions of otherwise sparse graphs. We focus on the important problem of maintaining cores on rapidly changing dynamic graphs, where batches of edge changes need to be processed quickly. Prior batch core algorithms have only addressed half the problem of maintaini...
Protecting against multi-step attacks of uncertain start times and duration forces the defenders into indefinite, always ongoing, resource-intensive response. To allocate resources effectively, the defender must analyze and respond to an uncertain stream of potentially undetected multiple multi-step attacks and take measures of attack and response...
Uncertainty sets are at the heart of robust optimization (RO) because they play a key role in determining the RO models' tractability, robustness, and conservativeness. Different types of uncertainty sets have been proposed that model uncertainty from various perspectives. Among them, polyhedral uncertainty sets are widely used due to their simplic...
Protecting against multi-step attacks of uncertain duration and timing forces defenders into an indefinite, always ongoing, resource-intensive response. To effectively allocate resources, a defender must be able to analyze multi-step attacks under assumption of constantly allocating resources against an uncertain stream of potentially undetected at...
With the advent of large-scale neuromorphic platforms, we seek to better understand the applications of neuromorphic computing to more general-purpose computing domains. Graph analysis problems have grown increasingly relevant in the wake of readily available massive data. We demonstrate that a broad class of combinatorial and graph problems known...
Finding the dense regions of a graph and relations among them is a fundamental problem in network analysis. Core and truss decompositions reveal dense subgraphs with hierarchical relations. The incremental nature of algorithms for computing these decompositions and the need for global information at each step of the algorithm hinders scalable paral...
Increasing penetration levels of renewables have transformed how power systems are operated. High levels of uncertainty in production make it increasingly difficult to guarantee operational feasibility; instead, constraints may only be satisfied with high probability. We present a chance-constrained economic dispatch model that efficiently integra...
The degree distribution is one of the most fundamental properties used in the analysis of massive graphs. There is a large literature on graph sampling, where the goal is to estimate properties (especially the degree distribution) of a large graph through a small, random sample. Estimating the degree distribution of real-world graphs poses a signif...
The concept of k-cores is important for understanding the global structure of networks, as well as for identifying central or important nodes within a network. It is often valuable to understand the resilience of the k-cores of a network to attacks and dropped edges (i.e., damaged communications links). We provide a formal definition of a network's...
Finding dense bipartite subgraphs and detecting the relations among them is an important problem for affiliation networks that arise in a range of domains, such as social network analysis, word-document clustering, the science of science, internet advertising, and bioinformatics. However, most dense subgraph discovery algorithms are designed for cl...
Distributionally robust optimization (DRO) is widely used because it offers a way to overcome the conservativeness of robust optimization without requiring the specificity of stochastic programming. On the computational side, many practical DRO instances can be equivalently (or approximately) formulated as semidefinite programming (SDP) problems vi...
Centrality rankings such as degree, closeness, betweenness, Katz, PageRank, etc. are commonly used to identify critical nodes in a graph. These methods are based on two assumptions that restrict their wider applicability. First, they assume the exact topology of the network is available. Secondly, they do not take into account the activity over the...
The degree distribution is one of the most fundamental properties used in the analysis of massive graphs. There is a large literature on graph sampling, where the goal is to estimate properties (especially the degree distribution) of a large graph through a small, random sample. The degree distribution estimation poses a significant challeng...
Finding dense substructures in a graph is a fundamental graph mining operation, with applications in bioinformatics, social networks, and visualization to name a few. Yet most standard formulations of this problem (like clique, quasi-clique, densest at-least-k subgraph) are NP-hard. Furthermore, the goal is rarely to find the “true optimum” but to...
Inferring the exact topology of the interactions in a large, stochastic dynamical system from time-series data can often be prohibitive computationally and statistically without strong side information. One alternative is to seek approximations of the system topology that nonetheless describe the data well. In recent works, algorithms were proposed...
No matter how meticulously constructed, network datasets are often partially observed and incomplete. For example, most of the publicly available data from online social networking services (such as Facebook and Twitter) are collected via apps, users who make their accounts public, and/or the resources available to the researcher/practitioner. Such...
Counting the frequency of small subgraphs is a fundamental technique in network analysis across various domains, most notably in bioinformatics and social networks. The special case of triangle counting has received much attention. Getting results for 4-vertex or 5-vertex patterns is highly challenging, and there are few practical results known tha...
Finding the dense regions of a graph and relations among them is a fundamental task in network analysis. Nucleus decomposition is a principled framework of algorithms that generalizes the k-core and k-truss decompositions. It can leverage the higher-order structures to locate the dense subgraphs with hierarchical relations. Computation of the nucle...
We consider the problem of minimizing costs in the generation unit commitment problem, a cornerstone in electric power system operations, while enforcing an N-k-e reliability criterion. This reliability criterion is a generalization of the well-known N-k criterion, and dictates that at least a (1-e_j) fraction of the total system demand must b...
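The N-k-e criterion described above admits a compact statement. The following is a hedged sketch only; the symbols (d_i for demand, C for a contingency set, e_j for the allowed load-shed fraction) are assumed notation for illustration, not taken from the paper:

```latex
% Sketch of an N-k-e style survivability constraint (assumed notation):
% for every contingency C that disables j = |C| <= k components,
% the demand served afterwards must cover a (1 - e_j) fraction of total demand.
\sum_{i \in \mathcal{N}} d_i^{\mathrm{served}}(C)
  \;\ge\; (1 - e_j) \sum_{i \in \mathcal{N}} d_i ,
\qquad \forall\, C : |C| = j,\; 0 \le j \le k .
```

Setting e_j = 0 for all j recovers the classical N-k criterion, where every contingency of size at most k must leave all demand servable.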
Affiliation, or two-mode, networks, such as actor-movie, document-keyword, or user-product are prevalent in a lot of applications. The networks can be most naturally modeled as bipartite graphs, but most graph mining algorithms and implementations are designed to work on the classic, unipartite graphs. Subsequently, studies on affiliation networks...
Counting the frequency of small subgraphs is a fundamental technique in network analysis across various domains, most notably in bioinformatics and social networks. The special case of triangle counting has received much attention. Getting results for 4-vertex or 5-vertex patterns is highly challenging, and there are few practical results known tha...
Discovering dense subgraphs and understanding the relations among them is a fundamental problem in graph mining. We want to not only identify dense subgraphs, but also build a hierarchy among them (e.g., larger but sparser subgraphs formed by two smaller dense subgraphs). Peeling algorithms (k-core, k-truss, and nucleus decomposition) have been eff...
Stochastic economic dispatch models address uncertainties in forecasts of renewable generation output by considering a finite number of realizations drawn from a stochastic process model, typically via Monte Carlo sampling. Accurate evaluations of expectations or higher-order moments for quantities of interest, e.g., generating cost, can require a...
Discovering dense subgraphs and understanding the relations among them is a fundamental problem in graph mining. We want to not only identify dense subgraphs, but also build a hierarchy among them (e.g., larger but sparser subgraphs formed by two smaller dense subgraphs). Peeling algorithms (k-core, k-truss, and nucleus decomposition) have been eff...
Network science is a powerful tool for analyzing complex systems in fields ranging from sociology to engineering to biology. This paper is focused on generative models of bipartite graphs, also known as two-way graphs. We propose two generative models that can be easily tuned to reproduce the characteristics of real-world networks, not just qualita...
Network science is a powerful tool for analyzing complex systems in fields ranging from sociology to engineering to biology. This paper is focused on generative models of large-scale bipartite graphs, also known as two-way graphs or two-mode networks. We propose two generative models that can be easily tuned to reproduce the characteristics of real...
Networked representations of real-world phenomena are often partially observed, which leads to incomplete networks. Analysis of such incomplete networks can lead to skewed results. We examine the following problem: given an incomplete network, which $b$ nodes should be probed to bring the largest number of new nodes into the observed network? Many g...
Stochastic economic dispatch models address uncertainties in forecasts of renewable generation output by considering a finite number of realizations drawn from a stochastic process model, typically via Monte Carlo sampling. Accurate evaluations of expectations or higher-order moments for quantities of interest, e.g., generating cost, can require a...
Next generation architectures necessitate a shift away from traditional workflows in which the simulation state is saved at prescribed frequencies for post-processing analysis. While the need to shift to in situ workflows has been acknowledged for some time, much of the current research is focused on static workflows, where the analysis that would...
The increasing complexity of scientific simulations and HPC architectures is driving the need for adaptive workflows, where the composition and execution of computational and data manipulation steps dynamically depend on the evolutionary state of the simulation itself. Consider, for example, the frequency of data storage. Critical phases of the simulat...
We propose algorithms to approximate directed information graphs. Directed information graphs are probabilistic graphical models that depict causal dependencies between stochastic processes in a network. The proposed algorithms identify optimal and near-optimal approximations in terms of Kullback-Leibler divergence. The user-chosen sparsity trades...
Given two sets of vectors, $A = \{a_1, \dots, a_m\}$ and $B = \{b_1, \dots, b_n\}$, our problem is to find the top-$t$ dot products, i.e., the largest $|a_i \cdot b_j|$ among all possible pairs. This is a fundamental mathematical problem that appears in numerous data applications involving similarity search, link prediction, and collaborative...
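The top-$t$ dot product problem above has a simple brute-force baseline, which makes the objective concrete. A minimal sketch (the function name and tuple layout are my own; real instances require the faster techniques this line of work develops):

```python
import heapq

def top_t_dot_products(A, B, t):
    """Brute force: score every pair (i, j) by |a_i . b_j|, keep the t largest."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    scored = (
        (abs(dot(a, b)), i, j)
        for i, a in enumerate(A)
        for j, b in enumerate(B)
    )
    # Returns (score, i, j) triples in decreasing score order.
    return heapq.nlargest(t, scored)

A = [(1, 0), (0, 2)]
B = [(3, 0), (0, -1)]
print(top_t_dot_products(A, B, 2))  # [(3, 0, 0), (2, 1, 1)]
```

Exhaustive scoring costs O(m n d) for d-dimensional vectors, which motivates the sampling and indexing approaches the abstract alludes to.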
Counting the frequency of small subgraphs is a fundamental technique in network analysis across various domains, most notably in bioinformatics and social networks. The special case of triangle counting has received much attention. Getting results for 4-vertex patterns is highly challenging, and there are few practical results known that can scale...
Finding dense substructures in a graph is a fundamental graph mining operation, with applications in bioinformatics, social networks, and visualization to name a few. Yet most standard formulations of this problem (like clique, quasiclique, k-densest subgraph) are NP-hard. Furthermore, the goal is rarely to find the "true optimum", but to identify...
We design a space-efficient algorithm that approximates the transitivity (global clustering coefficient) and total triangle count with only a single pass through a graph given as a stream of edges. Our procedure is based on the classic probabilistic result, the birthday paradox. When the transitivity is constant and there are more edges than wedges...
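For reference, the transitivity that this single-pass algorithm estimates is the fraction of wedges (paths of length two) that close into triangles. A small exact computation on an in-memory graph makes the target quantity concrete; this is not the streaming estimator itself, and it assumes an undirected simple graph:

```python
from itertools import combinations

def transitivity(edges):
    """Exact transitivity: 3 * (#triangles) / (#wedges)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # A node of degree d is the center of d-choose-2 wedges.
    wedges = sum(len(n) * (len(n) - 1) // 2 for n in adj.values())
    triangles = sum(
        1 for u, v, w in combinations(adj, 3)
        if v in adj[u] and w in adj[u] and w in adj[v]
    )
    return 3 * triangles / wedges if wedges else 0.0

# Triangle 0-1-2 with a pendant edge 2-3: 5 wedges, 1 triangle.
print(transitivity([(0, 1), (1, 2), (0, 2), (2, 3)]))  # 0.6
```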
Counting the frequency of small subgraphs is a fundamental technique in network analysis across various domains, most notably in bioinformatics and social networks. The special case of triangle counting has received much attention. Getting results for 4-vertex patterns is highly challenging, and there are few practical results known that can scale...
Finding dense substructures in a graph is a fundamental graph mining operation, with applications in bioinformatics, social networks, and visualization to name a few. Yet most standard formulations of this problem (like clique, quasi-clique, k-densest subgraph) are NP-hard. Furthermore, the goal is rarely to find the "true optimum", but to identify...
The study of sublinear algorithms is a recent development in theoretical computer science and discrete mathematics that has significant potential to provide scalable analysis algorithms for massive data. The approaches of sublinear algorithms address the fundamental mathematical problem of understanding global features of a data set using limited r...
Graphs are used to model interactions in a variety of contexts, and there is a growing need to quickly assess the structure of such graphs. Some of the most useful graph metrics are based on triangles, such as those measuring social cohesion. Despite the importance of these triadic measures, algorithms to compute them can be extremely expens...
Stochastic unit commitment models typically handle uncertainties in forecast demand by considering a finite number of realizations from a stochastic process model for loads. Accurate evaluations of expectations or higher moments for the quantities of interest require a prohibitively large number of model evaluations. In this paper we propose an alt...
The K-core of a graph is the largest subgraph within which each node has at least K connections. The key observation of this paper is that the K-core may be much smaller than the original graph while retaining its community structure. Building on this observation, we propose a framework that can accelerate community detection algorithms by first fo...
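The K-core defined in this abstract can be computed by the classic peeling procedure: repeatedly delete any node with fewer than K remaining neighbors until none is left. A minimal sketch (function and variable names are my own, not the paper's framework):

```python
from collections import defaultdict

def k_core(edges, k):
    """Return the k-core adjacency by repeatedly peeling nodes of degree < k."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    changed = True
    while changed:
        changed = False
        for node in list(adj):
            if node in adj and len(adj[node]) < k:
                # Remove the node and detach it from surviving neighbors.
                for nbr in adj.pop(node):
                    if nbr in adj:
                        adj[nbr].discard(node)
                changed = True
    return {u: set(nbrs) for u, nbrs in adj.items()}

# Triangle 0-1-2 with a pendant node 3: the 2-core drops node 3.
core = k_core([(0, 1), (1, 2), (0, 2), (2, 3)], 2)
print(sorted(core))  # [0, 1, 2]
```

Peeling is linear-time with the right bookkeeping; the paper's point, as stated, is that running community detection on this (often much smaller) subgraph can be far cheaper than on the full graph.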
Estimating the number of triangles in a graph given as a stream of edges is a fundamental problem in data mining. The goal is to design a single pass space-efficient streaming algorithm for estimating triangle counts. While there are numerous algorithms for this problem, they all (implicitly or explicitly) assume that the stream does not contain du...
First impressions from initial renderings of data are crucial for directing further exploration and analysis. In most visualization systems, default colormaps are generated by simply linearly interpolating color in some space based on a value's placement between the minimum and maximum taken on by the dataset. We design a simple sampling-based meth...
Understanding the dynamics of reciprocation is of great interest in sociology and computational social science. The recent growth of Massively Multi-player Online Games (MMOGs) has provided unprecedented access to large-scale data which enables us to study such complex human behavior in a more systematic manner. In this paper, we consider three dif...
We propose two algorithms to identify approximations for joint distributions of networks of stochastic processes. The approximations correspond to low-complexity network structures - connected, directed graphs with bounded indegree. The first algorithm identifies an optimal approximation in terms of KL divergence. The second efficiently finds a nea...
We consider the problem of designing (or augmenting) an electric power system at a minimum cost such that it satisfies the N-k-e survivability criterion. This survivability criterion is a generalization of the well-known N-k criterion, and it requires that at least a (1-e_j) fraction of the total demand be met after failures of up to j components...
Understanding the dynamics of reciprocation is of great interest in sociology and computational social science. The recent growth of Massively Multi-player Online Games (MMOGs) has provided unprecedented access to large-scale data which enables us to study such complex human behavior in a more systematic manner. In this paper, we consider three dif...
Network data is ubiquitous and growing, yet we lack realistic generative models that can be calibrated to match real-world data. The recently proposed Block Two-Level Erdos-Renyi (BTER) model can be tuned to capture two fundamental properties: degree distribution and clustering coefficients. The latter is particularly important for reproducing grap...
The study of triangles in graphs is a standard tool in network analysis, leading to measures such as the transitivity, i.e., the fraction of paths of length two that participate in triangles. Real-world networks are often directed, and it can be difficult to meaningfully understand this network structure. We propose a collection of directed closure...
The computation and study of triangles in graphs is a standard tool in the analysis of real-world networks. Yet most of this work focuses on undirected graphs. Real-world networks are often directed and have a significant fraction of reciprocal edges. While there is much focus on directed triadic patterns in the social sciences community, most data...
Graphs and networks are used to model interactions in a variety of contexts. There is a growing need to quickly assess the characteristics of a graph in order to understand its underlying structure. Some of the most useful metrics are triangle-based and give a measure of the connectedness of mutual friends. This is often summarized in terms of clus...
The problem of understanding reciprocation as a social behavior has always been of interest in sociology and computational social science. The recent growth of Massively Multi-player Online Games (MMOGs) has provided a platform to study reciprocal behavior in a more systematic and large-scale manner. In this paper, we perform an extensive study abo...
We design a space-efficient algorithm that approximates the transitivity (global clustering coefficient) and total triangle count with only a single pass through a graph given as a stream of edges. Our procedure is based on the classic probabilistic result, the birthday paradox. When the transitivity is constant and there are more edges than wedges...
Markov chains are convenient means of generating realizations of networks with a given (joint or otherwise) degree distribution, since they simply require a procedure for rewiring edges. The major challenge is to find the right number of steps to run such a chain, so that we generate truly independent samples. Theoretical bounds for mixing times of...
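The rewiring step such Markov chains rely on is the degree-preserving double edge swap: pick two edges (a,b) and (c,d) and replace them with (a,d) and (c,b), rejecting swaps that would create a self-loop or multi-edge. A minimal sketch (the acceptance rules shown are one common choice, not necessarily the paper's chain):

```python
import random

def rewire(edges, steps, seed=0):
    """Degree-preserving double edge swaps on a simple undirected graph."""
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    for _ in range(steps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if len({a, b, c, d}) < 4:
            continue  # swap would create a self-loop or shared endpoint
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue  # swap would create a multi-edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edges[i], edges[j] = (a, d), (c, b)
    return edges

edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
swapped = rewire(edges, 200, seed=3)
# Every node keeps its original degree; only the wiring changes.
```

Each accepted swap leaves every node's degree unchanged, which is exactly why the chain samples from graphs with the prescribed degree distribution; the open question the abstract raises is how many steps are needed for near-independent samples.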
Triangles are an important building block and distinguishing feature of real-world networks, but their structure is still poorly understood. Despite numerous reports on the abundance of triangles, there is very little information on what these triangles look like. We initiate the study of degree-labeled triangles; specifically, degree homogeneity...
Degree distributions are arguably the most important property of real-world networks. The classic edge configuration model or Chung-Lu model can generate an undirected graph with any desired degree distribution. This serves as a good null model to compare algorithms or perform experimental studies. Furthermore, there are scalable algorithms that im...
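In the Chung-Lu model mentioned above, each pair (u, v) is connected independently with probability proportional to the product of the desired degrees, so node u's expected degree comes out to roughly d_u. A naive O(n^2) sketch for illustration (the scalable algorithms the abstract refers to avoid this quadratic loop):

```python
import random

def chung_lu(degrees, seed=0):
    """Sample an undirected graph where edge (u, v) appears independently
    with probability min(1, d_u * d_v / sum(d)), i.e. the Chung-Lu model."""
    rng = random.Random(seed)
    two_m = sum(degrees)  # twice the expected edge count
    n = len(degrees)
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            p = min(1.0, degrees[u] * degrees[v] / two_m)
            if rng.random() < p:
                edges.append((u, v))
    return edges

# Target degree sequence [3, 3, 2, 2]; realized degrees match in expectation.
edges = chung_lu([3, 3, 2, 2], seed=1)
```

Summing the edge probabilities over v gives node u an expected degree of about d_u, which is what makes this a convenient null model for a prescribed degree distribution.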
Graphs are used to model interactions in a variety of contexts, and there is a growing need to quickly assess the structure of a graph. Some of the most useful graph metrics, especially those measuring social cohesion, are based on triangles. Despite the importance of these triadic measures, associated algorithms can be extremely expensive. We prop...
Markov chains are a convenient means of generating realizations of networks, since they require little more than a procedure for rewiring edges. If a rewiring procedure exists for generating new graphs with specified statistical properties, then a Markov chain sampler can generate an ensemble of graphs with prescribed characteristics. However, succ...
We consider the problem of designing (or augmenting) an electric power system such that it satisfies the N-k-e survivability criterion while minimizing total cost. The survivability criterion requires that at least a (1-e) fraction of the total demand can still be met even if any k (or fewer) of the system components fail. We formulate this problem,...
The modeling flexibility provided by hypergraphs has drawn a lot of interest from the combinatorial scientific community, leading to novel models and algorithms, their applications, and development of associated tools. Hypergraphs are now a standard tool in combinatorial scientific computing. The modeling flexibility of hypergraphs, however, comes...
Efficient design of hardware and software for large-scale parallel execution requires detailed understanding of the interactions between the application, computer, and network. The authors have developed a macro-scale simulator (SST/macro) that permits the coarse-grained study of distributed-memory applications. In the presented work, applications...
Community structure plays a significant role in the analysis of social networks and similar graphs, yet this structure is little understood and not well captured by most models. We formally define a community to be a subgraph that is internally highly connected and has no deeper substructure. We use tools of combinatorics to show that any such comm...
Graph analysis is playing an increasingly important role in science and industry. Due to numerous limitations in sharing real-world graphs, models for generating massive graphs are critical for developing better algorithms. In this paper, we analyze the stochastic Kronecker graph model (SKG), which is the foundation of the Graph500 supercomputer be...
The analysis of massive graphs is now becoming a very important part of science and industrial research. This has led to the construction of a large variety of graph models, each with their own advantages. The Stochastic Kronecker Graph (SKG) model has been chosen by the Graph500 steering committee to create supercomputer benchmarks for graph algor...
We study clustering on graphs with multiple edge types. Our main motivation is that similarities between objects can be measured in many different metrics. For instance, similarity between two papers can be based on common authors, where they are published, keyword similarity, citations, etc. As such, graphs with multiple edges are a more accurate mo...
We consider the problem of designing a network of minimum cost while satisfying a prescribed survivability criterion. The survivability criterion requires that a feasible flow must still exists (i.e. all demands can be satisfied without violating arc capacities) even after the disruption of a subset of the network's arcs. Specifically, we consider...
We study clustering on graphs with multiple edge types. Our main motivation is that similarities between objects can be measured in many different metrics, and so allowing graphs with multivariate edges significantly increases modeling power. In this context the clustering problem becomes more challenging. Each edge/metric provides only partial inf...
One of the most influential recent results in network analysis is that many natural networks exhibit a power-law or log-normal degree distribution. This has inspired numerous generative models that match this property. However, more recent work has shown that while these generative models do have the right degree distribution, they are not good mod...
We investigate the community detection problem on graphs in the presence of multiple edge types. Our main motivation is that similarity between objects can be defined by many different metrics, and aggregation of these metrics into a single one poses several important challenges, such as recovering this aggregation function from ground-truth, inves...
The modeling flexibility provided by hypergraphs has drawn a lot of interest from the combinatorial scientific community, leading to novel models and algorithms, their applications, and development of associated tools. Hypergraphs are now a standard tool in combinatorial scientific computing. The modeling flexibility of hypergraphs, however, comes a...
Graph analysis is playing an increasingly important role in science and industry. Due to numerous limitations in sharing real-world graphs, models for generating massive graphs are critical for developing better algorithms. In this paper, we analyze the stochastic Kronecker graph model (SKG), which is the foundation of the Graph500 supercomputer be...
One of the most influential results in network analysis is that many natural networks exhibit a power-law or log-normal degree distribution. This has inspired numerous generative models that match this property. However, more recent work has shown that while these generative models do have the right degree distribution, they are not good models for...
Graph analysis is playing an increasingly important role in science and industry. Due to numerous limitations in sharing real-world graphs, models for generating massive graphs are critical for developing better algorithms. In this paper, we analyze the stochastic Kronecker graph model (SKG), which is the foundation of the Graph500 supercomputer be...
We study the 2-dimensional vector packing problem, which is a generalization of the classical bin packing problem where each item has 2 distinct weights and each bin has 2 corresponding capacities. The goal is to group items into a minimum number of bins, without violating the bin capacity constraints. We propose a Θ(n)-time approximation algor...
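A natural baseline for the 2-dimensional vector packing problem described above is first-fit: place each item into the first open bin whose two remaining capacities both accommodate it, opening a new bin otherwise. A minimal sketch for illustration only; this is not the paper's Θ(n)-time approximation algorithm:

```python
def first_fit_2d(items, cap1, cap2):
    """First-fit for 2-dimensional vector packing: each item has two weights,
    and a bin accepts an item only if both remaining capacities suffice."""
    bins = []  # each bin: [remaining_cap1, remaining_cap2, packed_items]
    for w1, w2 in items:
        for b in bins:
            if w1 <= b[0] and w2 <= b[1]:
                b[0] -= w1
                b[1] -= w2
                b[2].append((w1, w2))
                break
        else:
            bins.append([cap1 - w1, cap2 - w2, [(w1, w2)]])
    return [b[2] for b in bins]

# Items that conflict in different dimensions force three bins here.
items = [(6, 2), (3, 9), (4, 1), (2, 5)]
print(len(first_fit_2d(items, 10, 10)))  # 3
```

The need to satisfy both capacity constraints simultaneously is what separates this from classical bin packing: an item can fit a bin in one dimension yet be rejected by the other.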
Efficient design of hardware and software for large-scale parallel execution requires detailed understanding of the interactions between the application, computer, and network. The authors have developed a macro-scale simulator SST/macro that permits the coarse-grained study of distributed-memory applications. In the presented work, applications us...