Ali Pinar

  • PhD, Computer Science
  • Distinguished Member of Technical Staff at Sandia National Laboratories

About

153
Publications
25,753
Reads
4,714
Citations
Introduction
I work on various applications that involve graphs. Recently, I have been working on modeling large-scale graphs, such as those of social networks and cyber networks. I have also been looking at power systems, with a focus on vulnerability analysis and resilient network design. My early work was on traditional applications of combinatorial scientific computing, such as sparse matrix computations and parallel computing.
Current institution
Sandia National Laboratories
Current position
  • Distinguished Member of Technical Staff
Additional affiliations
October 2008 - April 2017
Sandia National Laboratories
Position
  • Distinguished Member of Technical Staff
October 2008 - November 2016
Sandia National Laboratories
Position
  • Principal Member of Technical Staff
October 2008 - September 2015
Sandia National Laboratories
Position
  • Member of Technical Staff


Publications (153)
Article
Network intrusion detection systems (NIDS) are commonly used to detect malware communications, including command-and-control (C2) traffic from botnets. NIDS performance assessments have been studied for decades, but mathematical modeling has rarely been used to explore NIDS performance. This paper details a mathematical model that describes a NIDS...
Preprint
Finding $k$-cores in graphs is a valuable and effective strategy for extracting dense regions of otherwise sparse graphs. We focus on the important problem of maintaining cores on rapidly changing dynamic graphs, where batches of edge changes need to be processed quickly. Prior batch core algorithms have only addressed half the problem of maintaini...
Article
Protecting against multi-step attacks of uncertain start times and duration forces the defenders into indefinite, always ongoing, resource-intensive response. To allocate resources effectively, the defender must analyze and respond to an uncertain stream of potentially undetected multiple multi-step attacks and take measures of attack and response...
Preprint
Full-text available
Uncertainty sets are at the heart of robust optimization (RO) because they play a key role in determining the RO models' tractability, robustness, and conservativeness. Different types of uncertainty sets have been proposed that model uncertainty from various perspectives. Among them, polyhedral uncertainty sets are widely used due to their simplic...
Preprint
Protecting against multi-step attacks of uncertain duration and timing forces defenders into an indefinite, always ongoing, resource-intensive response. To effectively allocate resources, a defender must be able to analyze multi-step attacks under the assumption of constantly allocating resources against an uncertain stream of potentially undetected at...
Conference Paper
With the advent of large-scale neuromorphic platforms, we seek to better understand the applications of neuromorphic computing to more general-purpose computing domains. Graph analysis problems have grown increasingly relevant in the wake of readily available massive data. We demonstrate that a broad class of combinatorial and graph problems known...
Article
Finding the dense regions of a graph and relations among them is a fundamental problem in network analysis. Core and truss decompositions reveal dense subgraphs with hierarchical relations. The incremental nature of algorithms for computing these decompositions and the need for global information at each step of the algorithm hinders scalable paral...
Article
Full-text available
Increasing penetration levels of renewables have transformed how power systems are operated. High levels of uncertainty in production make it increasingly difficult to guarantee operational feasibility; instead, constraints may only be satisfied with high probability. We present a chance-constrained economic dispatch model that efficiently integra...
Conference Paper
The degree distribution is one of the most fundamental properties used in the analysis of massive graphs. There is a large literature on graph sampling, where the goal is to estimate properties (especially the degree distribution) of a large graph through a small, random sample. Estimating the degree distribution of real-world graphs poses a signif...
Conference Paper
Full-text available
The concept of k-cores is important for understanding the global structure of networks, as well as for identifying central or important nodes within a network. It is often valuable to understand the resilience of the k-cores of a network to attacks and dropped edges (i.e., damaged communications links). We provide a formal definition of a network's...
Conference Paper
Finding dense bipartite subgraphs and detecting the relations among them is an important problem for affiliation networks that arise in a range of domains, such as social network analysis, word-document clustering, the science of science, internet advertising, and bioinformatics. However, most dense subgraph discovery algorithms are designed for cl...
Article
Full-text available
Distributionally robust optimization (DRO) is widely used because it offers a way to overcome the conservativeness of robust optimization without requiring the specificity of stochastic programming. On the computational side, many practical DRO instances can be equivalently (or approximately) formulated as semidefinite programming (SDP) problems vi...
Article
Full-text available
Centrality rankings such as degree, closeness, betweenness, Katz, PageRank, etc. are commonly used to identify critical nodes in a graph. These methods are based on two assumptions that restrict their wider applicability. First, they assume the exact topology of the network is available. Secondly, they do not take into account the activity over the...
Article
Full-text available
The degree distribution is one of the most fundamental properties used in the analysis of massive graphs. There is a large literature on graph sampling, where the goal is to estimate properties (especially the degree distribution) of a large graph through a small, random sample. The degree distribution estimation poses a significant challeng...
Article
Finding dense substructures in a graph is a fundamental graph mining operation, with applications in bioinformatics, social networks, and visualization to name a few. Yet most standard formulations of this problem (like clique, quasi-clique, densest at-least-k subgraph) are NP-hard. Furthermore, the goal is rarely to find the “true optimum” but to...
Article
Inferring the exact topology of the interactions in a large, stochastic dynamical system from time-series data can often be prohibitive computationally and statistically without strong side information. One alternative is to seek approximations of the system topology that nonetheless describe the data well. In recent works, algorithms were proposed...
Conference Paper
No matter how meticulously constructed, network datasets are often partially observed and incomplete. For example, most of the publicly available data from online social networking services (such as Facebook and Twitter) are collected via apps, users who make their accounts public, and/or the resources available to the researcher/practitioner. Such...
Conference Paper
Counting the frequency of small subgraphs is a fundamental technique in network analysis across various domains, most notably in bioinformatics and social networks. The special case of triangle counting has received much attention. Getting results for 4-vertex or 5-vertex patterns is highly challenging, and there are few practical results known tha...
Article
Full-text available
Finding the dense regions of a graph and relations among them is a fundamental task in network analysis. Nucleus decomposition is a principled framework of algorithms that generalizes the k-core and k-truss decompositions. It can leverage the higher-order structures to locate the dense subgraphs with hierarchical relations. Computation of the nucle...
Article
Full-text available
We consider the problem of minimizing costs in the generation unit commitment problem, a cornerstone in electric power system operations, while enforcing an N-k-e reliability criterion. This reliability criterion is a generalization of the well-known $N$-$k$ criterion, and dictates that at least $(1-e_j)$ fraction of the total system demand must b...
Article
Full-text available
Affiliation, or two-mode, networks, such as actor-movie, document-keyword, or user-product, are prevalent in many applications. The networks can be most naturally modeled as bipartite graphs, but most graph mining algorithms and implementations are designed to work on the classic, unipartite graphs. Subsequently, studies on affiliation networks...
Article
Full-text available
Counting the frequency of small subgraphs is a fundamental technique in network analysis across various domains, most notably in bioinformatics and social networks. The special case of triangle counting has received much attention. Getting results for 4-vertex or 5-vertex patterns is highly challenging, and there are few practical results known tha...
Article
Full-text available
Discovering dense subgraphs and understanding the relations among them is a fundamental problem in graph mining. We want to not only identify dense subgraphs, but also build a hierarchy among them (e.g., larger but sparser subgraphs formed by two smaller dense subgraphs). Peeling algorithms (k-core, k-truss, and nucleus decomposition) have been eff...
Article
Full-text available
Stochastic economic dispatch models address uncertainties in forecasts of renewable generation output by considering a finite number of realizations drawn from a stochastic process model, typically via Monte Carlo sampling. Accurate evaluations of expectations or higher-order moments for quantities of interest, e.g., generating cost, can require a...
Preprint
Full-text available
Discovering dense subgraphs and understanding the relations among them is a fundamental problem in graph mining. We want to not only identify dense subgraphs, but also build a hierarchy among them (e.g., larger but sparser subgraphs formed by two smaller dense subgraphs). Peeling algorithms (k-core, k-truss, and nucleus decomposition) have been eff...
Article
Network science is a powerful tool for analyzing complex systems in fields ranging from sociology to engineering to biology. This paper is focused on generative models of bipartite graphs, also known as two-way graphs. We propose two generative models that can be easily tuned to reproduce the characteristics of real-world networks, not just qualita...
Preprint
Network science is a powerful tool for analyzing complex systems in fields ranging from sociology to engineering to biology. This paper is focused on generative models of large-scale bipartite graphs, also known as two-way graphs or two-mode networks. We propose two generative models that can be easily tuned to reproduce the characteristics of real...
Article
Full-text available
Networked representations of real-world phenomena are often partially observed, which leads to incomplete networks. Analysis of such incomplete networks can lead to skewed results. We examine the following problem: given an incomplete network, which $b$ nodes should be probed to bring the largest number of new nodes into the observed network? Many g...
Article
Full-text available
Stochastic economic dispatch models address uncertainties in forecasts of renewable generation output by considering a finite number of realizations drawn from a stochastic process model, typically via Monte Carlo sampling. Accurate evaluations of expectations or higher-order moments for quantities of interest, e.g., generating cost, can require a...
Article
Full-text available
Next generation architectures necessitate a shift away from traditional workflows in which the simulation state is saved at prescribed frequencies for post-processing analysis. While the need to shift to in situ workflows has been acknowledged for some time, much of the current research is focused on static workflows, where the analysis that would...
Article
Full-text available
The increasing complexity of scientific simulations and HPC architectures is driving the need for adaptive workflows, where the composition and execution of computational and data manipulation steps dynamically depend on the evolutionary state of the simulation itself. Consider, for example, the frequency of data storage. Critical phases of the simulat...
Article
Full-text available
We propose algorithms to approximate directed information graphs. Directed information graphs are probabilistic graphical models that depict causal dependencies between stochastic processes in a network. The proposed algorithms identify optimal and near-optimal approximations in terms of Kullback-Leibler divergence. The user-chosen sparsity trades...
Article
Full-text available
Given two sets of vectors, $A = \{{a_1}, \dots, {a_m}\}$ and $B=\{{b_1},\dots,{b_n}\}$, our problem is to find the top-$t$ dot products, i.e., the largest $|{a_i}\cdot{b_j}|$ among all possible pairs. This is a fundamental mathematical problem that appears in numerous data applications involving similarity search, link prediction, and collaborative...
Conference Paper
Counting the frequency of small subgraphs is a fundamental technique in network analysis across various domains, most notably in bioinformatics and social networks. The special case of triangle counting has received much attention. Getting results for 4-vertex patterns is highly challenging, and there are few practical results known that can scale...
Conference Paper
Finding dense substructures in a graph is a fundamental graph mining operation, with applications in bioinformatics, social networks, and visualization to name a few. Yet most standard formulations of this problem (like clique, quasiclique, k-densest subgraph) are NP-hard. Furthermore, the goal is rarely to find the "true optimum", but to identify...
Article
We design a space-efficient algorithm that approximates the transitivity (global clustering coefficient) and total triangle count with only a single pass through a graph given as a stream of edges. Our procedure is based on the classic probabilistic result, the birthday paradox. When the transitivity is constant and there are more edges than wedges...
Article
Full-text available
Counting the frequency of small subgraphs is a fundamental technique in network analysis across various domains, most notably in bioinformatics and social networks. The special case of triangle counting has received much attention. Getting results for 4-vertex patterns is highly challenging, and there are few practical results known that can scale...
Article
Full-text available
Finding dense substructures in a graph is a fundamental graph mining operation, with applications in bioinformatics, social networks, and visualization to name a few. Yet most standard formulations of this problem (like clique, quasi-clique, k-densest subgraph) are NP-hard. Furthermore, the goal is rarely to find the "true optimum", but to identify...
Article
The study of sublinear algorithms is a recent development in theoretical computer science and discrete mathematics that has significant potential to provide scalable analysis algorithms for massive data. The approaches of sublinear algorithms address the fundamental mathematical problem of understanding global features of a data set using limited r...
Article
Graphs are used to model interactions in a variety of contexts, and there is a growing need to quickly assess the structure of such graphs. Some of the most useful graph metrics are based on triangles, such as those measuring social cohesion. Despite the importance of these triadic measures, algorithms to compute them can be extremely expens...
Conference Paper
Full-text available
Stochastic unit commitment models typically handle uncertainties in forecast demand by considering a finite number of realizations from a stochastic process model for loads. Accurate evaluations of expectations or higher moments for the quantities of interest require a prohibitively large number of model evaluations. In this paper we propose an alt...
Article
Full-text available
The K-core of a graph is the largest subgraph within which each node has at least K connections. The key observation of this paper is that the K-core may be much smaller than the original graph while retaining its community structure. Building on this observation, we propose a framework that can accelerate community detection algorithms by first fo...
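The K-core definition quoted in the abstract above can be illustrated with a short peeling sketch. This is only a minimal illustration of the definition, not the framework proposed in the paper; the function name `k_core` and the edge-list input format are assumptions made for this example.

```python
from collections import defaultdict


def k_core(edges, k):
    """Return the node set of the k-core of an undirected graph.

    `edges` is an iterable of (u, v) pairs; self-loops are ignored.
    Hypothetical helper for illustration only.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    alive = set(adj)
    degree = {u: len(adj[u]) for u in adj}

    # Repeatedly peel nodes whose remaining degree is below k.
    stack = [u for u in alive if degree[u] < k]
    while stack:
        u = stack.pop()
        if u not in alive:
            continue
        alive.remove(u)
        for v in adj[u]:
            if v in alive:
                degree[v] -= 1
                # Schedule v the moment it first drops below k.
                if degree[v] == k - 1:
                    stack.append(v)
    return alive
```

For a triangle with one pendant node attached, the 2-core is the triangle alone, which shows how the core can be smaller than the original graph.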
Article
Full-text available
Estimating the number of triangles in a graph given as a stream of edges is a fundamental problem in data mining. The goal is to design a single pass space-efficient streaming algorithm for estimating triangle counts. While there are numerous algorithms for this problem, they all (implicitly or explicitly) assume that the stream does not contain du...
Conference Paper
First impressions from initial renderings of data are crucial for directing further exploration and analysis. In most visualization systems, default colormaps are generated by simply linearly interpolating color in some space based on a value's placement between the minimum and maximum taken on by the dataset. We design a simple sampling-based meth...
Conference Paper
Full-text available
Understanding the dynamics of reciprocation is of great interest in sociology and computational social science. The recent growth of Massively Multi-player Online Games (MMOGs) has provided unprecedented access to large-scale data which enables us to study such complex human behavior in a more systematic manner. In this paper, we consider three dif...
Conference Paper
We propose two algorithms to identify approximations for joint distributions of networks of stochastic processes. The approximations correspond to low-complexity network structures - connected, directed graphs with bounded indegree. The first algorithm identifies an optimal approximation in terms of KL divergence. The second efficiently finds a nea...
Article
Full-text available
We consider the problem of designing (or augmenting) an electric power system at a minimum cost such that it satisfies the N-k-e survivability criterion. This survivability criterion is a generalization of the well-known N-k criterion, and it requires that at least a (1-e_j) fraction of the total demand be met after failures of up to j components...
Article
Understanding the dynamics of reciprocation is of great interest in sociology and computational social science. The recent growth of Massively Multi-player Online Games (MMOGs) has provided unprecedented access to large-scale data which enables us to study such complex human behavior in a more systematic manner. In this paper, we consider three dif...
Article
Full-text available
Network data is ubiquitous and growing, yet we lack realistic generative models that can be calibrated to match real-world data. The recently proposed Block Two-Level Erdos-Renyi (BTER) model can be tuned to capture two fundamental properties: degree distribution and clustering coefficients. The latter is particularly important for reproducing grap...
Article
Full-text available
The study of triangles in graphs is a standard tool in network analysis, leading to measures such as the transitivity, i.e., the fraction of paths of length two that participate in triangles. Real-world networks are often directed, and it can be difficult to meaningfully understand this network structure. We propose a collection of directed closure...
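The undirected transitivity mentioned in the abstract above, the fraction of length-two paths (wedges) that are closed into triangles, can be computed exactly on a small graph as in the sketch below. It does not implement the directed closure measures the paper proposes; the function name `transitivity` and the edge-list input are assumptions made for this example.

```python
from itertools import combinations


def transitivity(edges):
    """Fraction of wedges (paths of length two) that are closed.

    `edges` is an iterable of undirected (u, v) pairs; self-loops are ignored.
    Hypothetical helper for illustration only.
    """
    adj = {}
    for u, v in edges:
        if u != v:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)

    wedges = 0
    closed = 0
    # Each pair of neighbors of a center node forms one wedge.
    for center, nbrs in adj.items():
        for a, b in combinations(nbrs, 2):
            wedges += 1
            if b in adj[a]:
                closed += 1
    return closed / wedges if wedges else 0.0
```

Each triangle contributes three closed wedges, one per center node, so the ratio equals three times the triangle count divided by the wedge count.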
Article
The computation and study of triangles in graphs is a standard tool in the analysis of real-world networks. Yet most of this work focuses on undirected graphs. Real-world networks are often directed and have a significant fraction of reciprocal edges. While there is much focus on directed triadic patterns in the social sciences community, most data...
Article
Full-text available
Graphs and networks are used to model interactions in a variety of contexts. There is a growing need to quickly assess the characteristics of a graph in order to understand its underlying structure. Some of the most useful metrics are triangle-based and give a measure of the connectedness of mutual friends. This is often summarized in terms of clus...
Article
The problem of understanding reciprocation as a social behavior has always been of interest in sociology and computational social science. The recent growth of Massively Multi-player Online Games (MMOGs) has provided a platform to study reciprocal behavior in a more systematic and large-scale manner. In this paper, we perform an extensive study abo...
Article
Full-text available
We design a space efficient algorithm that approximates the transitivity (global clustering coefficient) and total triangle count with only a single pass through a graph given as a stream of edges. Our procedure is based on the classic probabilistic result, the birthday paradox. When the transitivity is constant and there are more edges than wedges...
Article
Full-text available
Markov chains are convenient means of generating realizations of networks with a given (joint or otherwise) degree distribution, since they simply require a procedure for rewiring edges. The major challenge is to find the right number of steps to run such a chain, so that we generate truly independent samples. Theoretical bounds for mixing times of...
Conference Paper
Full-text available
Triangles are an important building block and distinguishing feature of real-world networks, but their structure is still poorly understood. Despite numerous reports on the abundance of triangles, there is very little information on what these triangles look like. We initiate the study of degree-labeled triangles, specifically, degree homogeneity...
Article
Full-text available
Degree distributions are arguably the most important property of real world networks. The classic edge configuration model or Chung-Lu model can generate an undirected graph with any desired degree distribution. This serves as a good null model to compare algorithms or perform experimental studies. Furthermore, there are scalable algorithms that im...
Article
Full-text available
Graphs are used to model interactions in a variety of contexts, and there is a growing need to quickly assess the structure of a graph. Some of the most useful graph metrics, especially those measuring social cohesion, are based on triangles. Despite the importance of these triadic measures, associated algorithms can be extremely expensive. We prop...
Conference Paper
Full-text available
Markov chains are a convenient means of generating realizations of networks, since they require little more than a procedure for rewiring edges. If a rewiring procedure exists for generating new graphs with specified statistical properties, then a Markov chain sampler can generate an ensemble of graphs with prescribed characteristics. However, succ...
Conference Paper
Full-text available
We consider the problem of designing (or augmenting) an electric power system such that it satisfies the N-k-e survivability criterion while minimizing total cost. The survivability criterion requires that at least (1-e) fraction of the total demand can still be met even if any k (or fewer) of the system components fail. We formulate this problem,...
Article
Full-text available
The modeling flexibility provided by hypergraphs has drawn a lot of interest from the combinatorial scientific community, leading to novel models and algorithms, their applications, and development of associated tools. Hypergraphs are now a standard tool in combinatorial scientific computing. The modeling flexibility of hypergraphs, however, comes...
Chapter
Efficient design of hardware and software for large-scale parallel execution requires detailed understanding of the interactions between the application, computer, and network. The authors have developed a macro-scale simulator (SST/macro) that permits the coarse-grained study of distributed-memory applications. In the presented work, applications...
Article
Full-text available
Community structure plays a significant role in the analysis of social networks and similar graphs, yet this structure is little understood and not well captured by most models. We formally define a community to be a subgraph that is internally highly connected and has no deeper substructure. We use tools of combinatorics to show that any such comm...
Conference Paper
Full-text available
Graph analysis is playing an increasingly important role in science and industry. Due to numerous limitations in sharing real-world graphs, models for generating massive graphs are critical for developing better algorithms. In this paper, we analyze the stochastic Kronecker graph model (SKG), which is the foundation of the Graph500 supercomputer be...
Article
Full-text available
The analysis of massive graphs is now becoming a very important part of science and industrial research. This has led to the construction of a large variety of graph models, each with their own advantages. The Stochastic Kronecker Graph (SKG) model has been chosen by the Graph500 steering committee to create supercomputer benchmarks for graph algor...
Article
Full-text available
We study clustering on graphs with multiple edge types. Our main motivation is that similarities between objects can be measured in many different metrics. For instance, similarity between two papers can be based on common authors, where they are published, keyword similarity, citations, etc. As such, graphs with multiple edges are a more accurate mo...
Conference Paper
Full-text available
We consider the problem of designing a network of minimum cost while satisfying a prescribed survivability criterion. The survivability criterion requires that a feasible flow must still exist (i.e., all demands can be satisfied without violating arc capacities) even after the disruption of a subset of the network's arcs. Specifically, we consider...
Conference Paper
Full-text available
We study clustering on graphs with multiple edge types. Our main motivation is that similarities between objects can be measured in many different metrics, and so allowing graphs with multivariate edges significantly increases modeling power. In this context the clustering problem becomes more challenging. Each edge/metric provides only partial inf...
Article
Full-text available
One of the most influential recent results in network analysis is that many natural networks exhibit a power-law or log-normal degree distribution. This has inspired numerous generative models that match this property. However, more recent work has shown that while these generative models do have the right degree distribution, they are not good mod...
Conference Paper
Full-text available
We investigate the community detection problem on graphs in the presence of multiple edge types. Our main motivation is that similarity between objects can be defined by many different metrics and aggregation of these metrics into a single one poses several important challenges, such as recovering this aggregation function from ground-truth, inves...
Article
Full-text available
The modeling flexibility provided by hypergraphs has drawn a lot of interest from the combinatorial scientific community, leading to novel models and algorithms, their applications, and development of associated tools. Hypergraphs are now a standard tool in combinatorial scientific computing. The modeling flexibility of hypergraphs, however, comes a...
Article
Full-text available
Graph analysis is playing an increasingly important role in science and industry. Due to numerous limitations in sharing real-world graphs, models for generating massive graphs are critical for developing better algorithms. In this paper, we analyze the stochastic Kronecker graph model (SKG), which is the foundation of the Graph500 supercomputer be...
Conference Paper
Full-text available
One of the most influential results in network analysis is that many natural networks exhibit a power-law or log-normal degree distribution. This has inspired numerous generative models that match this property. However, more recent work has shown that while these generative models do have the right degree distribution, they are not good models for...
Article
Graph analysis is playing an increasingly important role in science and industry. Due to numerous limitations in sharing real-world graphs, models for generating massive graphs are critical for developing better algorithms. In this paper, we analyze the stochastic Kronecker graph model (SKG), which is the foundation of the Graph500 supercomputer be...
Article
Full-text available
We study the 2-dimensional vector packing problem, which is a generalization of the classical bin packing problem where each item has 2 distinct weights and each bin has 2 corresponding capacities. The goal is to group items into the minimum number of bins, without violating the bin capacity constraints. We propose a $\Theta(n)$-time approximation algor...
Article
Full-text available
Efficient design of hardware and software for large-scale parallel execution requires detailed understanding of the interactions between the application, computer, and network. The authors have developed a macro-scale simulator, SST/macro, that permits the coarse-grained study of distributed-memory applications. In the presented work, applications us...
