
Cynthia A. Phillips · Sandia National Laboratories · discrete math and complex systems
150 publications · 12,852 reads · 6,095 citations
Publications (150)
This paper considers an extension of the shortest path network interdiction problem that incorporates robustness to account for parameter uncertainty. The shortest path interdiction problem is a game of two players with conflicting agendas and capabilities: an evader, who traverses the arcs of a network from a source node to a sink node using a pat...
We present a new algorithm, Fractional Decomposition Tree (FDT), for finding a feasible solution for an integer program (IP) where all variables are binary. FDT runs in polynomial time and is guaranteed to find a feasible integer solution provided the integrality gap of an instance’s polyhedron, independent of objective function, is bounded. The al...
Write-optimized data structures (WODS) offer the potential to keep up with cyberstream event rates and give sub-second query response for key items like IP addresses. These data structures organize logs as the events are observed. To work in a real-world environment and not fill up the disk, WODS must efficiently expire older events. As the basis...
Given an input stream S of size N, a ɸ-heavy hitter is an item that occurs at least ɸN times in S. The problem of finding heavy hitters is extensively studied in the database literature.
We study a real-time heavy-hitters variant in which an element must be reported shortly after we see its T = ɸN-th occurrence (and hence it becomes a heavy hitt...
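The reporting rule described above can be illustrated with a minimal sketch. It assumes the stream length N is known in advance and that exact counts fit in memory, which is a simplification and not the method of the paper; the function name `report_heavy_hitters` and the sample stream are hypothetical.

```python
from collections import Counter
from math import ceil


def report_heavy_hitters(stream, phi):
    """Yield (position, item) the moment an item reaches its T-th occurrence.

    Assumption: the whole stream is available as a list, so N = len(stream)
    and T = ceil(phi * N) can be computed up front.
    """
    n = len(stream)
    threshold = max(1, ceil(phi * n))  # T = phi * N, rounded up
    counts = Counter()
    for position, item in enumerate(stream, start=1):
        counts[item] += 1
        if counts[item] == threshold:  # report exactly once, at the T-th occurrence
            yield position, item


# Hypothetical example: N = 8 and phi = 0.375 give T = 3.
stream = ["a", "b", "a", "c", "a", "b", "a", "b"]
reports = list(report_heavy_hitters(stream, phi=0.375))
```

Here each item is reported at the position of its third occurrence, with no delay; items that never reach T occurrences are never reported.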
Motivated by the properties of unending real-world cybersecurity streams, we present a new graph streaming model: XStream. We maintain a streaming graph and its connected components at single-edge granularity. In cybersecurity graph applications, input streams typically consist of edge insertions; individual deletions are not explicit. Analysts mai...
A key strategy for protecting municipal water supplies is the use of sensors to detect the presence of contaminants in associated water distribution systems. Deploying a contamination warning system involves placing a limited number of sensors so as to maximize the level of protection afforded. Researchers have proposed several m...
Boolean circuits of McCulloch-Pitts threshold gates are a classic model of neural computation studied heavily in the late 20th century as a model of general computation. Recent advances in large-scale neural computing hardware have made their practical implementation a near-term possibility. We describe a theoretical approach for multiplying two $N$...
We present a new algorithm, Fractional Decomposition Tree (FDT), for finding a feasible solution for an integer program (IP) where all variables are binary. FDT runs in polynomial time and is guaranteed to find a feasible integer solution provided the integrality gap is bounded. The algorithm gives a construction for Carr and Vempala's theorem that...
We study a trajectory analysis problem we call the Trajectory Capture Problem (TCP), in which, for a given input set ${\cal T}$ of trajectories in the plane, and an integer $k\geq 2$, we seek to compute a set of $k$ points (``portals'') to maximize the total weight of all subtrajectories of ${\cal T}$ between pairs of portals. This problem naturall...
Community detection in graphs is a canonical social network analysis method. We consider the problem of generating suites of tera-scale synthetic social networks to compare the solution quality of parallel community-detection methods. The standard method, based on the graph generator of Lancichinetti, Fortunato, and Radicchi (LFR), has been used ex...
With the advent of large-scale neuromorphic platforms, we seek to better understand the applications of neuromorphic computing to more general-purpose computing domains. Graph analysis problems have grown increasingly relevant in the wake of readily available massive data. We demonstrate that a broad class of combinatorial and graph problems known...
A key problem in social network analysis is to identify nonhuman interactions. State‐of‐the‐art bot‐detection systems like Botometer train machine‐learning models on user‐specific data. Unfortunately, these methods do not work on data sets in which only topological information is available. In this paper, we propose a new, purely topological approa...
We use Bayesian data analysis to predict dengue fever outbreaks and quantify the link between outbreaks and meteorological precursors tied to the breeding conditions of vector mosquitos. We use Hamiltonian Monte Carlo sampling to estimate a seasonal Gaussian process modeling infection rate, and aperiodic basis coefficients for the rate of an “outbr...
Given a stream $S = (s_1, s_2, ..., s_N)$, a $\phi$-heavy hitter is an item $s_i$ that occurs at least $\phi N$ times in $S$. The problem of finding heavy-hitters has been extensively studied in the database literature. In this paper, we study a related problem. We say that there is a $\phi$-event at time $t$ if $s_t$ occurs exactly $\phi N$ times...
Boolean circuits of McCulloch-Pitts threshold gates are a classic model of neural computation studied heavily in the late 20th century as a model of general computation. Recent advances in large-scale neural computing hardware have made their practical implementation a near-term possibility. We describe a theoretical approach for multiplying two N b...
We study several natural instances of the geometric hitting set problem for input consisting of sets of line segments (and rays, lines) having a small number of distinct slopes. These problems model path monitoring (e.g., on road networks) using the fewest sensors (the "hitting points"). We give approximation algorithms for cases including (i) line...
We consider underwater multi-modal wireless sensor networks (UWSNs) suitable for applications in submarine surveillance and monitoring, where nodes offload data to a mobile autonomous underwater vehicle (AUV) via optical technology, and coordinate using acoustic communication. Sensed data are associated with a value, decaying in time. In this scena...
The skip list is an elegant dictionary data structure that is commonly deployed in RAM. A skip list with N elements supports searches, inserts, and deletes in O(log N) operations with high probability (w.h.p.) and range queries returning K elements in O(log N + K) operations w.h.p.
A seemingly natural way to generalize the skip list to external mem...
A challenge in computer architecture is that processors often cannot be fed data from DRAM as fast as CPUs can consume it. Therefore, many applications are memory-bandwidth bound. With this motivation and the realization that traditional architectures (with all DRAM reachable only via bus) are insufficient to feed groups of modern processing units,...
We present history-independent alternatives to a B-tree, the primary indexing data structure used in databases. A data structure is history independent (HI) if it is impossible to deduce any information by examining the bit representation of the data structure that is not already available through the API. We show how to build a history-independent...
In recent work we quantified the anticipated performance boost when a sorting algorithm is modified to leverage user-addressable "near-memory," which we call scratchpad. This architectural feature is expected in the Intel Knights Landing processors that will be used in DOE's next large-scale supercomputer.
This paper expands our analytical study o...
Geospatial semantic graphs provide a robust foundation for representing and analyzing remote sensor data. In particular, they support a variety of pattern search operations that capture the spatial and temporal relationships among the objects and events in the data. However, in the presence of large data corpora, even a carefully constructed search...
We study several natural instances of the geometric hitting set problem for input consisting of sets of line segments (and rays, lines) having a small number of distinct slopes. These problems model path monitoring (e.g., on road networks) using the fewest sensors (the “hitting points”). We give approximation algorithms for cases including (i) line...
Sensor mission assignment involves matching the sensing resources of a wireless sensor network (WSN) to appropriate tasks (missions), which may come to the network dynamically. Although solutions for WSNs with battery-operated nodes have been proposed for this problem, no attention has been given to networks whose nodes have energy-harvesting capab...
This paper considers underwater wireless sensor networks (UWSNs) for submarine surveillance and monitoring. Nodes produce data with an associated value, decaying in time. An autonomous underwater vehicle (AUV) is sent to retrieve information from the nodes, through optical communication, and periodically emerges to deliver the collected data to a s...
Triangle enumeration is a fundamental graph operation. Despite the lack of provably efficient (linear, or slightly super-linear) worst-case algorithms for this problem, practitioners run simple, efficient heuristics to find all triangles in graphs with millions of vertices. How are these heuristics exploiting the structure of these special graphs t...
Contamination warning systems (CWSs) for drinking water distribution systems (WDSs) are used to reduce the potential adverse effects of intentional or accidental WDS contamination. They are designed on the basis of possible contamination events but often address only a narrow range in event conditions. The influence on their performance of conditio...
We present an algorithm to maintain the connected components of a graph that arrives as an infinite stream of edges. We formalize the algorithm on X-stream, a new parallel theoretical computational model for infinite streams. Connectivity-related queries, including component spanning trees, are supported with some latency, returning the state of th...
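For an insert-only edge stream, component maintenance can be sketched with a standard union-find (disjoint-set) structure. This is a sequential textbook technique offered for illustration, not the parallel X-stream algorithm of the paper; it ignores edge expiry and query latency, and the class name `StreamingComponents` is hypothetical.

```python
class StreamingComponents:
    """Union-find over vertices seen so far in an insert-only edge stream."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def find(self, v):
        # Register unseen vertices as singleton components.
        if v not in self.parent:
            self.parent[v] = v
            self.size[v] = 1
        root = v
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every vertex on the path at the root.
        while self.parent[v] != root:
            self.parent[v], v = root, self.parent[v]
        return root

    def insert_edge(self, u, v):
        ru, rv = self.find(u), self.find(v)
        if ru == rv:
            return  # edge lies inside an existing component
        if self.size[ru] < self.size[rv]:
            ru, rv = rv, ru
        self.parent[rv] = ru  # union by size: attach smaller tree to larger
        self.size[ru] += self.size[rv]

    def connected(self, u, v):
        return self.find(u) == self.find(v)


# Hypothetical stream of four edge insertions.
components = StreamingComponents()
for edge in [(1, 2), (3, 4), (2, 3), (5, 6)]:
    components.insert_edge(*edge)
```

After these insertions, vertices 1 through 4 form one component and vertices 5 and 6 form another.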
Integrated Stockpile Evaluation (ISE) is a program to test nuclear weapons periodically. Tests are performed by machines that may require occasional calibration. These calibrations are expensive, so finding a schedule that minimizes calibrations allows more testing to be done for a given amount of money.
This paper introduces a theoretical framewor...
We design a network that supports a feasible multicommodity flow even after the failures of any k edges. We present a mixed-integer linear program (MILP), a cutting plane algorithm, and a column-and-cut algorithm. The algorithms add constraints to repair vulnerabilities in partial network designs. Empirical studies on previously unsolved instances...
Water distribution network models for large municipalities have tens of thousands of interconnecting pipes and junctions with complex hydraulic controls. Many water security applications, including sensor placement optimization, require detailed simulation of potential contamination incidents. The postsimulation optimization problem can easily exce...
Decision makers increasingly rely on large-scale computational models to simulate and analyze complex man-made systems. For example, computational models of national infrastructures are being used to inform government policy, assess economic and national security risks, evaluate infrastructure interdependencies, and plan for the growth and evolutio...
We address the problem of minimizing power consumption when broadcasting a message from one node to all the other nodes in a radio network. To enable power savings for such a problem, we introduce a compelling new data streaming problem which we call the Bad Santa problem. Our results on this problem apply for any situation where: (1) a node can li...
The volume of streaming data for cyber analysis is increasing at a rate much greater than any organization's ability to hire human analysts. As a preliminary step to automating significant portions of analysis workload, we consider the problem of modeling cyber data. Since the latter tends to be relational in nature, graphs are a natural abstractio...
A key aspect of Contamination Warning System design is the strategic placement of sensors throughout the distribution network. There has been a large volume of research on this topic in the last several years, including a “Battle of the Water Sensor Networks” (Ostfeld et al., 2008, Journal of Water Resources Planning and Management, 134(6), 556–568...
In this paper we present the impact of classical electronics constraints on a solid-state quantum dot logical qubit architecture. Constraints due to routing density, bandwidth allocation, signal timing, and thermally aware placement of classical supporting electronics significantly affect the quantum error correction circuit's error rate. We analyz...
We consider the design of a sensor network to serve as an early warning system against a potential suite of contamination incidents. Given any measure for evaluating the quality of a sensor placement, there are two ways to model the objective. One is to minimize the impact or damage to the network; the other is to maximize the reduction in impact c...
A commonly used indicator of water quality is the amount of residual chlorine in a water distribution system. Chlorine booster stations are often utilized to maintain acceptable levels of residual chlorine throughout the network. In addition, hyper-chlorination has been used to disinfect portions of the distribution system following a pipe break. C...
Communities of vertices within a giant network such as the World Wide Web are likely to be vastly smaller than the network itself. However, Fortunato and Barthélemy have proved that modularity maximization algorithms for community detection may fail to resolve communities with fewer than √(L/2) edges, where L is the number of edges in the entire netw...
We define scalable models and distributed heuristics for the concurrent and coordinated movement of multiple sinks in a wireless sensor network, a case that presents significant challenges compared to the widely investigated case of a single mobile sink. Our objective is that of maximizing the network lifetime defined as the time from the start of...
Welcome to the 24th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2010). The technical program for the “main meeting” has 123 paper presentations in 32 sessions, one session with 4 best papers, 3 keynote addresses, 1 panel discussion, 2 evening tutorials, and two poster sessions for the PhD Forum. There is also our traditi...
Enumerating triangles (3-cycles) in graphs is a kernel operation for social network analysis. For example, many community detection methods depend upon finding common neighbors of two related entities. We consider Cohen's simple and elegant solution for listing triangles: give each node a 'bucket.' Place each edge into the bucket of its endpoint of...
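The link between triangles and common neighbors can be shown with a small sketch. It uses plain neighbor-set intersection, a generic textbook approach and not Cohen's bucket-based scheme discussed in the abstract; the function name `list_triangles` and the sample graph are hypothetical, and vertices are assumed to be mutually comparable.

```python
from collections import defaultdict


def list_triangles(edges):
    """Return each triangle once as a sorted tuple (u, v, w) with u < v < w."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    triangles = []
    for u in adj:
        for v in adj[u]:
            if u < v:
                # Every common neighbor w of u and v closes a triangle.
                for w in adj[u] & adj[v]:
                    if v < w:  # ordering u < v < w avoids duplicates
                        triangles.append((u, v, w))
    return sorted(triangles)


# Hypothetical graph with two triangles sharing the edge (2, 3).
edges = [(1, 2), (2, 3), (1, 3), (3, 4), (2, 4), (4, 5)]
triangles = list_triangles(edges)
```

The cost of this approach is dominated by the intersections at high-degree vertices, which is the kind of imbalance that degree-aware edge assignment aims to reduce.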
We consider the problem of placing sensors in a municipal water network when we can choose both the location of sensors and the sensitivity and specificity of the contamination warning system. Sensor stations in a municipal water distribution network continuously send sensor output information to a centralized computing facility, and event detectio...
We present and analyze an architecture for a logical qubit memory that is tolerant of faults in the processing of silicon double quantum dot (DQD) qubits. A highlight of our analysis is an in-depth consideration of the constraints faced when integrating DQDs with classical control electronics. Categories and Subject Descriptors: B.3.4 (Hardware):...
We consider the problem of designing a contaminant warning system for a municipal water distribution network that uses imperfect sensors, which can generate false-positive and false-negative detections. Although sensor placement optimization methods have been developed for contaminant warning systems, most sensor placement formulations assume perfe...
We describe a fault-tolerant memory for an error-corrected logical qubit based on silicon double quantum dot physical qubits. Our design accounts for constraints imposed by supporting classical electronics. A significant consequence of the constraints is to add error-prone idle steps for the physical qubits. Even using a schedule with provably mini...
The US Environmental Protection Agency (EPA) is the lead federal agency for the security of drinking water in the United States. The agency is responsible for providing information and technical assistance to the more than 50,000 water utilities across the country. The distributed physical layout of drinking-water utilities makes them inherently vu...
In a multiple processor computing apparatus, directional routing restrictions and a logical channel construct permit fault tolerant, deadlock-free routing. Processor allocation can be performed by creating a linear ordering of the processors based on routing rules used for routing communications between the processors. The linear ordering can assum...
We propose scalable models and centralized heuristics for the concurrent and coordinated movement of multiple sinks in a wireless sensor network (WSN). The proposed centralized heuristic runs in polynomial time given the solution to the linear program and achieves results that are within 2% of the LP-relaxation-based upper bound. It provides a usef...
Following the events of September 11, 2001, in the United States, world public awareness for possible terrorist attacks on water supply systems has increased dramatically. Among the different threats for a water distribution system, the most difficult to address is a deliberate chemical or biological contaminant injection, due to both the uncertain...
We address the problem of minimizing power consumption when broadcasting a message from one node to all the other nodes in a radio network. To enable power savings for such a problem, we introduce a compelling new data streaming problem that we call the Bad Santa problem. Our results on this problem apply for any situation where: 1) a node can list...
We present the TEVA-SPOT Toolkit, a sensor placement optimization tool developed within the USEPA TEVA program. The TEVA-SPOT Toolkit provides a sensor placement framework that facilitates research in sensor placement optimization and enables the practical application of sensor placement solvers to real-world CWS design applications. This paper pro...
Placing sensors in municipal water networks to protect against a set of contamination events is a classic p-median problem for most objectives when we assume that sensors are perfect. Many researchers have proposed exact and approximate solution methods for this p-median formulation. For full-scale networks with large contamination event suites,...
The general sensor placement problem (SPP) for contaminant warning system (CWS) design involves placement of a limited number of sensors such that the expected impact of an attack is minimized. We cast the SPP in terms of the well-known p-median problem from discrete location theory. The p-median formulation assumes a fixed number of attack scenari...
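The p-median view of sensor placement can be sketched by brute force on a tiny instance. The sketch assumes equally likely scenarios, a precomputed impact table, and a fixed penalty for undetected scenarios; these inputs, and the function name `best_placement`, are hypothetical and only illustrate the objective, not the solution methods of the paper.

```python
from itertools import combinations


def best_placement(impact, undetected, p):
    """Exhaustively pick p sensor locations minimizing mean scenario impact.

    impact[s][j]  : impact of scenario s if a sensor at location j detects it first
    undetected[s] : impact of scenario s if no chosen sensor detects it
    Assumption: all scenarios are equally likely.
    """
    n_locations = len(impact[0])
    best = None
    for chosen in combinations(range(n_locations), p):
        total = sum(
            min([undetected[s]] + [row[j] for j in chosen])
            for s, row in enumerate(impact)
        )
        expected = total / len(impact)
        if best is None or expected < best[0]:
            best = (expected, chosen)
    return best


# Hypothetical instance: 3 scenarios, 3 candidate locations, 2 sensors.
impact = [[10, 40, 90], [80, 20, 60], [70, 50, 5]]
undetected = [100, 100, 100]
expected, chosen = best_placement(impact, undetected, p=2)
```

Exhaustive search is only workable for toy sizes; the number of candidate placements grows combinatorially with the number of locations.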
We give processor-allocation algorithms for grid architectures, where the objective is to select processors from a set of available processors to minimize the average number of communication hops.
The associated clustering problem is as follows: Given n points in ℜ^d, find a size-k subset with minimum average pairwise L1 distance. We present a na...
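The clustering objective stated above can be made concrete with an exhaustive-search sketch. It enumerates every size-k subset, so it only illustrates the objective on small inputs and is not one of the algorithms from the paper; the function names and sample points are hypothetical, and k is assumed to be at least 2.

```python
from itertools import combinations


def l1(p, q):
    """L1 (Manhattan) distance between two points of equal dimension."""
    return sum(abs(a - b) for a, b in zip(p, q))


def min_avg_l1_subset(points, k):
    """Return the size-k subset with minimum average pairwise L1 distance."""
    best_subset, best_avg = None, float("inf")
    for subset in combinations(points, k):
        pairs = list(combinations(subset, 2))
        avg = sum(l1(p, q) for p, q in pairs) / len(pairs)
        if avg < best_avg:
            best_subset, best_avg = subset, avg
    return best_subset, best_avg


# Hypothetical grid of free processors given as 2-D coordinates.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (6, 5)]
subset, avg = min_avg_l1_subset(points, k=3)
```

On a grid, L1 distance between two processors corresponds to the number of communication hops between them, which is why this objective models processor allocation.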
The practical utility of optimization technologies is often impacted by factors that reflect how these tools are used in practice, including whether various real-world constraints can be adequately modeled, the sophistication of the analysts applying the optimizer, and related environmental factors (e.g. whether a company is willing to trust predic...
We address the problem of minimizing power consumption when performing reliable broadcast on a radio network under the following popular model. Each node in the network is located on a point in a two-dimensional grid, and whenever a node sends a message, all awake nodes within distance r receive the message. In the broadcast problem, some node want...
In this paper we apply theoretical and practical results from facility location theory to the problem of community detection in networks. The result is an algorithm that computes bounds on a minimization variant of local modularity. We also define the concept of an edge support and a new measure of the goodness of community structures with respect...
In this paper, we introduce EXACT, the EXperimental Algorithmics Computational Toolkit. EXACT is a software framework for describing, controlling, and analyzing computer experiments. It provides the experimentalist with convenient software tools to ease and organize the entire experimental process, including the description of factors and levels, t...
This paper addresses the problem of scheduling a DAG of unit-length tasks on asynchronous processors, that is, processors having different and changing speeds. The objective is to minimize the makespan, that is, the time to execute the entire DAG. Asynchrony is modeled by an oblivious adversary, which is assumed to determine the process...
Sensor placement problems for municipal water distribution networks usually involve detecting a series of scenarios. The number of scenarios needed to accurately model a full set of possible events based on season, special events, and type of contamination can grow much faster than the size of the network. We introduce two new methods for reduci...
from accidental or malicious contamination and sensor placement for intruder detection in transportation networks or buildings. We have addressed these problems using parallel integer programming. In this talk we present a parallelizable heuristic method for finding approximate solutions to general sensor placement problems. An integer program (IP)...
We present a mixed-integer programming (MIP) formulation for sensor placement optimization in municipal water distribution systems that includes the temporal characteristics of contamination events and their impacts. Typical network water quality simulations track contaminant concentration and movement over time, computing contaminant concentration...
Large and/or computationally expensive optimization problems sometimes require parallel or high-performance computing systems to achieve reasonable running times. This chapter gives an introduction to parallel computing for those familiar with serial optimization. We present techniques to assist the porting of serial optimization codes to parallel...
We present a series of related robust optimization models for placing sensors in municipal water networks to detect contaminants that are maliciously or accidentally injected. We formulate sensor placement problems as mixed-integer programs, for which the objective coefficients are not known with certainty. We consider a restricted absolute robustne...
In the network inhibition problem, we wish to expend a limited budget attacking a given edge-capacitated graph by “paying” to remove edge capacity from some subset of the edges. We wish to minimize the resulting maximum flow between two designated vertices s and t. The problem is strongly NP-hard. Previous approximation algorithms applied only to p...
Inferring phylogenetic trees is a fundamental problem in computational biology. We present a new objective criterion, the phylogenetic number, for evaluating evolutionary trees for species defined by biomolecular sequences or other qualitative characters. The phylogenetic number of a tree T is the maximum number of times that any given character st...
Cities without an early warning system of indwelling sensors can consider monitoring their networks manually, especially during times of heightened security levels. We consider the problem of calculating an optimal schedule for manual sampling in a municipal water network. Preliminary computations with a small-scale example indicate that during nor...
We consider the problem of optimally placing water quality sensors in municipal water networks under the assumption that sensors may fail. We give a non-linear formulation of the problem, then a linearization of this formulation in the form of a mixed-integer program (MIP). We explore the scalability limits of this formulation, then use it as a bo...
This paper considers the problem of optimally placing sensors in water networks to minimize the expected damage due to intentional or accidental contamination. For a simple initial model, we give a dynamic programming algorithm that finds an optimal sensor placement if the water network is a tree. However, generalizing to more realistic models or...
Mixed-integer programming (MIP), optimization of a linear function subject to linear and integrality constraints, is a standard technology for computing an efficient allocation of limited resources. In this chapter, we survey MIP applications at Sandia National Laboratories. We describe scalability features of the massively parallel MIP solver in P...
We give processor-allocation algorithms for grid architectures, where the objective is to select processors from a set of available processors to minimize the average number of communication hops.
The associated clustering problem is as follows: Given n points in $\mathcal{R}^d$, find a size-k subset with minimum average pairwise $L_1$ distance. We...
In recent years, several integer programming models have been proposed to place sensors in municipal water networks in order to detect intentional or accidental contamination. Although these initial models assumed that it is equally costly to place a sensor at any place in the network, there clearly are practical cost constraints that would impact...
We consider the accuracy of predictions made by integer programming (IP) models of sensor placement for water security applications. We have recently shown that IP models can be used to find optimal sensor placements for a variety of different performance criteria (e.g. minimize health impacts and minimize time to detection). However, these models...
Integer programming (IP) is a general optimization technology capable of expressing most resource allocation decisions. More specifically, IP is the optimization of a linear objective function subject to linear constraints and additional nonlinear integrality constraints. For sensor placement problems, discrete decision variables usually represent d...
We present a model for optimizing the placement of sensors in municipal water networks to detect maliciously injected contaminants. An optimal sensor configuration minimizes the expected fraction of the population at risk. We formulate this problem as an integer program, which can be solved with generally available IP solvers. We find optimal senso...