Article

Bayonet: probabilistic inference for networks


Abstract

Network operators often need to ensure that important probabilistic properties are met, such as that the probability of network congestion is below a certain threshold. Ensuring such properties is challenging and requires both a suitable language for probabilistic networks and an automated procedure for answering probabilistic inference queries. We present Bayonet, a novel approach that consists of: (i) a probabilistic network programming language and (ii) a system that performs probabilistic inference on Bayonet programs. The key insight behind Bayonet is to phrase the problem of probabilistic network reasoning as inference in existing probabilistic languages. As a result, Bayonet directly leverages existing probabilistic inference systems and offers a flexible and expressive interface to operators. We present a detailed evaluation of Bayonet on common network scenarios, such as network congestion, reliability of packet delivery, and others. Our results indicate that Bayonet can express such practical scenarios and answer queries for realistic topology sizes (with up to 30 nodes).
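To make the kind of query Bayonet answers concrete, the following Python sketch estimates, by Monte Carlo simulation, the probability that a switch buffer overflows and the expected number of delivered packets when two hosts share a lossy link. All names and parameters (P_SEND, P_DROP, QUEUE_CAP) are illustrative assumptions; Bayonet itself expresses such scenarios in its own probabilistic network language and answers the queries with existing inference systems rather than with this ad-hoc simulation.

    import random

    # Toy scenario (a hedged stand-in, not Bayonet's semantics): two hosts each
    # send a packet per time step with probability P_SEND; the switch serves one
    # packet per step over a link that drops packets with probability P_DROP and
    # buffers at most QUEUE_CAP packets. We estimate the probability that the
    # buffer overflows during a run and the expected number of delivered packets.
    P_SEND, P_DROP = 0.7, 0.1
    QUEUE_CAP, N_STEPS, N_TRIALS = 3, 20, 100_000

    def run_trial(rng):
        queue, overflowed, delivered = 0, False, 0
        for _ in range(N_STEPS):
            arrivals = sum(rng.random() < P_SEND for _ in range(2))
            queue += arrivals
            if queue > QUEUE_CAP:           # excess packets are dropped
                overflowed = True
                queue = QUEUE_CAP
            if queue > 0:                   # serve one packet per step
                queue -= 1
                if rng.random() > P_DROP:   # forwarded packet may still be lost
                    delivered += 1
        return overflowed, delivered

    rng = random.Random(0)
    trials = [run_trial(rng) for _ in range(N_TRIALS)]
    print("P(overflow)   ~", sum(o for o, _ in trials) / N_TRIALS)
    print("E[delivered]  ~", sum(d for _, d in trials) / N_TRIALS)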


... The goal is to query for various properties about the network's behavior: for instance, the probability of a packet reaching the end of the network, or of a packet queue overflowing. This example task is inspired by prior work on using probabilistic programming languages to perform network verification [18,53]. The situation in Fig. 1 is a small illustrative example of packet arrival, but programs like it are extremely challenging for today's PPLs because they mix different kinds of program structure. ...
... We consider four benchmarks with complex conditional independence structures to illustrate the design space of MultiPPL programs. We draw on models in the domains of network analysis [18,25] and Bayesian networks [5,7]. (2) How does MultiPPL compare with contemporary PPLs in using exact and approximate inference with respect to wall-clock time and distance from the exact distribution? ...
... When a server receives an infected packet, it becomes compromised and can only propagate infected packets for the remainder of the evaluation. Taken from Gehr et al. [18], we sample the number of time steps from a uniform distribution, step the model that many times, then query the model for the expected number of compromised servers. This evaluation poses an expressivity challenge to the Disc sub-language, which cannot define the dynamic-length FIFO queue without interoperation with Cont. ...
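The compromised-servers benchmark described in this excerpt can be approximated with a small Monte Carlo sketch in Python. The topology (a ring of eight servers), the infection probability, and the range of time steps below are illustrative assumptions, not the parameters used by Gehr et al. [18].

    import random

    # Hedged simulation sketch: servers sit on a ring; each step, a random
    # server emits a packet to its neighbor. A fresh packet is infected with
    # probability P_INFECT; a compromised server infects every packet it
    # forwards, and a server that receives an infected packet is compromised.
    N_SERVERS, P_INFECT = 8, 0.05

    def expected_compromised(n_trials=50_000, seed=0):
        rng = random.Random(seed)
        total = 0
        for _ in range(n_trials):
            steps = rng.randint(1, 20)              # uniform number of steps
            compromised = [False] * N_SERVERS
            for _ in range(steps):
                src = rng.randrange(N_SERVERS)
                dst = (src + 1) % N_SERVERS
                infected = compromised[src] or rng.random() < P_INFECT
                if infected:
                    compromised[dst] = True
            total += sum(compromised)
        return total / n_trials

    print("E[# compromised servers] ~", expected_compromised())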
Preprint
Full-text available
There are many different probabilistic programming languages that are specialized to specific kinds of probabilistic programs. From a usability and scalability perspective, this is undesirable: today, probabilistic programmers are forced up-front to decide which language they want to use and cannot mix-and-match different languages for handling heterogeneous programs. To rectify this, we seek a foundation for sound interoperability for probabilistic programming languages: just as today's Python programmers can resort to low-level C programming for performance, we argue that probabilistic programmers should be able to freely mix different languages for meeting the demands of heterogeneous probabilistic programming environments. As a first step towards this goal, we introduce MultiPPL, a probabilistic multi-language that enables programmers to interoperate between two different probabilistic programming languages: one that leverages a high-performance exact discrete inference strategy, and one that uses approximate importance sampling. We give a syntax and semantics for MultiPPL, prove soundness of its inference algorithm, and provide empirical evidence that it enables programmers to perform inference on complex heterogeneous probabilistic programs and flexibly exploits the strengths and weaknesses of two languages simultaneously.
... Although programming languages for describing randomized networks exist [13,17], support for automated reasoning remains limited. Even basic properties require quantitative reasoning in the probabilistic setting, and seemingly simple programs can generate complex distributions. ...
... To evaluate the scalability of McNetKAT, we conducted experiments on realistic topologies, routing schemes, and properties. Our results show that McNetKAT scales to networks with thousands of switches, and performs orders of magnitude better than a state-of-the-art tool based on general-purpose symbolic inference [17,18]. We also used McNetKAT to carry out a case study of the resilience of a fault-tolerant data center design proposed by Liu et al. [34]. ...
... To evaluate McNetKAT we conducted experiments on several benchmarks including a family of real-world data center topologies and a synthetic benchmark drawn from the literature [17]. We evaluated McNetKAT's scalability, characterized the effect of optimizations, and compared performance against other state-of-the-art tools. ...
Conference Paper
This paper presents McNetKAT, a scalable tool for verifying probabilistic network programs. McNetKAT is based on a new semantics for the guarded and history-free fragment of Probabilistic NetKAT in terms of finite-state, absorbing Markov chains. This view allows the semantics of all programs to be computed exactly, enabling construction of an automatic verification tool. Domain-specific optimizations and a parallelizing backend enable McNetKAT to analyze networks with thousands of nodes, automatically reasoning about general properties such as probabilistic program equivalence and refinement, as well as networking properties such as resilience to failures. We evaluate McNetKAT’s scalability using real-world topologies, compare its performance against state-of-the-art tools, and develop an extended case study on a recently proposed data center network design.
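The absorbing-Markov-chain view underlying McNetKAT can be illustrated with a toy computation in Python with NumPy. The transition probabilities below are made up for illustration, not taken from a real McNetKAT model: transient states are switches a packet may occupy, absorbing states are "delivered" and "dropped", and the fundamental matrix gives expected visit counts and absorption probabilities exactly.

    import numpy as np

    # Q holds transient-to-transient probabilities, R transient-to-absorbing
    # probabilities (columns: delivered, dropped). Toy numbers only.
    Q = np.array([[0.0, 0.9],      # s0 -> s1 with prob 0.9
                  [0.1, 0.0]])     # s1 -> s0 with prob 0.1 (a retry loop)
    R = np.array([[0.0, 0.1],      # s0 -> delivered, dropped
                  [0.85, 0.05]])   # s1 -> delivered, dropped

    N = np.linalg.inv(np.eye(2) - Q)   # fundamental matrix: expected visits
    B = N @ R                          # absorption probabilities per start state

    print("Expected visits before absorption:\n", N)
    print("P(delivered), P(dropped) starting at s0:", B[0])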
... Secondly, an attacker could obtain the communication scheduling policy via various methods. For instance, there has been extensive research on scheduling policy inference from measurement data, using deep learning [41] or probabilistic inference [42]. An attacker could also compromise a satellite (possibly even an outdated or decommissioned one), a ground station, or even a cloud-hosted copy of the satellite communication codebase, to uncover the scheduling policy. ...
Preprint
Low Earth Orbit (LEO) Earth Observation (EO) satellites have changed the way we monitor Earth. Acting like moving cameras, EO satellites are formed in constellations with different missions and priorities, and capture vast data that needs to be transmitted to the ground for processing. However, EO satellites have very limited downlink communication capability, limited by transmission bandwidth, number and location of ground stations, and small transmission windows due to high velocity satellite movement. To optimize resource utilization, EO constellations are expected to share communication spectrum and ground stations for maximum communication efficiency. In this paper, we investigate a new attack surface exposed by resource competition in EO constellations, targeting the delay or drop of Earth monitoring data using legitimate EO services. Specifically, an attacker can inject high-priority requests to temporarily preempt low-priority data transmission windows. Furthermore, we show that by utilizing predictable satellite dynamics, an attacker can intelligently target critical data from low-priority satellites, either delaying its delivery or irreversibly dropping the data. We formulate two attacks, the data delay attack and the data overflow attack, design algorithms to assist attackers in devising attack strategies, and analyze their feasibility or optimality in typical scenarios. We then conduct trace-driven simulations using real-world satellite images and orbit data to evaluate the success probability of launching these attacks under realistic satellite communication settings. We also discuss possible defenses against these attacks.
... Conditional expectations convergence theorems have many applications in probability theory, data science, economics, finance, and statistics, and give alternative proofs of theorems such as the Kolmogorov law of large numbers, the Lévy generalized Borel–Cantelli lemma, and the Radon–Nikodym theorem. For potential applications, see [6] and [5]. We would like to remark that the existence of convergence a.e. provides the possibility of using tools not necessarily available in a given situation. ...
Article
Full-text available
A necessary and sufficient condition on a sequence $\{\mathcal{A}_n\}_{n\in\mathbb{N}}$ of σ-subalgebras which assures convergence almost everywhere of conditional expectations for functions in $L^{\infty}$ is given. It is proven that for $f\in L^{\infty}(\mathcal{A})$, $\mathsf{E}(f\mid\mathcal{A}_{n})\xrightarrow{\text{a.e.}}\mathsf{E}(f\mid\mathcal{A}_{\mu\,\text{a.e.}})$.
... Software systems involving probability are pervasive. Such systems naturally appear in diverse domains such as network reliability analysis [Gehr et al. 2018; Smolka et al. 2019], reliability for cyber-physical systems [Holtzen et al. 2021; Lee and Seshia 2016], distributed software systems [Tassarotti and Harper 2019], and many others. As these systems are increasingly deployed in high-consequence domains, there is a growing need for formal frameworks capable of reasoning about and verifying probabilistic correctness properties. ...
Preprint
Full-text available
We present Lilac, a separation logic for reasoning about probabilistic programs where separating conjunction captures probabilistic independence. Inspired by an analogy with mutable state where sampling corresponds to dynamic allocation, we show how probability spaces over a fixed, ambient sample space appear to be the natural analogue of heap fragments, and present a new combining operation on them such that probability spaces behave like heaps and measurability of random variables behaves like ownership. This combining operation forms the basis for our model of separation, and produces a logic with many pleasant properties. In particular, Lilac has a frame rule identical to the ordinary one, and naturally accommodates advanced features like continuous random variables and reasoning about quantitative properties of programs. Then we propose a new modality based on disintegration theory for reasoning about conditional probability. We show how the resulting modal logic validates examples from prior work, and give a formal verification of an intricate weighted sampling algorithm whose correctness depends crucially on conditional independence structure.
... More broadly, modeling and reasoning about probabilistic programs has been extensively studied since the 80s [Kozen 1981, 1985; McIver and Morgan 2005; Sharir et al. 1984]. Recent work on modeling and analyzing probabilistic networks [Foster et al. 2016; Gehr et al. 2018; Smolka et al. 2019] can be viewed as reasoning systems for probabilistic message passing systems. However, the work on networks focuses on finite state systems and global properties. ...
Preprint
Session types guarantee that message-passing processes adhere to predefined communication protocols. Prior work on session types has focused on deterministic languages but many message-passing systems, such as Markov chains and randomized distributed algorithms, are probabilistic. To model and analyze such systems, this article introduces probabilistic session types and explores their application in automatic expected resource analysis. Probabilistic session types describe probability distributions over messages and are a conservative extension of intuitionistic (binary) session types. To send on a probabilistic channel, processes have to utilize internal randomness from a probabilistic branching expression or external randomness from receiving on a probabilistic channel. The analysis for expected resource bounds is integrated with the type system and is a variant of automatic amortized resource analysis. It can automatically derive symbolic bounds for different cost metrics by reducing type inference to linear constraint solving. The technical contributions include the meta theory that is based on a novel nested multiverse semantics and a type-reconstruction algorithm that allows flexible mixing of different sources of randomness without burdening the programmer with type annotations. The type system has been implemented in the language PRast. Experiments demonstrate that PRast is applicable in different domains such as resource analysis of randomized distributed algorithms, verification of limiting distributions in Markov chains, and analysis of probabilistic digital contracts.
... To illustrate the benefits of functional abstraction, we adapt an example from recent work in probabilistic verification of computer networks via probabilistic programs [34]. Figure 2a shows a "diamond" network that contains four servers, labeled S_i. ...
Preprint
Probabilistic programming languages (PPLs) are an expressive means of representing and reasoning about probabilistic models. The computational challenge of probabilistic inference remains the primary roadblock for applying PPLs in practice. Inference is fundamentally hard, so there is no one-size-fits-all solution. In this work, we target scalable inference for an important class of probabilistic programs: those whose probability distributions are discrete. Discrete distributions are common in many fields, including text analysis, network verification, artificial intelligence, and graph analysis, but they prove to be challenging for existing PPLs. We develop a domain-specific probabilistic programming language called Dice that features a new approach to exact discrete probabilistic program inference. Dice exploits program structure in order to factorize inference, enabling us to perform exact inference on probabilistic programs with hundreds of thousands of random variables. Our key technical contribution is a new reduction from discrete probabilistic programs to weighted model counting (WMC). This reduction separates the structure of the distribution from its parameters, enabling logical reasoning tools to exploit that structure for probabilistic inference. We (1) show how to compositionally reduce Dice inference to WMC, (2) prove this compilation correct with respect to a denotational semantics, (3) empirically demonstrate the performance benefits over prior approaches, and (4) analyze the types of structure that allow Dice to scale to large probabilistic programs.
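The reduction to weighted model counting at the heart of Dice can be sketched on a two-variable program. The brute-force enumeration below (over Boolean assignments, with rational weights) is only a stand-in for Dice's BDD-based compilation, but it shows how an observe statement becomes a weighted count of satisfying assignments; the program and its parameters are made up for illustration.

    from itertools import product
    from fractions import Fraction

    # Program: x ~ flip(1/3); y ~ flip(1/4); observe(x or y); query P(x).
    weights = {                       # literal weights per Boolean variable
        "x": (Fraction(1, 3), Fraction(2, 3)),   # (w(x=True), w(x=False))
        "y": (Fraction(1, 4), Fraction(3, 4)),
    }

    def weight(assignment):
        w = Fraction(1)
        for var, val in assignment.items():
            w *= weights[var][0] if val else weights[var][1]
        return w

    evidence = lambda a: a["x"] or a["y"]        # the observe(...) constraint
    query    = lambda a: a["x"]                  # the queried event

    models = [dict(zip(weights, vals)) for vals in product([True, False], repeat=2)]
    z   = sum(weight(a) for a in models if evidence(a))            # WMC of evidence
    num = sum(weight(a) for a in models if evidence(a) and query(a))
    print("P(x | x or y) =", num / z)            # exact answer: 2/3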
Article
There are many different probabilistic programming languages that are specialized to specific kinds of probabilistic programs. From a usability and scalability perspective, this is undesirable: today, probabilistic programmers are forced up-front to decide which language they want to use and cannot mix-and-match different languages for handling heterogeneous programs. To rectify this, we seek a foundation for sound interoperability for probabilistic programming languages: just as today’s Python programmers can resort to low-level C programming for performance, we argue that probabilistic programmers should be able to freely mix different languages for meeting the demands of heterogeneous probabilistic programming environments. As a first step towards this goal, we introduce MultiPPL, a probabilistic multi-language that enables programmers to interoperate between two different probabilistic programming languages: one that leverages a high-performance exact discrete inference strategy, and one that uses approximate importance sampling. We give a syntax and semantics for MultiPPL, prove soundness of its inference algorithm, and provide empirical evidence that it enables programmers to perform inference on complex heterogeneous probabilistic programs and flexibly exploits the strengths and weaknesses of two languages simultaneously.
Article
We present Lilac, a separation logic for reasoning about probabilistic programs where separating conjunction captures probabilistic independence. Inspired by an analogy with mutable state where sampling corresponds to dynamic allocation, we show how probability spaces over a fixed, ambient sample space appear to be the natural analogue of heap fragments, and present a new combining operation on them such that probability spaces behave like heaps and measurability of random variables behaves like ownership. This combining operation forms the basis for our model of separation, and produces a logic with many pleasant properties. In particular, Lilac has a frame rule identical to the ordinary one, and naturally accommodates advanced features like continuous random variables and reasoning about quantitative properties of programs. Then we propose a new modality based on disintegration theory for reasoning about conditional probability. We show how the resulting modal logic validates examples from prior work, and give a formal verification of an intricate weighted sampling algorithm whose correctness depends crucially on conditional independence structure.
Article
Session types guarantee that message-passing processes adhere to predefined communication protocols. Prior work on session types has focused on deterministic languages but many message-passing systems, such as Markov chains and randomized distributed algorithms, are probabilistic. To implement and analyze such systems, this article develops the meta theory of probabilistic session types with an application focus on automatic expected resource analysis. Probabilistic session types describe probability distributions over messages and are a conservative extension of intuitionistic (binary) session types. To send on a probabilistic channel, processes have to utilize internal randomness from a probabilistic branching or external randomness from receiving on a probabilistic channel. The analysis for expected resource bounds is smoothly integrated with the type system and is a variant of automatic amortized resource analysis. Type inference relies on linear constraint solving to automatically derive symbolic bounds for various cost metrics. The technical contributions include the meta theory that is based on a novel nested multiverse semantics and a type-reconstruction algorithm that allows flexible mixing of different sources of randomness without burdening the programmer with complex type annotations. The type system has been implemented in the language NomosPro with linear-time type checking. Experiments demonstrate that NomosPro is applicable in different domains such as cost analysis of randomized distributed algorithms, analysis of Markov chains, probabilistic analysis of amortized data structures and digital contracts. NomosPro is also shown to be scalable by (i) implementing two broadcast and a bounded retransmission protocol where messages are dropped with a fixed probability, and (ii) verifying the limiting distribution of a Markov chain with 64 states and 420 transitions.
Article
ProbNV is a new framework for probabilistic network control plane verification that strikes a balance between generality and scalability. ProbNV is general enough to encode a wide range of features from the most common protocols (eBGP and OSPF) and yet scalable enough to handle challenging properties, such as probabilistic all-failures analysis of medium-sized networks with 100-200 devices. When there are a small, bounded number of failures, networks with up to 500 devices may be verified in seconds. ProbNV operates by translating raw CISCO configurations into a probabilistic and functional programming language designed for network verification. This language comes equipped with a novel type system that characterizes the sort of representation to be used for each data structure: concrete for the usual representation of values; symbolic for a BDD-based representation of sets of values; and multi-value for an MTBDD-based representation of values that depend upon symbolics. Careful use of these varying representations speeds execution of symbolic simulation of network models. The MTBDD-based representations are also used to calculate probabilistic properties of network models once symbolic simulation is complete. We implement the language and evaluate its performance on benchmarks constructed from real network topologies and synthesized routing policies.
Article
Probabilistic programming languages (PPLs) are an expressive means of representing and reasoning about probabilistic models. The computational challenge of probabilistic inference remains the primary roadblock for applying PPLs in practice. Inference is fundamentally hard, so there is no one-size-fits-all solution. In this work, we target scalable inference for an important class of probabilistic programs: those whose probability distributions are discrete. Discrete distributions are common in many fields, including text analysis, network verification, artificial intelligence, and graph analysis, but they prove to be challenging for existing PPLs. We develop a domain-specific probabilistic programming language called Dice that features a new approach to exact discrete probabilistic program inference. Dice exploits program structure in order to factorize inference, enabling us to perform exact inference on probabilistic programs with hundreds of thousands of random variables. Our key technical contribution is a new reduction from discrete probabilistic programs to weighted model counting (WMC). This reduction separates the structure of the distribution from its parameters, enabling logical reasoning tools to exploit that structure for probabilistic inference. We (1) show how to compositionally reduce Dice inference to WMC, (2) prove this compilation correct with respect to a denotational semantics, (3) empirically demonstrate the performance benefits over prior approaches, and (4) analyze the types of structure that allow Dice to scale to large probabilistic programs.
Article
BGP is the de-facto Internet routing protocol for exchanging prefix reachability information between Autonomous Systems (AS). It is a dynamic, distributed, path-vector protocol that enables rich expressions of network policies (typically treated as secrets). In this regime, where complexity is interwoven with information hiding, answering questions such as "what is the expected catchment of the anycast sites of a content provider on the AS-level, if new sites are deployed?", or "how will load-balancing behave if an ISP changes its routing policy for a prefix?", is a hard challenge. In this work, we present a formal model and methodology that takes into account policy-based routing and topological properties of the Internet graph, to predict the routing behavior of networks. We design algorithms that provide new capabilities for informative route inference (e.g., isolating the effect of randomness that is present in prior simulation-based approaches). We analyze the properties of these inference algorithms, and evaluate them using publicly available routing datasets and real-world experiments. The proposed framework can be useful in a number of applications: measurements, traffic engineering, network planning, Internet routing models, etc. As a use case, we study the problem of selecting a set of measurement vantage points to maximize route inference. Our methodology is general and can capture standard valley-free routing, as well as more complex topological and routing setups appearing in practice.
Article
Full-text available
We propose trace abstraction modulo probability, a proof technique for verifying high-probability accuracy guarantees of probabilistic programs. Our proofs overapproximate the set of program traces using failure automata, finite-state automata that upper bound the probability of failing to satisfy a target specification. We automate proof construction by reducing probabilistic reasoning to logical reasoning: we use program synthesis methods to select axioms for sampling instructions, and then apply Craig interpolation to prove that traces fail the target specification with only a small probability. Our method handles programs with unknown inputs, parameterized distributions, infinite state spaces, and parameterized specifications. We evaluate our technique on a range of randomized algorithms drawn from the differential privacy literature and beyond. To our knowledge, our approach is the first to automatically establish accuracy properties of these algorithms.
Conference Paper
Full-text available
Many applications compute with estimated and uncertain data. While advances in probabilistic programming help developers build such applications, debugging them remains extremely challenging. New types of errors in probabilistic programs include 1) ignoring dependencies and correlation between random variables and in training data, 2) poorly chosen inference hyper-parameters, and 3) incorrect statistical models. A partial solution to prevent these errors in some languages forbids developers from explicitly invoking inference. While this prevents some dependence errors, it limits composition and control over inference, and does not guarantee absence of other types of errors. This paper presents the FLEXI programming model which supports constructs for invoking inference in the language and reusing the results in other statistical computations. We define a novel formalism for inference with a Decorated Bayesian Network and present a tool, DePP, that analyzes this representation to identify the above errors. We evaluate DePP on a range of prototypical examples to show how it helps developers to detect errors.
Conference Paper
Full-text available
We present Hakaru, a new probabilistic programming system that allows composable reuse of distributions, queries, and inference algorithms, all expressed in a single language of measures. The system implements two automatic and semantics-preserving program transformations—disintegration, which calculates conditional distributions, and simplification, which subsumes exact inference by computer algebra. We show how these features work together by describing the ideal workflow of a Hakaru user on two small problems. We highlight our composition of transformations and types in design and implementation.
Conference Paper
Full-text available
Regression formulas are a domain-specific language adopted by several R packages for describing an important and useful class of statistical models: hierarchical linear regressions. Formulas are succinct, expressive, and clearly popular, so are they a useful addition to probabilistic programming languages? And what do they mean? We propose a core calculus of hierarchical linear regression, in which regression coefficients are themselves defined by nested regressions (unlike in R). We explain how our calculus captures the essence of the formula DSL found in R. We describe the design and implementation of Fabular, a version of the Tabular schema-driven probabilistic programming language, enriched with formulas based on our regression calculus. To the best of our knowledge, this is the first formal description of the core ideas of R's formula notation, the first development of a calculus of regression formulas, and the first demonstration of the benefits of composing regression formulas and latent variables in a probabilistic programming language.
Article
Full-text available
This tutorial paper surveys the main features of Uppaal SMC, a model checking approach in the Uppaal family that allows us to reason about networks of complex real-time systems with a stochastic semantics. We demonstrate the modeling features of the tool, new verification algorithms, and ways of applying them to potentially complex case studies.
Article
Full-text available
Monitoring the flow of traffic along network paths is essential for SDN programming and troubleshooting. For example, traffic engineering requires measuring the ingress-egress traffic matrix; debugging a congested link requires determining the set of sources sending traffic through that link; and locating a faulty device might involve detecting how far along a path the traffic makes progress. Past path-based monitoring systems operate by diverting packets to collectors that perform "after-the-fact" analysis, at the expense of large data-collection overhead. In this paper, we show how to do more efficient "during-the-fact" analysis. We introduce a query language that allows each SDN application to specify queries independently of the forwarding state or the queries of other applications. The queries use a regular-expression-based path language that includes SQL-like "groupby" constructs for count aggregation. We track the packet trajectory directly on the data plane by converting the regular expressions into an automaton, and tagging the automaton state (i.e., the path prefix) in each packet as it progresses through the network. The SDN policies that implement the path queries can be combined with arbitrary packet-forwarding policies supplied by other elements of the SDN platform. A preliminary evaluation of our prototype shows that our "during-the-fact" strategy reduces data-collection overhead over "after-the-fact" strategies.
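A minimal sketch of the tagging idea: the path query is compiled into an automaton, and each packet carries its automaton state as a tag that is advanced at every hop. The hard-coded DFA below stands in for a compiled regular expression meaning "traverse s1 and later s3"; switch names and trajectories are hypothetical, and a real system would install the tagging rules on the data plane rather than evaluate them in Python.

    # Tiny DFA for the query "traverse s1 and later s3".
    DFA = {("start", "s1"): "seen_s1", ("seen_s1", "s3"): "matched"}

    def step(tag, switch):
        # Stay in the current state unless the DFA has a transition for this hop.
        return DFA.get((tag, switch), tag)

    def matches(trajectory):
        tag = "start"
        for switch in trajectory:
            tag = step(tag, switch)
        return tag == "matched"

    trajectories = [["s1", "s2", "s3"], ["s2", "s3"], ["s1", "s4"]]
    # Counting matching packets mimics a groupby-count aggregation performed
    # on the data plane instead of at an out-of-band collector.
    print(sum(matches(t) for t in trajectories), "of", len(trajectories), "match")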
Article
Full-text available
We describe Venture, an interactive virtual machine for probabilistic programming that aims to be sufficiently expressive, extensible, and efficient for general-purpose use. Like Church, probabilistic models and inference problems in Venture are specified via a Turing-complete, higher-order probabilistic language descended from Lisp. Unlike Church, Venture also provides a compositional language for custom inference strategies built out of scalable exact and approximate techniques. We also describe four key aspects of Venture's implementation that build on ideas from probabilistic graphical models. First, we describe the stochastic procedure interface (SPI) that specifies and encapsulates primitive random variables. The SPI supports custom control flow, higher-order probabilistic procedures, partially exchangeable sequences and "likelihood-free" stochastic simulators. It also supports external models that do inference over latent variables hidden from Venture. Second, we describe probabilistic execution traces (PETs), which represent execution histories of Venture programs. PETs capture conditional dependencies, existential dependencies and exchangeable coupling. Third, we describe partitions of execution histories called scaffolds that factor global inference problems into coherent sub-problems. Finally, we describe a family of stochastic regeneration algorithms for efficiently modifying PET fragments contained within scaffolds. Stochastic regeneration yields linear runtime scaling in cases where many previous approaches scaled quadratically. We show how to use stochastic regeneration and the SPI to implement general-purpose inference strategies such as Metropolis-Hastings, Gibbs sampling, and blocked proposals based on particle Markov chain Monte Carlo and mean-field variational inference techniques.
Article
Full-text available
Network service recovery from multiple correlated failures is a major concern given the increased level of infrastructure vulnerability to natural disasters, massive power failures, and malicious attacks. To properly address this problem, a novel path protection solution is proposed to jointly incorporate traffic engineering and risk minimization objectives. The framework assumes probabilistic link failures and is evaluated against some existing multi-failure recovery schemes using network simulation.
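Under the probabilistic link-failure model assumed by such frameworks, the availability of a protected connection can be computed directly when the primary and backup paths are link-disjoint and failures are independent. The per-link failure probabilities below are illustrative, not taken from the paper.

    from math import prod

    # A connection survives if all links of the primary path are up, or,
    # failing that, all links of the backup path are up.
    primary_fail = [0.01, 0.02, 0.01]      # per-link failure probabilities
    backup_fail  = [0.03, 0.03]

    p_primary_up = prod(1 - p for p in primary_fail)
    p_backup_up  = prod(1 - p for p in backup_fail)

    # Assuming the two link-disjoint paths fail independently:
    p_connection_up = 1 - (1 - p_primary_up) * (1 - p_backup_up)
    print(f"P(primary up) = {p_primary_up:.4f}, "
          f"P(connection up) = {p_connection_up:.6f}")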
Conference Paper
Full-text available
The emergence of OpenFlow-capable switches enables exciting new network functionality, at the risk of programming errors that make communication less reliable. The centralized programming model, where a single controller program manages the network, seems to reduce the likelihood of bugs. However, the system is inherently distributed and asynchronous, with events happening at different switches and end hosts, and inevitable delays affecting communication with the controller. In this paper, we present efficient, systematic techniques for testing unmodified controller programs. Our NICE tool applies model checking to explore the state space of the entire system—the controller, the switches, and the hosts. Scalability is the main challenge, given the diversity of data packets, the large system state, and the many possible event orderings. To address this, we propose a novel way to augment model checking with symbolic execution of event handlers (to identify representative packets that exercise code paths on the controller). We also present a simplified OpenFlow switch model (to reduce the state space), and effective strategies for generating event interleavings likely to uncover bugs. Our prototype tests Python applications on the popular NOX platform. In testing three real applications—a MAC-learning switch, in-network server load balancing, and energy-efficient traffic engineering—we uncover eleven bugs.
Article
Full-text available
Managing a network requires support for multiple concurrent tasks, from routing and traffic monitoring, to access control and server load balancing. Software-Defined Networking (SDN) allows applications to realize these tasks directly, by installing packet-processing rules on switches. However, today's SDN platforms provide limited support for creating modular applications. This paper introduces new abstractions for building applications out of multiple, independent modules that jointly manage network traffic. First, we define composition operators and a library of policies for forwarding and querying traffic. Our parallel composition operator allows multiple policies to operate on the same set of packets, while a novel sequential composition operator allows one policy to process packets after another. Second, we enable each policy to operate on an abstract topology that implicitly constrains what the module can see and do. Finally, we define a new abstract packet model that allows programmers to extend packets with virtual fields that may be used to associate packets with high-level meta-data. We realize these abstractions in Pyretic, an imperative, domain-specific language embedded in Python.
Conference Paper
Full-text available
The Satisfiability Modulo Theories (SMT) problem is a decision problem for logical first-order formulas with respect to combinations of background theories such as arithmetic, bit-vectors, arrays, and uninterpreted functions. Z3 is a new and efficient SMT solver freely available from Microsoft Research. It is used in various software verification and analysis applications.
Conference Paper
Full-text available
Traffic classification and identification is a fertile research area. Beyond Quality of Service, service differentiation, and billing, one of the most important applications of traffic classification is in the field of network security. This paper proposes a packet-level traffic classification approach based on Hidden Markov Model (HMM). Classification is performed by using real network traffic and estimating - in a combined fashion - Packet Size (PS) and Inter Packet Time (IPT) characteristics, thus remaining applicable to encrypted traffic too. The effectiveness of the proposed approach is evaluated by considering several traffic typologies: we applied our model to real traffic traces of Age of Mythology and Counter Strike (two Multi Player Network Games), HTTP, SMTP, Edonkey, PPlive (a peer-to-peer IPTV application), and MSN Messenger. An analytical basis and the mathematical details regarding the model are given. Results show how the proposed approach is able to classify network traffic by using packet-level statistical properties and therefore it is a good candidate as a component for a multi-classification framework.
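A simplified sketch of the classification step: one HMM per traffic class is scored on a flow's observation sequence with the scaled forward algorithm, and the flow is assigned to the best-scoring class. The sketch below uses only discretized packet sizes and made-up parameters; the paper's models jointly capture packet size and inter-packet time and are estimated from real traces.

    import numpy as np

    def forward_loglik(pi, A, B, obs):
        """Scaled forward algorithm: log P(obs | HMM with start dist pi,
        transition matrix A, emission matrix B)."""
        alpha = pi * B[:, obs[0]]
        loglik = np.log(alpha.sum())
        alpha /= alpha.sum()
        for o in obs[1:]:
            alpha = (alpha @ A) * B[:, o]
            loglik += np.log(alpha.sum())
            alpha /= alpha.sum()
        return loglik

    # Illustrative (made-up) parameters for two traffic classes over symbols
    # 0 = small, 1 = medium, 2 = large packet.
    models = {
        "game": (np.array([0.8, 0.2]),
                 np.array([[0.9, 0.1], [0.2, 0.8]]),
                 np.array([[0.7, 0.2, 0.1], [0.5, 0.3, 0.2]])),
        "http": (np.array([0.5, 0.5]),
                 np.array([[0.6, 0.4], [0.3, 0.7]]),
                 np.array([[0.1, 0.3, 0.6], [0.2, 0.3, 0.5]])),
    }

    flow = [0, 0, 1, 0, 2, 0, 0]   # observed (discretized) packet sizes
    scores = {name: forward_loglik(pi, A, B, flow)
              for name, (pi, A, B) in models.items()}
    print(max(scores, key=scores.get), scores)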
Article
Full-text available
The study of network topology has attracted a great deal of attention in the last decade, but has been hampered by a lack of accurate data. Existing methods for measuring topology have flaws, and arguments about the importance of these have overshadowed the more interesting questions about network structure. The Internet Topology Zoo is a store of network data created from the information that network operators make public. As such it is the most accurate large-scale collection of network topologies available, and includes meta-data that couldn't have been measured. With this data we can answer questions about network structure with more certainty than ever before - we illustrate its power through a preliminary analysis of the PoP-level topology of over 140 networks. We find a wide range of network designs not conforming as a whole to any obvious model.
Conference Paper
Full-text available
Accurate traffic classification is of fundamental importance to numerous other network activities, from security monitoring to accounting, and from Quality of Service to providing operators with useful forecasts for long-term provisioning. We apply a Naïve Bayes estimator to categorize traffic by application. Uniquely, our work capitalizes on hand-classified network data, using it as input to a supervised Naïve Bayes estimator. In this paper we illustrate the high level of accuracy achievable with the Naïve Bayes estimator. We further illustrate the improved accuracy of refined variants of this estimator. Our results indicate that with the simplest of Naïve Bayes estimators we are able to achieve about 65% accuracy on per-flow classification, and with two powerful refinements we can improve this value to better than 95%; this is a vast improvement over traditional techniques that achieve 50–70%. While our technique uses training data, with categories derived from packet content, all of our training and testing was done using header-derived discriminators. We emphasize this as a powerful aspect of our approach: using samples of well-known traffic to allow the categorization of traffic using commonly available information alone.
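A hedged sketch of the approach using scikit-learn: a Gaussian Naïve Bayes classifier trained on header-derived flow features. The two synthetic features and the class structure below are illustrative assumptions; the paper uses hand-classified flows and a much richer set of discriminators.

    import numpy as np
    from sklearn.naive_bayes import GaussianNB

    # Synthetic flows: feature 0 = mean packet size (bytes),
    # feature 1 = mean inter-arrival time (ms). Labels are made up.
    rng  = np.random.default_rng(0)
    web  = np.column_stack([rng.normal(600, 120, 200), rng.normal(40, 10, 200)])
    mail = np.column_stack([rng.normal(900, 150, 200), rng.normal(200, 50, 200)])
    X = np.vstack([web, mail])
    y = np.array(["web"] * 200 + ["mail"] * 200)

    clf = GaussianNB().fit(X, y)
    print(clf.predict([[650, 35], [950, 180]]))   # expected: ['web' 'mail']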
Conference Paper
Full-text available
We present a constraint-based method for automatically generating quantitative invariants for linear probabilistic programs, and we show how it can be used, in combination with proof-based methods, to verify properties of probabilistic programs that cannot be analysed using existing automated methods. To our knowledge, this is the first automated method proposed for quantitative-invariant generation.
Article
We present a theory for slicing probabilistic imperative programs, containing random assignments and "observe" statements (for conditioning), represented as probabilistic control-flow graphs (pCFGs) whose nodes modify probability distributions. We show that such a representation allows direct adaptation of standard machinery such as data and control dependence, postdominators, relevant variables, etc. to the probabilistic setting. We separate the specification of slicing from its implementation: first we develop syntactic conditions that a slice must satisfy; next we prove that any such slice is semantically correct; finally we give an algorithm to compute the least slice. To generate smaller slices, we may in addition take advantage of knowledge that certain loops will terminate (almost) always. A key feature of our syntactic conditions is that they involve two disjoint slices such that the variables of one slice are probabilistically independent of the variables of the other. This leads directly to a proof of correctness of probabilistic slicing. In a companion article we show adequacy of the semantics of pCFGs with respect to the standard semantics of structured probabilistic programs.
Article
With the range and sensitivity of algorithmic decisions expanding at a break-neck speed, it is imperative that we aggressively investigate fairness and bias in decision-making programs. First, we show that a number of recently proposed formal definitions of fairness can be encoded as probabilistic program properties. Second, with the goal of enabling rigorous reasoning about fairness, we design a novel technique for verifying probabilistic properties that admits a wide class of decision-making programs. Third, we present FairSquare, the first verification tool for automatically certifying that a program meets a given fairness property. We evaluate FairSquare on a range of decision-making programs. Our evaluation demonstrates FairSquare’s ability to verify fairness for a range of different programs, which we show are out-of-reach for state-of-the-art program analysis techniques.
Article
The problem of probabilistic modeling and inference, at a high-level, can be viewed as constructing a (model, query, inference) tuple, where an inference algorithm implements a query on a model. Notably, the derivation of inference algorithms can be a difficult and error-prone task. Hence, researchers have explored how ideas from probabilistic programming can be applied. In the context of constructing these tuples, probabilistic programming can be seen as taking a language-based approach to probabilistic modeling and inference. For instance, by using (1) appropriate languages for expressing models and queries and (2) devising inference techniques that operate on encodings of models (and queries) as program expressions, the task of inference can be automated. In this paper, we describe a compiler that transforms a probabilistic model written in a restricted modeling language and a query for posterior samples given observed data into a Markov Chain Monte Carlo (MCMC) inference algorithm that implements the query. The compiler uses a sequence of intermediate languages (ILs) that guide it in gradually and successively refining a declarative specification of a probabilistic model and the query into an executable MCMC inference algorithm. The compilation strategy produces composable MCMC algorithms for execution on a CPU or GPU.
Conference Paper
Probabilistic programming languages (PPLs) provide users a clean syntax for concisely representing probabilistic processes and easy access to sophisticated built-in inference algorithms. Unfortunately, writing a PPL program by hand can be difficult for non-experts, requiring extensive knowledge of statistics and deep insights into the data. To make the modeling process easier, we have created a tool that synthesizes PPL programs from relational datasets. Our synthesizer leverages the input data to generate a program sketch, then applies simulated annealing to complete the sketch. We introduce a data-guided approach to the program mutation stage of simulated annealing; this innovation allows our tool to scale to synthesizing complete probabilistic programs from scratch. We find that our synthesizer produces accurate programs from 10,000-row datasets in 21 s on average.
Conference Paper
The problem of probabilistic modeling and inference, at a high-level, can be viewed as constructing a (model, query, inference) tuple, where an inference algorithm implements a query on a model. Notably, the derivation of inference algorithms can be a difficult and error-prone task. Hence, researchers have explored how ideas from probabilistic programming can be applied. In the context of constructing these tuples, probabilistic programming can be seen as taking a language-based approach to probabilistic modeling and inference. For instance, by using (1) appropriate languages for expressing models and queries and (2) devising inference techniques that operate on encodings of models (and queries) as program expressions, the task of inference can be automated. In this paper, we describe a compiler that transforms a probabilistic model written in a restricted modeling language and a query for posterior samples given observed data into a Markov Chain Monte Carlo (MCMC) inference algorithm that implements the query. The compiler uses a sequence of intermediate languages (ILs) that guide it in gradually and successively refining a declarative specification of a probabilistic model and the query into an executable MCMC inference algorithm. The compilation strategy produces composable MCMC algorithms for execution on a CPU or GPU.
Article
Operators in multi-tenant cloud datacenters require support for diverse and complex end-to-end policies, such as reachability, middlebox traversals, isolation, traffic engineering, and network resource management. We present Genesis, a datacenter network management system which allows policies to be specified in a declarative manner without explicitly programming the network data plane. Genesis tackles the problem of enforcing policies by synthesizing switch forwarding tables. It uses the formal foundations of constraint solving in combination with fast off-the-shelf SMT solvers. To improve synthesis performance, Genesis incorporates a novel search strategy that uses regular expressions to specify properties that leverage the structure of datacenter networks, and a divide-and-conquer synthesis procedure which exploits the structure of policy relationships. We have prototyped Genesis, and conducted experiments with a variety of workloads on real-world topologies to demonstrate its performance.
Conference Paper
Operators in multi-tenant cloud datacenters require support for diverse and complex end-to-end policies, such as reachability, middlebox traversals, isolation, traffic engineering, and network resource management. We present Genesis, a datacenter network management system which allows policies to be specified in a declarative manner without explicitly programming the network data plane. Genesis tackles the problem of enforcing policies by synthesizing switch forwarding tables. It uses the formal foundations of constraint solving in combination with fast off-the-shelf SMT solvers. To improve synthesis performance, Genesis incorporates a novel search strategy that uses regular expressions to specify properties that leverage the structure of datacenter networks, and a divide-and-conquer synthesis procedure which exploits the structure of policy relationships. We have prototyped Genesis, and conducted experiments with a variety of workloads on real-world topologies to demonstrate its performance.
Conference Paper
ProbNetKAT is a probabilistic extension of NetKAT with a denotational semantics based on Markov kernels. The language is expressive enough to generate continuous distributions, which raises the question of how to compute effectively in the language. This paper gives a new characterization of ProbNetKAT’s semantics using domain theory, which provides the foundation needed to build a practical implementation. We show how to use the semantics to approximate the behavior of arbitrary ProbNetKAT programs using distributions with finite support. We develop a prototype implementation and show how to use it to solve a variety of problems including characterizing the expected congestion induced by different routing schemes and reasoning probabilistically about reachability in a network.
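One of the queries mentioned, expected congestion under a routing scheme, reduces to an expectation over a finite-support distribution when the network and demands are small. The Python sketch below enumerates the routing choices of two flows on a hypothetical two-path topology; ProbNetKAT's domain-theoretic machinery is what makes such approximations sound for arbitrary programs, and the scheme and probabilities here are illustrative assumptions.

    from itertools import product
    from fractions import Fraction

    # Each of two flows is routed via the top path (A) or bottom path (B)
    # with probability 1/2; congestion is the load on the most-loaded path.
    half = Fraction(1, 2)
    expected_congestion = Fraction(0)
    for choice in product(["A", "B"], repeat=2):     # routing of flow 1, flow 2
        prob = half * half
        load = {"A": choice.count("A"), "B": choice.count("B")}
        expected_congestion += prob * max(load.values())
    print("E[max path load] =", expected_congestion)  # 3/2 for this scheme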
Conference Paper
Bayesian inference, of posterior knowledge from prior knowledge and observed evidence, is typically defined by Bayes’s rule, which says the posterior multiplied by the probability of an observation equals a joint probability. But the observation of a continuous quantity usually has probability zero, in which case Bayes’s rule says only that the unknown times zero is zero. To infer a posterior distribution from a zero-probability observation, the statistical notion of disintegration tells us to specify the observation as an expression rather than a predicate, but does not tell us how to compute the posterior. We present the first method of computing a disintegration from a probabilistic program and an expression of a quantity to be observed, even when the observation has probability zero. Because the method produces an exact posterior term and preserves a semantics in which monadic terms denote measures, it composes with other inference methods in a modular way—without sacrificing accuracy or performance.
Conference Paper
We show that the Kolmogorov extension theorem and the Doob martingale convergence theorem are two aspects of a common generalization, namely a colimit-like construction in a category of Radon spaces and reversible Markov kernels. The construction provides a compositional denotational semantics for lossless iteration in probabilistic programming languages, even in the absence of a natural partial order.
Conference Paper
We present a theory for slicing probabilistic imperative programs—containing random assignment and “observe” statements—represented as control flow graphs whose nodes transform probability distributions. We show that such a representation allows direct adaptation of standard machinery such as data and control dependence, postdominators, relevant variables, etc. to the probabilistic setting. We separate the specification of slicing from its implementation: first we develop syntactic conditions that a slice must satisfy; next we prove that any such slice is semantically correct; finally we give an algorithm to compute the least slice. A key feature of our syntactic conditions is that they involve two disjoint slices such that the variables of one slice are probabilistically independent of the variables of the other. This leads directly to a proof of correctness of probabilistic slicing.
Conference Paper
This paper presents a new language for network programming based on a probabilistic semantics. We extend the NetKAT language with new primitives for expressing probabilistic behaviors and enrich the semantics from one based on deterministic functions to one based on measurable functions on sets of packet histories. We establish fundamental properties of the semantics, prove that it is a conservative extension of the deterministic semantics, show that it satisfies a number of natural equations, and develop a notion of approximation. We present case studies that show how the language can be used to model a diverse collection of scenarios drawn from real-world networks.
Conference Paper
Probabilistic inference is a key mechanism for reasoning about probabilistic programs. Since exact inference is theoretically expensive, most probabilistic inference systems today have adopted approximate inference techniques, which trade precision for better performance (but often without guarantees). As a result, while desirable for its ultimate precision, the practical effectiveness of exact inference for probabilistic programs is mostly unknown. This paper presents PSI (http://www.psisolver.org), a novel symbolic analysis system for exact inference in probabilistic programs with both continuous and discrete random variables. PSI computes succinct symbolic representations of the joint posterior distribution represented by a given probabilistic program. PSI can compute answers to various posterior distribution, expectation and assertion queries using its own back-end for symbolic reasoning. Our evaluation shows that PSI is more effective than existing exact inference approaches: (i) it successfully computed a precise result for more programs, and (ii) simplified expressions that existing computer algebra systems (e.g., Mathematica, Maple) fail to handle.
Article
A probabilistic program defines a probability measure over its semantic structures. One common goal of probabilistic programming languages (PPLs) is to compute posterior probabilities for arbitrary models and queries, given observed evidence, using a generic inference engine. Most PPL inference engines, even the compiled ones, incur significant runtime interpretation overhead, especially for contingent and open-universe models. This paper describes Swift, a compiler for the BLOG PPL. Swift-generated code incorporates optimizations that eliminate interpretation overhead, maintain dynamic dependencies efficiently, and handle memory management for possible worlds of varying sizes. Experiments comparing Swift with other PPL engines on a variety of inference problems demonstrate speedups ranging from 12x to 326x.
Conference Paper
Concurrency violations are an important source of bugs in Software-Defined Networks (SDN), often leading to policy or invariant violations. Unfortunately, concurrency violations are also notoriously difficult to avoid, detect and debug. This paper presents a novel approach and a tool, SDNRacer, for detecting concurrency violations of SDNs. Our approach is enabled by three key ingredients: (i) a precise happens-before model for SDNs that captures when events can happen concurrently; (ii) a set of sound, domain-specific filters that reduce reported violations by orders of magnitude; and (iii) a sound and complete dynamic analyzer, based on the above, that can ensure the network is free of harmful errors such as data races and per-packet incoherence. We evaluated SDNRacer on several real-world OpenFlow controllers, running both reactive and proactive applications in large networks. We show that SDNRacer is practically effective: it quickly pinpoints harmful concurrency violations without overwhelming the user with false positives.
Conference Paper
In this paper we investigate the applicability of standard model checking approaches to verifying properties in probabilistic programming. As the operational model for a standard probabilistic program is a potentially infinite parametric Markov decision process, no direct adaptation of existing techniques is possible. Therefore, we propose an on-the-fly approach where the operational model is successively created and verified via a step-wise execution of the program. This approach makes it possible to take key features of many probabilistic programs into account: nondeterminism and conditioning. We discuss the restrictions and demonstrate the scalability on several benchmarks.
Article
Implementing inference procedures for each new probabilistic model is time-consuming and error-prone. Probabilistic programming addresses this problem by allowing a user to specify the model and then automatically generating the inference procedure. To make this practical it is important to generate high performance inference code. In turn, on modern architectures, high performance requires parallel execution. In this paper we present Augur, a probabilistic modeling language and compiler for Bayesian networks designed to make effective use of data-parallel architectures such as GPUs. We show that the compiler can generate data-parallel inference code scalable to thousands of GPU cores by making use of the conditional independence relationships in the Bayesian network.
Article
Stan is a free and open-source C++ program that performs Bayesian inference or optimization for arbitrary user-specified models. It can be called from the command line, R, Python, Matlab, or Julia, and has great promise for fitting large and complex statistical models in many areas of application. We discuss Stan from users' and developers' perspectives and illustrate with a simple but nontrivial nonlinear regression example.
Article
We introduce and demonstrate a new approach to inference in expressive probabilistic programming languages based on particle Markov chain Monte Carlo. Our approach is simple to implement and easy to parallelize. It applies to Turing-complete probabilistic programming languages and supports accurate inference in models that make use of complex control flow, including stochastic recursion. It also includes primitives from Bayesian nonparametric statistics. Our experiments show that this approach can be more efficient than previously introduced single-site Metropolis-Hastings methods.
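A rough sense of the particle machinery involved can be given by a bootstrap particle filter on a toy state-space model (an assumed example, not the paper's engine; particle MCMC wraps sequential Monte Carlo steps like this inside an outer Markov chain Monte Carlo loop):

```python
import numpy as np

# Bootstrap particle filter on a toy model (assumed example, not the paper's
# engine): latent random walk x_t = x_{t-1} + N(0, 1), observations
# y_t = x_t + N(0, 0.5^2). Each step: propagate, weight by likelihood, resample.
rng = np.random.default_rng(1)
T, N = 20, 1000
true_x = np.cumsum(rng.normal(size=T))
obs = true_x + rng.normal(scale=0.5, size=T)

particles = np.zeros(N)
for t in range(T):
    particles = particles + rng.normal(size=N)          # propagate
    log_w = -0.5 * ((obs[t] - particles) / 0.5) ** 2    # Gaussian log-likelihood
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    particles = particles[rng.choice(N, size=N, p=w)]   # resample

print("filtered mean of x_T:", particles.mean(), "true x_T:", true_x[-1])
```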
Conference Paper
Recent years have seen growing interest in high-level languages for programming networks. But the design of these languages has been largely ad hoc, driven more by the needs of applications and the capabilities of network hardware than by foundational principles. The lack of a semantic foundation has left language designers with little guidance in determining how to incorporate new features, and programmers without a means to reason precisely about their code. This paper presents NetKAT, a new network programming language that is based on a solid mathematical foundation and comes equipped with a sound and complete equational theory. We describe the design of NetKAT, including primitives for filtering, modifying, and transmitting packets; union and sequential composition operators; and a Kleene star operator that iterates programs. We show that NetKAT is an instance of a canonical and well-studied mathematical structure called a Kleene algebra with tests (KAT) and prove that its equational theory is sound and complete with respect to its denotational semantics. Finally, we present practical applications of the equational theory including syntactic techniques for checking reachability, proving non-interference properties that ensure isolation between programs, and establishing the correctness of compilation algorithms.
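A sketch of the flavor of this semantics (an illustrative Python encoding that omits the Kleene star; not the authors' implementation) treats a packet as a record of header fields and a policy as a function from a packet to the set of packets it produces, with union and sequential composition defined pointwise:

```python
# Illustrative encoding of NetKAT-style policy combinators (a sketch of the
# semantics, omitting Kleene star; not the paper's implementation). A packet
# is a dict of header fields; a policy maps a packet to a list of outputs.
def test(field, value):
    # Filter: pass the packet through iff the field matches, else drop it.
    return lambda pkt: [pkt] if pkt.get(field) == value else []

def modify(field, value):
    # Modification: rewrite one header field.
    return lambda pkt: [dict(pkt, **{field: value})]

def union(p, q):
    # p + q: send a copy of the packet through both policies.
    return lambda pkt: p(pkt) + q(pkt)

def seq(p, q):
    # p ; q: feed every output of p into q.
    return lambda pkt: [out for mid in p(pkt) for out in q(mid)]

# Forward traffic arriving on port 1 to port 2; everything else is dropped.
policy = seq(test("port", 1), modify("port", 2))
print(policy({"port": 1, "dst": "10.0.0.2"}))   # [{'port': 2, 'dst': '10.0.0.2'}]
print(policy({"port": 3, "dst": "10.0.0.2"}))   # []
```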
Conference Paper
We propose an approach for the static analysis of probabilistic programs that sense, manipulate, and control based on uncertain data. Examples include programs used in risk analysis, medical decision making and cyber-physical systems. Correctness properties of such programs take the form of queries that seek the probabilities of assertions over program variables. We present a static analysis approach that provides guaranteed interval bounds on the values (assertion probabilities) of such queries. First, we observe that for probabilistic programs, it is possible to conclude facts about the behavior of the entire program by choosing a finite, adequate set of its paths. We provide strategies for choosing such a set of paths and verifying its adequacy. The queries are evaluated over each path by a combination of symbolic execution and probabilistic volume-bound computations. Each path yields interval bounds that can be summed up with a "coverage" bound to yield an interval that encloses the probability of assertion for the program as a whole. We demonstrate promising results on a suite of benchmarks from many different sources including robotic manipulators and medical decision making programs.
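The final interval-combination step can be sketched in a few lines (with assumed numbers, not the paper's benchmarks): each explored path contributes an interval on the probability that the assertion holds along it, and the probability mass of unexplored paths widens the upper bound.

```python
from fractions import Fraction

# Sketch of combining per-path interval bounds (assumed numbers, not the
# paper's tool): each explored path contributes an interval enclosing
# P(assertion holds and this path is taken); unexplored paths contribute
# at most their total probability mass.
explored_paths = [
    (Fraction(1, 8), Fraction(1, 8)),    # path 1: volume computed exactly
    (Fraction(1, 10), Fraction(3, 20)),  # path 2: only bounded
]
uncovered_mass = Fraction(1, 20)         # probability of paths not explored

lower = sum(lo for lo, _ in explored_paths)
upper = sum(hi for _, hi in explored_paths) + uncovered_mass
print(f"P(assertion) lies in [{lower}, {upper}]")   # [9/40, 13/40]
```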
Article
Probabilistic programs use familiar notation of programming languages to specify probabilistic models. Suppose we are interested in estimating the distribution of the return expression r of a probabilistic program P. We are interested in slicing the probabilistic program P and obtaining a simpler program Sli(P) which retains only those parts of P that are relevant to estimating r, and elides those parts of P that are not relevant to estimating r. We desire that the Sli transformation be both correct and efficient. By correct, we mean that P and Sli(P) have identical estimates on r. By efficient, we mean that estimation over Sli(P) be as fast as possible. We show that the usual notion of program slicing, which traverses control and data dependencies backward from the return expression r, is unsatisfactory for probabilistic programs, since it produces incorrect slices on some programs and sub-optimal ones on others. Our key insight is that in addition to the usual notions of control dependence and data dependence that are used to slice non-probabilistic programs, a new kind of dependence called observe dependence arises naturally due to observe statements in probabilistic programs. We propose a new definition of Sli(P) which is both correct and efficient for probabilistic programs, by including observe dependence in addition to control and data dependences for computing slices. We prove correctness mathematically, and we demonstrate efficiency empirically. We show that by applying the Sli transformation as a pre-pass, we can improve the efficiency of probabilistic inference, not only in our own inference tool R2, but also in other systems for performing inference such as Church and Infer.NET.
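The phenomenon can be illustrated with a small sampling sketch (an illustrative program, not R2 itself): the return value a has no data or control dependence on c, yet dropping the observe statement shifts its posterior, so a correct slice must retain the observation.

```python
import random

# Sketch of observe dependence (illustrative program, not the paper's system).
# `a` has no data or control dependence on `c`, yet removing observe(c)
# changes the posterior of `a`, so a correct slice must keep the observation.
def run_full(rng):
    a = rng.random() < 0.5
    b = rng.random() < 0.5
    c = a or b
    if not c:               # observe(c): reject executions violating the evidence
        return None
    return a

def run_naive_slice(rng):
    a = rng.random() < 0.5  # backward slice from `return a` drops b, c, observe
    return a

rng = random.Random(0)
full = [r for r in (run_full(rng) for _ in range(100_000)) if r is not None]
naive = [run_naive_slice(rng) for _ in range(100_000)]
print("P(a | observe c) ~", sum(full) / len(full))    # ~ 2/3
print("naive slice      ~", sum(naive) / len(naive))  # ~ 1/2
```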
Article
Probabilistic programs are ordinary functional or imperative programs with two added constructs: (1) the ability to draw values at random from distributions, and (2) the ability to condition values of variables in a program via observations. Models from diverse application areas such as computer vision, coding theory, cryptographic protocols, biology and reliability analysis can be written as probabilistic programs. Probabilistic inference is the problem of computing an explicit representation of the probability distribution implicitly specified by a probabilistic program. Depending on the application, the desired output from inference may vary: we may want to estimate the expected value of some function f with respect to the distribution, or the mode of the distribution, or simply a set of samples drawn from the distribution. In this paper, we describe the connections this research area, called "probabilistic programming", has with programming languages and software engineering, including language design and the static and dynamic analysis of programs. We survey the current state of the art and speculate on promising directions for future research.
Conference Paper
The probability density function of a probability distribution is a fundamental concept in probability theory and a key ingredient in various widely used machine learning methods. However, the necessary framework for compiling probabilistic functional programs to density functions has only recently been developed. In this work, we present a density compiler for a probabilistic language with discrete and continuous distributions, and discrete observations, and provide a proof of its soundness. The compiler greatly reduces the development effort of domain experts, which we demonstrate by solving inference problems from various scientific applications, such as modelling the global carbon cycle, using a standard Markov chain Monte Carlo framework.
Conference Paper
Today's networks typically carry or deploy dozens of protocols and mechanisms simultaneously, such as MPLS, NAT, ACLs and route redistribution. Even when individual protocols function correctly, failures can arise from the complex interactions of their aggregate, requiring network administrators to be masters of detail. Our goal is to automatically find an important class of failures, regardless of the protocols running, for both operational and experimental networks. To this end we developed a general and protocol-agnostic framework, called Header Space Analysis (HSA). Our formalism allows us to statically check network specifications and configurations to identify an important class of failures such as Reachability Failures, Forwarding Loops and Traffic Isolation and Leakage problems. In HSA, protocol header fields are not first-class entities; instead we look at the entire packet header as a concatenation of bits without any associated meaning. Each packet is a point in the {0,1}^L space, where L is the maximum length of a packet header, and networking boxes transform packets from one point in the space to another point or set of points (multicast). We created a library of tools, called Hassel, to implement our framework, and used it to analyze a variety of networks and protocols. Hassel was used to analyze the Stanford University backbone network, and found all the forwarding loops in less than 10 minutes, and verified reachability constraints between two subnets in 13 seconds. It also found a large and complex loop in an experimental loose source routing protocol in 4 minutes.
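The core idea can be sketched with a tiny example (toy four-bit headers, not the Hassel library): headers are points in {0,1}^L, each networking box is a function from sets of headers to sets of headers, and reachability is checked by pushing a header set hop by hop.

```python
# Toy header-space sketch (illustrative, not the Hassel library): headers are
# length-L bit tuples, a box maps a set of headers to a set of headers, and
# reachability is checked by composing boxes along a path.
L = 4

def all_headers():
    return {tuple((i >> k) & 1 for k in range(L)) for i in range(2 ** L)}

def acl_drop_bit0(headers):
    # An ACL that drops every packet whose first header bit is 1.
    return {h for h in headers if h[0] == 0}

def rewrite_bit1(headers):
    # A rewriting box (e.g., NAT-like) that sets the second header bit to 1.
    return {(h[0], 1) + h[2:] for h in headers}

reachable = rewrite_bit1(acl_drop_bit0(all_headers()))
print(f"{len(reachable)} of {2 ** L} headers reach the destination")   # 4 of 16
```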
Conference Paper
We present a new algorithm for Bayesian inference over probabilistic programs, based on data flow analysis techniques from the program analysis community. Unlike existing techniques for Bayesian inference on probabilistic programs, our data flow analysis algorithm is able to perform inference directly on probabilistic programs with loops. Even for loop-free programs, we show that data flow analysis offers better precision and better performance benefits over existing techniques. We also describe heuristics that are crucial for our inference to scale, and present an empirical evaluation of our algorithm over a range of benchmarks.
Conference Paper
Emerging applications increasingly use estimates such as sensor data (GPS), probabilistic models, machine learning, big data, and human data. Unfortunately, representing this uncertain data with discrete types (floats, integers, and booleans) encourages developers to pretend it is not probabilistic, which causes three types of uncertainty bugs. (1) Using estimates as facts ignores random error in estimates. (2) Computation compounds that error. (3) Boolean questions on probabilistic data induce false positives and negatives. This paper introduces Uncertain<T>, a new programming language abstraction for uncertain data. We implement a Bayesian network semantics for computation and conditionals that improves program correctness. The runtime uses sampling and hypothesis tests to evaluate computation and conditionals lazily and efficiently. We illustrate with sensor and machine learning applications that Uncertain<T> improves expressiveness and accuracy. Whereas previous probabilistic programming languages focus on experts, Uncertain<T> serves a wide range of developers. Experts still identify error distributions. However, both experts and application writers compute with distributions, improve estimates with domain knowledge, and ask questions with conditionals. The Uncertain<T> type system and operators encourage developers to expose and reason about uncertainty explicitly, controlling false positives and false negatives. These benefits make Uncertain<T> a compelling programming model for modern applications facing the challenge of uncertainty.
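A minimal sketch of the abstraction (simplified: fixed-size sampling instead of the paper's lazy evaluation and hypothesis tests) represents a value by a sampler, lifts arithmetic over samples, and turns conditionals into questions about probabilities rather than comparisons of point estimates:

```python
import random

# Minimal sketch of an Uncertain<T>-style type (simplified: fixed-size
# sampling instead of the paper's lazy evaluation and hypothesis tests).
class Uncertain:
    def __init__(self, sampler):
        self.sample = sampler

    def __add__(self, other):
        # Lift addition over samples of both operands.
        return Uncertain(lambda: self.sample() + other.sample())

    def prob_greater(self, threshold, n=10_000):
        # Estimate P(value > threshold) from n samples.
        return sum(self.sample() > threshold for _ in range(n)) / n

rng = random.Random(0)
gps_speed = Uncertain(lambda: rng.gauss(4.0, 1.5))   # noisy speed estimate, m/s
drift     = Uncertain(lambda: rng.gauss(0.0, 0.5))   # additional sensor error
speed = gps_speed + drift

# Ask "is the user probably running?" rather than comparing one estimate to 4.
if speed.prob_greater(4.0) > 0.9:
    print("confident the user is running")
else:
    print("not enough evidence of running")
```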
Conference Paper
Software reliability analysis tackles the problem of predicting the failure probability of software. Most of the current approaches base reliability analysis on architectural abstractions useful at early stages of design, but not directly applicable to source code. In this paper we propose a general methodology that exploits symbolic execution of source code for extracting failure and success paths to be used for probabilistic reliability assessment against relevant usage scenarios. Under the assumption of finite and countable input domains, we provide an efficient implementation based on Symbolic PathFinder that supports the analysis of sequential and parallel programs, even with structured data types, at the desired level of confidence. The tool has been validated on both NASA prototypes and other test cases, showing a promising applicability scope.
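Under the finite-input-domain assumption, the essence of the computation can be sketched by enumerating inputs directly (a toy program and usage profile; the paper enumerates symbolic paths with Symbolic PathFinder rather than concrete inputs):

```python
from fractions import Fraction
from itertools import product

# Toy sketch of usage-profile reliability (illustrative; the paper enumerates
# symbolic paths rather than concrete inputs): sum the probability, under the
# usage profile, of inputs that drive the program down a failure path.
profile = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 4)}  # P(input)

def program(x, y):
    if x == 2 and y == 2:
        raise AssertionError("failure path")
    return x + y

failure_probability = Fraction(0)
for x, y in product(profile, repeat=2):
    try:
        program(x, y)
    except AssertionError:
        failure_probability += profile[x] * profile[y]

print("failure probability:", failure_probability)   # 1/16
```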
Article
Networks are complex and prone to bugs. Existing tools that check configuration files and data-plane state operate offline at timescales of seconds to hours, and cannot detect or prevent bugs as they arise. Is it possible to check network-wide invariants in real time, as the network state evolves? The key challenge here is to achieve extremely low latency during the checks so that network performance is not affected. In this paper, we present a preliminary design, VeriFlow, which suggests that this goal is achievable. VeriFlow is a layer between a software-defined networking controller and network devices that checks for network-wide invariant violations dynamically as each forwarding rule is inserted. Based on an implementation using a Mininet OpenFlow network and Route Views trace data, we find that VeriFlow can perform rigorous checking within hundreds of microseconds per rule insertion.
Article
This memo defines InMon Corporation's sFlow system. sFlow is a technology for monitoring traffic in data networks containing switches and routers. In particular, it defines the sampling mechanisms implemented in an sFlow Agent for monitoring traffic, the sFlow MIB for controlling the sFlow Agent, and the format of sample data used by the sFlow Agent when forwarding data to a central data collector.
Article
Equal-cost multi-path (ECMP) is a routing technique for routing packets along multiple paths of equal cost. The forwarding engine identifies paths by next-hop. When forwarding a packet the router must decide which next-hop (path) to use. This document gives an analysis of one such method, called hash-threshold. The analysis includes the performance of the algorithm and the disruption caused by changes to the set of next-hops. In hash-threshold, the router first selects a key by performing a hash (e.g., CRC16) over the packet header fields that identify a flow. The N next-hops have been assigned unique regions in the key space. The router uses the key to determine which region, and thus which next-hop, to use. As an example, upon receiving a packet the router performs a CRC16 on the packet's header fields that define the flow (e.g., the source and destination fields of the packet); this is the key. Say for this destination there are 4 next-hops to choose from. Each next-hop is assigned a region in 16-bit space (the key space). For equal usage the router may divide the space evenly, so each region is 65536/4, or 16k, wide. The next-hop is chosen by determining which region contains the key (i.e., the CRC result).
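A compact sketch of that selection step (illustrative only: Python's CRC-CCITT routine stands in for the CRC16 of the example, and the next-hop names are hypothetical):

```python
import binascii

# Sketch of hash-threshold next-hop selection (illustrative; CRC-CCITT via
# binascii stands in for the CRC16 of the example, and the next-hop names are
# hypothetical). The 16-bit key space is split into equal regions, one per
# next-hop, and the region containing the key determines the next-hop.
def next_hop(src, dst, next_hops):
    key = binascii.crc_hqx(f"{src}->{dst}".encode(), 0)    # 16-bit flow key
    region_size = 0x10000 // len(next_hops)                 # 65536 / N
    return next_hops[min(key // region_size, len(next_hops) - 1)]

hops = ["hop-A", "hop-B", "hop-C", "hop-D"]
print(next_hop("10.0.0.1", "10.0.0.9", hops))   # the same flow always maps the same way
print(next_hop("10.0.0.3", "10.0.0.9", hops))   # a different flow may pick another hop
```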
Article
We develop diverse routing schemes for dealing with multiple, possibly correlated, failures. While disjoint path protection can effectively deal with isolated single link failures, recovering from multiple failures is not guaranteed. In particular, events such as natural disasters or intentional attacks can lead to multiple correlated failures, for which recovery mechanisms are not well understood. We take a probabilistic view of network failures where multiple failure events can occur simultaneously, and develop algorithms for finding diverse routes with minimum joint failure probability. Moreover, we develop a novel Probabilistic Shared Risk Link Group (PSRLG) framework for modeling correlated failures. In this context, we formulate the problem of finding two paths with minimum joint failure probability as an integer nonlinear program (INLP) and develop approximations and linear relaxations that can find nearly optimal solutions in most cases.
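The optimization objective can be sketched by brute force on a toy topology (assuming independent link failures; the paper's PSRLG framework additionally captures correlated failures and solves an integer nonlinear program rather than enumerating path pairs):

```python
from itertools import product

# Brute-force sketch of choosing two s-t paths with minimum joint failure
# probability (toy topology, independent link failures; the paper's PSRLG
# model additionally handles correlated failures and uses an INLP instead).
failure_prob = {                       # undirected link -> failure probability
    ("a", "s"): 0.01, ("a", "t"): 0.01,
    ("b", "s"): 0.05, ("b", "t"): 0.05,
    ("a", "b"): 0.02,
}
graph = {}
for u, v in failure_prob:
    graph.setdefault(u, []).append(v)
    graph.setdefault(v, []).append(u)

def simple_paths(node, target, visited):
    # All cycle-free paths from node to target.
    if node == target:
        yield [node]
        return
    for nxt in graph[node]:
        if nxt not in visited:
            for rest in simple_paths(nxt, target, visited | {nxt}):
                yield [node] + rest

def links(path):
    return {tuple(sorted(edge)) for edge in zip(path, path[1:])}

all_links = list(failure_prob)

def joint_failure(p1, p2):
    # Exact P(both paths have a failed link), by enumerating link states.
    total = 0.0
    for state in product([False, True], repeat=len(all_links)):
        prob = 1.0
        down = set()
        for link, is_down in zip(all_links, state):
            prob *= failure_prob[link] if is_down else 1 - failure_prob[link]
            if is_down:
                down.add(link)
        if links(p1) & down and links(p2) & down:
            total += prob
    return total

paths = list(simple_paths("s", "t", {"s"}))
best = min(((p, q) for p in paths for q in paths), key=lambda pq: joint_failure(*pq))
print(best[0], best[1], joint_failure(*best))
```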
Article
The Internet and computer networks are exposed to an increasing number of security threats. With new types of attacks appearing continually, developing flexible and adaptive security oriented approaches is a severe challenge. In this context, anomaly-based network intrusion detection techniques are a valuable technology to protect target systems and networks against malicious activities. However, despite the variety of such methods described in the literature in recent years, security tools incorporating anomaly detection functionalities are just starting to appear, and several important problems remain to be solved. This paper begins with a review of the most well-known anomaly-based intrusion detection techniques. Then, available platforms, systems under development and research projects in the area are presented. Finally, we outline the main challenges to be dealt with for the wide scale deployment of anomaly-based intrusion detectors, with special emphasis on assessment issues.