
Programming Protocol-Independent Packet Processors

Authors:

Abstract and Figures

OpenFlow is a vendor-agnostic API for controlling hardware and software switches. In its current form, OpenFlow is specific to particular protocols, making it hard to add new protocol headers. It is also tied to a specific processing paradigm. In this paper we make a strawman proposal for how OpenFlow should evolve in the future, starting with the definition of an abstract forwarding model for switches. We have three goals: (1) Protocol independence: Switches should not be tied to any specific network protocols. (2) Target independence: Programmers should describe how switches are to process packets in a way that can be compiled down to any target switch that fits our abstract forwarding model. (3) Reconfigurability in the field: Programmers should be able to change the way switches process packets once they are deployed in a network. We describe how to write programs using our abstract forwarding model and our P4 programming language in order to configure switches and populate their forwarding tables.
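The abstract forwarding model described above (a parser that extracts header fields, followed by match-action tables whose entries are populated at run time) can be illustrated with a small Python model. This is only a sketch under stated assumptions: it is not P4 syntax, and the names `Table`, `parse_ethernet`, `forward`, `drop`, and `process` are hypothetical, chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Headers = Dict[str, int]                      # parsed header fields
Metadata = Dict[str, int]                     # per-packet results (e.g. egress port)
Action = Callable[[Headers, Metadata], None]


def parse_ethernet(packet: bytes) -> Headers:
    """Parser stage: extract the fields that later tables may match on."""
    return {
        "eth.dst": int.from_bytes(packet[0:6], "big"),
        "eth.src": int.from_bytes(packet[6:12], "big"),
        "eth.type": int.from_bytes(packet[12:14], "big"),
    }


def forward(port: int) -> Action:
    """Build an action that sends the packet out of the given port."""
    def action(headers: Headers, meta: Metadata) -> None:
        meta["egress_port"] = port
    return action


def drop(headers: Headers, meta: Metadata) -> None:
    """Action that marks the packet as dropped."""
    meta["drop"] = 1


@dataclass
class Table:
    """One match-action table with exact matching on a tuple of fields."""
    keys: Tuple[str, ...]
    default: Action
    entries: Dict[Tuple[int, ...], Action] = field(default_factory=dict)

    def apply(self, headers: Headers, meta: Metadata) -> None:
        lookup = tuple(headers[k] for k in self.keys)
        # Run the matching entry's action, or the default on a miss.
        self.entries.get(lookup, self.default)(headers, meta)


def process(packet: bytes, pipeline: List[Table]) -> Metadata:
    """Parse once, then apply each table in order."""
    headers = parse_ethernet(packet)
    meta: Metadata = {}
    for table in pipeline:
        table.apply(headers, meta)
    return meta
```

In this model, "configuring the switch" corresponds to choosing the parser and the list of tables, while "populating forwarding tables" corresponds to adding items to `entries`, for example `l2.entries[(0x0000000000AA,)] = forward(3)` on a table created with `Table(keys=("eth.dst",), default=drop)`.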
... At its most basic, improving cyber security requires dynamic security functions that offer high speed and bandwidth. To meet these needs, the network softwarization approach programs network devices with domain-specific programming languages such as P4 [2], which makes it possible to benefit from the speeds that network devices deliver in hardware. The second part of this book chapter gives a summary of network softwarization and its technologies. ...
... The P4 programming language is a domain-specific, high-level programming language used to define how packets are processed in the data plane [2], [12]. The name P4 is derived from the initial letters of the words in the phrase "Programming Protocol-independent Packet Processors". ...
Chapter
This chapter summarizes the SDN (software-defined networking) and NFV (network function virtualization) technologies that make up network softwarization, compares the programming features supported by the P4 data plane programming language and the OpenFlow protocol, and describes the capabilities of P4 that enable enhanced network-level cyber security. It then reviews P4-based cyber security solutions in the literature and discusses challenges and potential research questions in this area.
... In addition, MATReduce [30] merges duplicate match operations between different P4 [31] match-action tables to accelerate the packet processing pipeline of P4 switches. However, MATReduce only targets P4 switches so it lacks necessary factors for SFC implementation, such as maintaining NF dependencies. ...
Article
Service function chains (SFCs) built by network functions (NFs) are usually required to be low‐latency, especially in some tight latency scenarios such as Distributed Denial of Service (DDoS) defense. However, different NFs in SFCs are likely to perform identical operations of matching specific packet header fields (e.g., source IP address). Such a processing redundancy inevitably increases the overall end‐to‐end processing latency of SFCs, which in turn affects network management applications in their decision‐making process of handling short‐lived network events (e.g., microbursts). To address this problem, in this paper, we propose a novel NFV framework, DMO, that aims to eliminate those identical and redundant match operations among different NFs in input SFCs while maintaining the original SFC semantics of packet processing. To achieve this goal, our design proposes a semantic‐preserving mechanism that merges duplicate match operations between different NFs and also preserves original SFC semantics. Also, to avoid potential conflicts among NF rules at runtime, DMO further offers another mechanism that resolves unnecessary conflicts among different NF rules by assigning reasonable priorities to these rules. We have implemented a prototype of DMO. Our experiments indicate that DMO achieves 26.7%–67.3% latency reduction for real‐world SFCs.
... On the contrary, at L3 we must rely on DPI and port [...] On the performance side, it is common to implement L3 network functions in ASICs, which yields high-performance implementations, while most L7 networking functions are implemented in software. This situation may change due to the introduction of programmable switches based on P4 [26]. Finally, an L7 approach can process all kinds of encrypted traffic, while L3 is limited to IPsec traffic and similar protocols. ...
Preprint
In the last 15 years, the Internet architecture has continued evolving organically, introducing new headers and protocols to the classic TCP/IP stack. More specifically, we have identified two major trends. First, most communications are now encrypted, either at L3 or L4. Second, due to protocol ossification, developers have resorted to upper layers (L4 and above) to introduce new functionalities. For example, QUIC's connection migration feature provides mobility at L4. In this paper we reflect on these changes and attempt to formalize them by adding two additional protocol headers to the TCP/IP stack: one for security, and another for new functionalities. We must note that we are not presenting a new architecture, but trying to describe what is already out there. In addition, we elaborate on the forces that have brought us here, and we enumerate current proposals that are shaping these new headers. We also analyze in detail three examples of such trends: the Zero Trust Networking paradigm, the QUIC transport protocol, and modern SD-WAN systems. Finally, we present a formalization of this architecture by adding these two additional layers to the TCP/IP protocol stack. Our goal is to trigger a discussion on the changes to the current Internet architecture.
... Intel has developed the IPDK [3] framework, which bundles containers with multiple existing software packages like Open vSwitch (OVS) [4] that can be programmed using the P4 programming language [5]. Nvidia's DOCA [6] package also bundles a rich set of software packages ranging from high-level Snort [7], OVS, and NVMe virtualization to low-level abstractions like rdma-core [8] and DPDK [9]. ...
Preprint
In this paper, we present a framework for moving compute and data between processing elements in a distributed heterogeneous system. The implementation of the framework is based on the LLVM compiler toolchain combined with the UCX communication framework. The framework can generate binary machine code or LLVM bitcode for multiple CPU architectures and move the code to remote machines while dynamically optimizing and linking the code on the target platform. The remotely injected code can recursively propagate itself to other remote machines or generate new code. The goal of this paper is threefold: (a) to present an architecture and implementation of the framework that provides essential infrastructure to program a new class of disaggregated systems wherein heterogeneous programming elements such as compute nodes and data processing units (DPUs) are distributed across the system, (b) to demonstrate how the framework can be integrated with modern, high-level programming languages such as Julia, and (c) to demonstrate and evaluate a new class of eXtended Remote Direct Memory Access (X-RDMA) communication operations that are enabled by this framework. To evaluate the capabilities of the framework, we used a cluster with Fujitsu CPUs and a heterogeneous cluster with Intel CPUs and BlueField-2 DPUs interconnected using a high-performance RDMA fabric. We demonstrated an X-RDMA pointer chase application that outperforms an RDMA GET-based implementation by 70% and is as fast as Active Messages, but does not require function predeployment on remote platforms.
Article
Communication networks require high availability to provide reliable services to users. One technique for maintaining high availability in communication networks is fast failure recovery, such as the multiple routing configurations (MRC) algorithm. In MRC, multiple virtual networks, called backup routing configurations, are constructed on the basis of a normal routing configuration on a physical network and are used to transmit data after a single link or node failure. When a failure occurs during data transmission on the physical network, MRC immediately switches from the normal routing configuration to an appropriate backup routing configuration to ensure data transmission without significant delay or packet loss. In this paper, we introduce a design of a fast failure recovery mechanism using MRC for software-defined networks with Programming Protocol-independent Packet Processors (P4). P4 is a programming language that enables us to define the behavior of the data plane of network devices in software-defined networks. We provide the implementation of some functions for fast failure recovery, including failure detection and packet forwarding. We verify the behavior of our implementation through practical demonstrations using Mininet.
Article
Software-defined wide area networking (SD-WAN) enables dynamic network policy control over a large distributed network via network updates. To be practical, network updates must be consistent (i.e., free of transient errors caused by updates to multiple switches), secure (i.e., only be executed when sent from valid controllers), and reliable (i.e., function despite the presence of faulty or malicious members in the control plane), while imposing only minimal overhead on controllers and switches. We present SERENE: a protocol for secure and reliable network updates for SD-WAN environments. In short: Consistency is provided through the combination of an update scheduler and a distributed transactional protocol. Security is preserved by authenticating network events and updates, the latter with an adaptive threshold cryptographic scheme. Reliability is provided by replicating the control plane and making it resilient to a dynamic adversary by using a distributed ledger as a controller failure detector. We ensure practicality by providing a mechanism for scalability through the definition of independent network domains and exploiting parallelism of network updates both within and across domains. We formally define SERENE's protocol and prove its safety with regard to event-linearizability. Extensive experiments show that SERENE imposes minimal switch burden and scales to large networks running multiple network applications all requiring concurrent network updates, imposing at worst a 16% overhead on short-lived flow completion and negligible overhead on anticipated normal workloads.
Article
Remote direct memory access (RDMA) has become one of the state-of-the-art high-performance network technologies in datacenters. The reliable transport of RDMA is designed for a lossless underlying network and cannot endure a high packet loss rate. However, besides switch buffer overflow, there is another kind of packet loss in RDMA networks, namely packet corruption, which has not been discussed in depth. Packet corruption incurs long application tail latency by causing timeout retransmissions. The challenges in solving packet corruption in RDMA networks include: 1) packet corruption is inevitable under any remedial mechanism and 2) RDMA hardware is not programmable. This paper proposes some designs which can guarantee the expected tail latency of applications in the presence of packet corruption. The key idea is controlling the occurrence probabilities of timeout events caused by packet corruption by transforming timeout retransmissions into out-of-order retransmissions. We build a probabilistic model to estimate the occurrence probabilities and real effects of the corruption patterns. We implement these two mechanisms with the help of programmable switches and the zero-byte message RDMA feature. We build an ns-3 simulation and implement optimization mechanisms on our testbed. The simulation and testbed experiments show that the optimizations can decrease the flow completion time by several orders of magnitude with less than 3% bandwidth cost at different packet corruption rates.
Article
Botnet-originated DDoS attacks continue to plague the internet and disrupt services for legitimate users. While various proposals have been presented in the last two decades, botnets still have the advantage over defenders, because botnets have orchestrated processes to launch disruptive attacks. Defenders, on the other hand, use manual methods and siloed tools, and lack orchestration among different organizations. These unorchestrated efforts slow down the attack response and extend the lifespan of botnet attacks. This article presents shieldSDN and shieldCHAIN, an inter-organization collaborative defense framework using P4, SDN, and Blockchain, which extends our earlier research on microVNF, an edge security solution for SIP-enabled IoT devices with P4. Besides mitigating DDoS attacks, microVNF also produces attack fingerprints called Indicator of Compromise (IOC) records. ShieldSDN and shieldCHAIN distribute these IOCs to other organizations so that they can create their own packet filters. Effectively, shieldSDN and shieldCHAIN synchronize packet filters for different organizations to mitigate the same botnet strain. Four experiments were performed successfully to validate the functionalities of shieldSDN and shieldCHAIN. The scope of the first experiment was intra-company, while the second, third, and fourth experiments were inter-company. In the first experiment, shieldSDN extracted IOCs from the source switch and installed these as packet filters on other switches within the same organization (in the U.S.). In the second experiment, the shieldCHAIN in the publishing organization (in the U.S.) shared IOCs by posting them to the Blockchain. In the third experiment, the shieldCHAIN in the subscriber organizations (in Singapore and the U.K.) retrieved these IOCs from the Blockchain. Finally, in the last experiment, the shieldCHAIN in the subscriber organizations installed the retrieved IOCs as packet filters that are identical to those in the originating organization. To the best of our knowledge, this is the first framework that uses the P4 switch, SDN controller, and Blockchain together for this use case. As SDN and Blockchain gain acceptance, this framework empowers community members to collaborate and defend against botnet DDoS attacks.
Article
The emergence of Network Functions Virtualization (NFV) is being heralded as an enabler of recent technologies such as 5G/6G, IoT, and heterogeneous networks. Existing NFV monitoring frameworks either do not have the capabilities to express the range of telemetry items needed to perform management or do not scale to large traffic volumes and rates. We present IntOpt, a scalable and expressive telemetry system designed for flexible NFV monitoring using active probing and P4. IntOpt allows us to specify monitoring requirements for individual service chains, which are mapped to telemetry item collection jobs that fetch the required telemetry items from P4 programmable data-plane elements. We propose a mixed integer linear program (MILP) as well as a simulated annealing based random greedy (SARG) meta-heuristic approach to minimize the overhead due to active probing and collection of telemetry items. Using P4-FPGA, we benchmark the overhead for telemetry collection. Our numerical evaluation shows that the proposed approach can reduce monitoring overheads by 39% and monitoring delays by 57%. Such optimization may also enable existing expressive monitoring frameworks to scale to larger real-time networks.
Conference Paper
A flexible and programmable forwarding plane is essential to maximize the value of Software-Defined Networks (SDN). In this paper, we propose Protocol-Oblivious Forwarding (POF) as a key enabler for highly flexible and programmable SDN. Our goal is to remove any dependency on protocol-specific configurations on the forwarding elements and enhance the data-path with new stateful instructions to support genuine software defined networking behavior. A generic flow instruction set (FIS) is defined to fulfill this purpose. POF helps to lower network cost by using commodity forwarding elements and to create new value by enabling numerous innovative network services. We built both hardware-based and open source software-based prototypes to demonstrate the feasibility and advantages of POF. We report the preliminary evaluation results and the insights we learnt from the experiments. POF is future-proof and expressive. We believe it represents a promising direction to evolve the OpenFlow protocol and the future SDN forwarding elements.
Article
PADS is a declarative data description language that allows data analysts to describe both the physical layout of ad hoc data sources and semantic properties of that data. From such descriptions, the PADS compiler generates libraries and tools for manipulating the data, including parsing routines, statistical profiling tools, translation programs to produce well-behaved formats such as XML or those required for loading relational databases, and tools for running XQueries over raw PADS data sources. The descriptions are concise enough to serve as "living" documentation while flexible enough to describe most of the ASCII, binary, and Cobol formats that we have seen in practice. The generated parsing library provides for robust, application-specific error handling.
Conference Paper
In Software Defined Networking (SDN) the control plane is physically separate from the forwarding plane. Control software programs the forwarding plane (e.g., switches and routers) using an open interface, such as OpenFlow. This paper aims to overcome two limitations in current switching chips and the OpenFlow protocol: i) current hardware switches are quite rigid, allowing "Match-Action" processing on only a fixed set of fields, and ii) the OpenFlow specification only defines a limited repertoire of packet processing actions. We propose the RMT (reconfigurable match tables) model, a new RISC-inspired pipelined architecture for switching chips, and we identify the essential minimal set of action primitives to specify how headers are processed in hardware. RMT allows the forwarding plane to be changed in the field without modifying hardware. As in OpenFlow, the programmer can specify multiple match tables of arbitrary width and depth, subject only to an overall resource limit, with each table configurable for matching on arbitrary fields. However, RMT allows the programmer to modify all header fields much more comprehensively than in OpenFlow. Our paper describes the design of a 64 port by 10 Gb/s switch chip implementing the RMT model. Our concrete design demonstrates, contrary to concerns within the community, that flexible OpenFlow hardware switch implementations are feasible at almost no additional cost or power.
Conference Paper
Several emerging network trends and new architectural ideas are placing increasing demand on forwarding table sizes. From massive-scale datacenter networks running millions of virtual machines to flow-based software-defined networking, many intriguing design options require FIBs that can scale well beyond the thousands or tens of thousands possible using today's commodity switching chips. This paper presents CuckooSwitch, a software-based Ethernet switch design built around a memory-efficient, high-performance, and highly-concurrent hash table for compact and fast FIB lookup. We show that CuckooSwitch can process 92.22 million minimum-sized packets per second on a commodity server equipped with eight 10 Gbps Ethernet interfaces while maintaining a forwarding table of one billion forwarding entries. This rate is the maximum packets per second achievable across the underlying hardware's PCI buses.
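The idea of a cuckoo hash table used as a forwarding table can be sketched as follows. This is a plain textbook two-choice cuckoo hash with a small overflow stash, not the memory-efficient concurrent design the abstract refers to; the class name `CuckooTable` and its parameters are hypothetical. The point it illustrates is that a lookup inspects at most two slots (plus the stash), whatever the table size.

```python
from typing import Dict, List, Optional, Tuple


class CuckooTable:
    """Two-choice cuckoo hash table mapping a MAC address to an output port."""

    def __init__(self, slots: int = 64, max_kicks: int = 32) -> None:
        self.slots = slots
        self.max_kicks = max_kicks
        self.table: List[Optional[Tuple[int, int]]] = [None] * slots
        self.stash: Dict[int, int] = {}   # holds entries that could not be placed

    def _candidates(self, key: int) -> Tuple[int, int]:
        # Two hash functions, derived by salting Python's built-in hash.
        return hash((1, key)) % self.slots, hash((2, key)) % self.slots

    def lookup(self, key: int) -> Optional[int]:
        for idx in self._candidates(key):
            entry = self.table[idx]
            if entry is not None and entry[0] == key:
                return entry[1]
        return self.stash.get(key)

    def insert(self, key: int, port: int) -> None:
        # Update in place if the key is already stored somewhere.
        if key in self.stash:
            self.stash[key] = port
            return
        for idx in self._candidates(key):
            entry = self.table[idx]
            if entry is not None and entry[0] == key:
                self.table[idx] = (key, port)
                return
        item = (key, port)
        idx = self._candidates(key)[0]
        for _ in range(self.max_kicks):
            if self.table[idx] is None:
                self.table[idx] = item
                return
            # Slot is occupied: place our item and evict the old occupant.
            item, self.table[idx] = self.table[idx], item
            # The evicted item moves to its other candidate slot.
            first, second = self._candidates(item[0])
            idx = second if idx == first else first
        # Too many evictions: keep the displaced item in the stash.
        self.stash[item[0]] = item[1]
```

A production design would typically resize or rehash when the stash grows, and would use multi-entry buckets to reach high occupancy; both are omitted here to keep the sketch short.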
Conference Paper
Networking researchers and practitioners strive for a greater degree of control and programmability to rapidly innovate in production networks. While this desire enjoys commercial success in the control plane through efforts such as OpenFlow, the dataplane has eluded such programmability. In this paper, we show how end-hosts can coordinate with the network to implement a wide range of network tasks, by embedding tiny programs into packets that execute directly in the dataplane. Our key contribution is a programmatic interface between end-hosts and the switch ASICs that does not sacrifice raw performance. This interface allows network tasks to be refactored into two components: (a) a simple program that executes on the ASIC, and (b) an expressive task distributed across end-hosts. We demonstrate the promise of this approach by implementing three tasks using read/write programs: (i) detecting short-lived congestion events in high speed networks, (ii) a rate-based congestion control algorithm, and (iii) a forwarding plane network debugger.
Conference Paper
The data plane is in a continuous state of flux. Every few months, researchers publish the design of a new high-performance queueing or scheduling scheme that runs inside the network fabric. Many such schemes have been queen for a day, only to be surpassed soon after as methods (or evaluation metrics) evolve. The lesson, in our view: there will never be a conclusive victor to govern queue management and scheduling inside network hardware. We provide quantitative evidence by demonstrating bidirectional cyclic preferences among three popular contemporary AQM and scheduling configurations. We argue that the way forward requires carefully extending Software-Defined Networking to control the fast-path scheduling and queueing behavior of a switch. To this end, we propose adding a small FPGA to switches. We have synthesized, placed, and routed hardware implementations of CoDel and RED. These schemes require only a few thousand FPGA "slices" to run at 10 Gbps or more (a minuscule fraction of current low-end FPGAs), demonstrating the feasibility and economy of our approach.
Article
In spite of the standardization of the OpenFlow API, it is very difficult to write an SDN controller application that is portable (i.e., guarantees correct packet processing over a wide range of switches) and achieves good performance (i.e., fully leverages switch capabilities). This is because the switch landscape is fundamentally diverse in performance, feature set and supported APIs. We propose to address this challenge via a lightweight portability layer that acts as a rendezvous point between the requirements of controller application and the vendor knowledge of switch implementations. Above, applications specify rules in virtual flow tables annotated with semantic intents and expectations. Below, vendor specific drivers map them to optimized switch-specific rule sets. NOSIX represents a first step towards achieving both portability and good performance across a diverse set of switches.
Conference Paper
All network devices must parse packet headers to decide how packets should be processed. A 64 × 10Gb/s Ethernet switch must parse one billion packets per second to extract fields used in forwarding decisions. Although a necessary part of all switch hardware, very little has been written on parser design and the trade-offs between different designs. Is it better to design one fast parser, or several slow parsers? What is the cost of making the parser reconfigurable in the field? What design decisions most impact power and area? In this paper, we describe trade-offs in parser design, identify design principles for switch and router designers, and describe a parser generator that outputs synthesizable Verilog that is available for download. We show that i) packet parsers today occupy about 1–2% of the chip, and ii) while future packet parsers will need to be programmable, this only doubles the (already small) area needed.
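The "one billion packets per second" figure can be checked with a short calculation. The abstract does not state the frame size, so the sketch below assumes minimum-size 64-byte Ethernet frames, each accompanied on the wire by the standard 8-byte preamble and start-of-frame delimiter and a 12-byte interframe gap; the function and constant names are illustrative.

```python
MIN_FRAME_BYTES = 64          # minimum Ethernet frame, including FCS (assumed)
PREAMBLE_SFD_BYTES = 8        # preamble + start-of-frame delimiter
INTERFRAME_GAP_BYTES = 12     # mandatory idle time between frames


def max_packets_per_second(ports: int, line_rate_bps: float) -> float:
    """Worst-case packet rate when every port carries minimum-size frames."""
    bits_per_packet = (
        MIN_FRAME_BYTES + PREAMBLE_SFD_BYTES + INTERFRAME_GAP_BYTES
    ) * 8                      # 84 bytes = 672 bits on the wire
    return ports * line_rate_bps / bits_per_packet


if __name__ == "__main__":
    rate = max_packets_per_second(ports=64, line_rate_bps=10e9)
    print(f"{rate / 1e9:.2f} billion packets per second")
```

Under these assumptions each 10 Gb/s port carries about 14.88 million packets per second, and 64 ports carry about 0.95 billion, which is consistent with the rounded figure in the abstract.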
Article
In writing networking code, one is often faced with the task of interpreting a raw buffer according to a standardized packet format. This is needed, for example, when monitoring network traffic for specific kinds of packets, or when unmarshaling an incoming packet for protocol processing. In such cases, a programmer typically writes C code that understands the grammar of a packet and that also performs any necessary byte-order and alignment adjustments. Because of the complexity of certain protocol formats, and because of the low-level of programming involved, writing such code is usually a cumbersome and error-prone process. Furthermore, code written in this style loses the domain-specific information, viz. the packet format, in its details, making it difficult to maintain.
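To make the task concrete, the following sketch interprets a raw buffer as the fixed 20-byte part of an IPv4 header. The abstract discusses hand-written C; this example instead uses Python's standard `struct` module, where the format string carries the packet layout and the `!` prefix handles network byte order. The names `IPv4Header` and `parse_ipv4` are hypothetical, and IPv4 options and checksum validation are deliberately ignored.

```python
import ipaddress
import struct
from typing import NamedTuple


class IPv4Header(NamedTuple):
    version: int
    ihl: int            # header length in 32-bit words
    total_length: int
    ttl: int
    protocol: int
    src: str
    dst: str


# Fixed IPv4 header layout: "!" selects network (big-endian) byte order.
_IPV4_FIXED = struct.Struct("!BBHHHBBH4s4s")


def parse_ipv4(buf: bytes) -> IPv4Header:
    """Decode the fixed part of an IPv4 header from the start of buf."""
    if len(buf) < _IPV4_FIXED.size:
        raise ValueError("buffer shorter than the fixed IPv4 header")
    (ver_ihl, _tos, total_length, _ident, _flags_frag,
     ttl, protocol, _checksum, src, dst) = _IPV4_FIXED.unpack_from(buf)
    return IPv4Header(
        version=ver_ihl >> 4,        # upper 4 bits of the first byte
        ihl=ver_ihl & 0x0F,          # lower 4 bits of the first byte
        total_length=total_length,
        ttl=ttl,
        protocol=protocol,
        src=str(ipaddress.IPv4Address(src)),
        dst=str(ipaddress.IPv4Address(dst)),
    )
```

Even in this small case the bit-level details (two 4-bit fields packed into one byte, 16-bit fields in network order) live in the format string and the shifts, which is the kind of format knowledge the abstract says gets buried in hand-written parsing code.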
Conference Paper
More fundamental than IP lookups and packet classification in routers is the extraction of fields such as IP Dest and TCP Ports that determine packet forwarding. While parsing of packet fields used to be easy, new shim layers (e.g., MPLS, 802.1Q, MAC-in-MAC) of possibly variable length have greatly increased the worst-case path in the parse tree. The problem is exacerbated by the need to accommodate new packet headers and to extract other higher layer fields. Programmable routers for projects such as GENI will need such flexible parsers. In this paper, we describe the design and implementation of the Kangaroo system, a flexible packet parser that can run at 40 Gbps even for worst-case packet headers. Because conventional solutions that traverse the parse tree one protocol at a time are too slow, Kangaroo uses lookahead to parse several protocol headers in one step using a new architecture in which a CAM directs the next set of bytes to be extracted. The challenge is to keep the number of CAM entries from growing exponentially with the amount of lookahead. We deal with this challenge using a non-uniform traversal of the parse tree, and an offline dynamic programming algorithm that calculates the optimal walk. Our experiments on a NetFPGA prototype show a speedup of 2 compared to an architecture with a lookahead of 1. The architecture can be implemented as a parsing block in a standard 400 MHz ASIC at 40 Gbps using less than 1% of chip area.