An Exhaustive Survey on P4 Programmable Data Plane Switches: Taxonomy, Applications, Challenges, and Future Trends

Abstract

Traditionally, the data plane has been designed with fixed functions to forward packets using a small set of protocols. This closed-design paradigm has limited the capability of the switches to proprietary implementations which are hard-coded by vendors, inducing a lengthy, costly, and inflexible process. Recently, data plane programmability has attracted significant attention from both the research community and the industry, permitting operators and programmers in general to run customized packet processing functions. This open-design paradigm is paving the way for an unprecedented wave of innovation and experimentation by reducing the time of designing, testing, and adopting new protocols; enabling a customized, top-down approach to develop network applications; providing granular visibility of packet events defined by the programmer; reducing complexity and enhancing resource utilization of the programmable switches; and drastically improving the performance of applications that are offloaded to the data plane. Despite the impressive advantages of programmable data plane switches and their importance in modern networks, the literature has been missing a comprehensive survey. To this end, this paper provides background that gives an overview of the evolution of networks from legacy to programmable, describes the essentials of programmable switches, and summarizes their advantages over Software-defined Networking (SDN) and legacy devices. The paper then presents a unique, comprehensive taxonomy of applications developed with the P4 language; surveys, classifies, and analyzes more than 200 articles; discusses challenges and considerations; and presents future perspectives and open research issues.
... Kfoury et al. [21] presented an exhaustive survey related to P4 programmable data planes. It includes a general overview of the P4 language and its applications. ...
... However, such schemes have a main limitation: they assume that their methods will be implemented on contemporary routers. Such a process is lengthy and costly, and router manufacturers will most likely not modify their existing devices [21]. In fact, such schemes and other Active Queue Management (AQM) algorithms were proposed more than ten years ago and are still not implemented on contemporary routers in the market today, though the problem of buffer sizing is still being thoroughly researched [5]. ...
Article
The increasing performance requirements of today’s Internet applications demand a reliable mechanism to transfer data. Many applications rely on the Transmission Control Protocol (TCP) as the transport protocol, due to its ability to adapt to properties of the network and to be robust in the face of many kinds of failures. However, improving the performance of applications that rely on TCP has been limited by the closed nature of legacy switches, which do not provide accurate visibility of network events. With the emergence of P4-programmable devices, developers can rapidly implement and test customized solutions that use fine-grained telemetry, provide sub round-trip time feedback to end devices to enhance congestion control, precisely isolate traffic to offer better Quality of Service (QoS), quickly detect congestion and re-route traffic via alternate paths, and optimize server resources by offloading protocols. This paper first surveys recent works on P4-programmable devices, focusing on schemes aimed at enhancing TCP performance. It provides a taxonomy classifying the aspects that impact TCP’s behavior, such as congestion control, Active Queue Management (AQM) algorithms, TCP offloading, and network measurement schemes. Then, it compares the P4-based solutions, and contrasts those solutions with legacy implementations. Lastly, the paper presents challenges and future trends.
... Estimating how often items appear in the data is a fundamental task in many applications. Frequency estimation has several applications in networking [2], web search analysis and databases [3], signal processing and machine learning [4], among many other fields. Items of interest in a data stream could represent popular search queries in a server, frequently visited websites in network traffic, best-selling items in retail data, most active stocks in financial data, etc. ...
Article
This paper presents simple yet effective optimizations for implementing data stream frequency estimation sketch kernels using High-Level Synthesis (HLS). The paper addresses design issues common to sketches utilizing large portions of the embedded RAM resources in a Field Programmable Gate Array (FPGA). First, a solution based on Load-Store Queue (LSQ) architecture is proposed for resolving the memory dependencies associated with the hash tables in a frequency estimation sketch. Second, performance fine-tuning through high-level pragmas is explored to achieve the best possible throughput. Finally, a technique based on pre-processing the data stream in a small cache memory prior to updating the sketch is evaluated to reduce the dynamic power consumption. Using an Intel HLS compiler, a proposed optimized hardware version of the popular Count-Min sketch utilizing 80% of the embedded RAM in an Intel Arria 10 FPGA, achieved more than 3x the throughput of an unoptimized baseline implementation. Furthermore, the sketch update rate is significantly reduced when the input stream is skewed. This, in turn, minimizes the effect of high throughput on dynamic power consumption. Compared to FPGA sketches in the published literature, the presented sketch is the most well-rounded sketch in terms of features and versatility. In terms of throughput, the presented sketch is on a par with the fastest sketches fine-tuned at the Register Transfer Level (RTL).
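For readers unfamiliar with the Count-Min sketch mentioned in the abstract above, the following is a minimal software sketch of the data structure in Python. It is only an illustration of the general technique, not the paper's HLS/FPGA design: the class name, the `depth` and `width` defaults, and the use of salted BLAKE2b as the per-row hash are all assumptions made here for the example.

```python
import hashlib


class CountMinSketch:
    """Illustrative Count-Min sketch (parameters are assumptions, not from the paper)."""

    def __init__(self, depth=4, width=1024):
        self.depth = depth    # number of hash rows
        self.width = width    # counters per row
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row, item):
        # One independent-looking hash per row, obtained by salting BLAKE2b
        # with the row number.
        digest = hashlib.blake2b(
            item.encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def update(self, item, count=1):
        # Increment one counter in every row.
        for row in range(self.depth):
            self.rows[row][self._index(row, item)] += count

    def estimate(self, item):
        # The minimum over the rows is the estimate; with non-negative
        # updates it can overestimate but never underestimate.
        return min(self.rows[row][self._index(row, item)]
                   for row in range(self.depth))


sketch = CountMinSketch()
for query in ["p4", "p4", "tofino", "p4", "bmv2"]:
    sketch.update(query)
print(sketch.estimate("p4"))
```

Each update touches one counter per row, which is the memory access pattern whose dependencies the abstract says must be resolved in hardware.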
... The recent emergence of Protocol Independent Switch Architecture-based programmable switches (commonly referred to as PISA switches [6]), along with the P4 programming language [7,8], has enabled programmability in the data plane. These computationally rich PISA switches [9,10] can provide dynamic decision-making capability to overcome the limitations of traditional fixed-function switch-based traffic engineering schemes. ...
Preprint
This work presents P4TE, an in-band traffic monitoring, load-aware packet forwarding, and flow rate controlling mechanism for traffic engineering in fat-tree topology-based data center networks using PISA switches. It achieves sub-RTT reaction time to changes in network conditions, improved flow completion time, and balanced link utilization. Unlike the classical probe-based monitoring approach, P4TE uses an in-band monitoring approach to identify traffic events in the data plane. Based on these events, it readjusts the priorities of the paths. It uses a heuristic-based load-aware forwarding path selection mechanism to respond to changing network conditions and controls the flow rate by sending feedback to the end hosts. It is implementable on emerging v1model.p4 architecture-based programmable switches and is capable of maintaining line-rate performance. Our evaluation shows that P4TE uses a small amount of resources in the PISA pipeline and achieves a better flow completion time than ECMP and HULA.
... P4DM exploits Data Plane Programmability implemented with the P4 switch programming language. For the reader not fully familiar with P4, Kfoury et al. [9] recently published a survey on this topic providing a classification and taxonomy of a large number of articles, while also identifying challenges and future perspectives. We will explain in the remainder of this manuscript what the advantages of this approach are and compare the performance of P4DM with other algorithms that have already appeared in the literature. ...
Article
Network management strategies depend on timely and accurate knowledge of the network performance measures. Among these, one of the most relevant is the delay of the links, which unfortunately is not easy to measure with accuracy, especially when considering multi-hop paths. This is a classical networking problem, for which several solutions have been proposed. Nonetheless, we argue in this manuscript that there is still some room for improving accuracy and effectiveness in the measurement. This paper proposes a new solution based on the exploitation of the P4 data plane programming language. The basic idea is to handle lightweight probe packets that are forged ad hoc at one edge of a link and processed at the other edge. Hosts generate the probe packets that are then exploited by the P4 programs in the switches to implement the measurement. This approach provides an accurate and reliable measure of the link transit time, and is also effective on multi-hop links. In this latter case, we show that the measurement is not influenced much by packet loss when the network is overloaded, thus providing more reliable results than more conventional tools such as the classical ping utility. The manuscript explains the proposed P4 solution; then, it provides a comparison with several other approaches found in the literature, showing that it outperforms most of them; finally, it shows the behavior of the proposed methodology on a multi-hop path in a congested network to demonstrate its robustness.
... Considering this enhancement to SDN and NFV, a new layer of softwarization has been introduced that makes it feasible to program the switch data plane through high-level APIs and languages [106]. The most prominent approach that attracts the research community is the P4 language [107]. The fog/edge node with P4 capabilities can interconnect several heterogeneous network segments and core infrastructures. ...
Article
Fault-tolerance methods are required to ensure high availability and high reliability in cloud computing environments. In this survey, we address fault-tolerance in the scope of cloud computing. Recently, cloud computing-based environments have presented new challenges to support fault-tolerance and opened new paths to develop novel strategies, architectures, and standards. We provide a detailed background of cloud computing to establish a comprehensive understanding of the subject, from basic to advanced. We then highlight fault-tolerance components and system-level metrics and identify the needs and applications of fault-tolerance in cloud computing. Furthermore, we discuss state-of-the-art proactive and reactive approaches to cloud computing fault-tolerance. We further structure and discuss current research efforts on cloud computing fault-tolerance architectures and frameworks. Finally, we conclude by enumerating future research directions specific to cloud computing fault-tolerance development.
... P4 programmable switches have removed the entry barrier to network design, previously reserved to network vendors. With P4, the user can test and deploy novel protocols and applications in a much shorter time span [3]. Although P4 facilitates the design of customized protocols and applications, learning P4 can be challenging. ...
Conference Paper
This paper describes a cloud infrastructure and virtual laboratories on P4 programmable data plane switches. P4 programmable data planes emerged as a technology that enables innovation in networking. P4 is a programming language used to describe how network packets are processed. This paper explains an entry-level training library on P4. The virtual laboratories introduce the learner to P4 and data plane concepts by providing step-by-step guides and exercises. The virtual laboratories are hosted in the Academic Cloud, a distributed platform that manages and orchestrates computing resources. Additionally, the paper describes a work in progress of P4 virtual laboratories that uses Intel Tofino switches. Lastly, the paper discusses the use of the Academic Cloud as a network testbed.
... Programming Protocol-independent Packet Processors (P4) is a domain-specific programming language for network devices that defines how packets are processed in the data plane devices (e.g., switches, routers, Network Interface Cards (NICs), etc.). The programmable forwarding data plane is an evolution of the SDN paradigm, that was earlier restricted to the OpenFlow protocol [17]. Since its conception, P4 has been leveraged in multiple research areas, most notably in In-band Network Telemetry (INT), load balancing, network performance, congestion control, security, etc. ...
Conference Paper
One of the main roles of the Domain Name System (DNS) is to map domain names to IP addresses. Despite the importance of this function, DNS traffic often passes without being analyzed, thus making the DNS a center of attacks that keep evolving and growing. Software-based mitigation approaches and dedicated state-of-the-art firewalls can become a bottleneck and are subject to saturation attacks, especially in high-speed networks. The emerging P4-programmable data plane can implement a variety of network security mitigation approaches at high-speed rates without disrupting legitimate traffic. This paper describes a system that relies on programmable switches and their stateful processing capabilities to parse and analyze DNS traffic solely in the data plane, and subsequently apply security policies on domains according to the network administrator. In particular, Deep Packet Inspection (DPI) is leveraged to extract the domain name, consisting of any number of labels, and hence apply filtering rules (e.g., blocking malicious domains). Evaluation results show that the proposed approach can parse more domain labels than any state-of-the-art P4-based approach. Additionally, a significant performance gain is attained when comparing it to a traditional software firewall (pfSense) in terms of throughput, delay, and packet loss. The resources occupied by the implemented P4 program are minimal, which allows for more security functionalities to be added.
Article
Service chaining is becoming one of the most considered service deployment frameworks in the context of Network Function Virtualization (NFV) in edge and data center environments, conveniently supported by automatic connectivity configurations offered by Software Defined Networking (SDN). Current research on the topic is focusing on how to guarantee Quality of Service (QoS) in terms of guaranteed end-to-end latency for time-critical services. Indeed, latency issues may depend on intra-server virtualization inefficiencies, leading to Virtual Network Function (VNF) delivery delays, or on congestion events occurring at intermediate network elements connecting VNFs. Latency control requires stateful information such as flow delay measurements at a per-packet level, typically not available at traditional SDN switches or inside the VNF. This paper proposes the adoption of SDN data plane programmability exploiting the P4 language and presents two P4 pipeline solutions, suitable for both intra-rack and inter-rack service chain deployments, to automatically check the path latency experienced by selected high-priority flows, also resorting to the recent in-band telemetry applications. The programmable pipelines enforce proactive in-network functions, such as priority change or drop actions, in order to guarantee a bounded SFC segment latency, including both the network and the segment VNFs. The proposed solutions are implemented and evaluated in a network testbed employing programmable software switches, showing their effectiveness in guaranteeing the configured end-to-end latency and the limited effort in terms of additional processing at the P4 switch. The evaluation is carried out using the reference P4 software switch, i.e., BMv2. The aim is to validate the full P4 capabilities and the code feasibility in terms of scalability, load and resource impact, and added intra-switch latency.
The experimental results show that the proposed approach scales with the number of forwarded flows and achieves per-segment latency control enforcement in both congested and non-congested scenarios with a very limited impact on the switch extra-latency, exploiting finer per-packet tuning of drop and priority change that is simply applicable through flow entry configuration. An applicability analysis on hardware switches guaranteeing line-rate performance is provided.
Article
The advent of programmable network switch ASICs and recent developments on other programmable data planes (NPUs, FPGAs) drive the renewed interest in network data plane programmability. The P4 language has emerged as a strong candidate to describe a protocol independent datapath pipeline. With its supported architectures, the P4 language provides an excellent way to define the packet processing and forwarding behavior, while leaving other networking components such as the traffic management engine, to non-programmable fixed function elements, based on the capabilities of most programmable devices. However, network flexibility is essential to meet the Quality of Service (QoS) requirements of traffic flows. Thus, enabling programmable control for fixed-function elements like traffic management is crucial. Towards that end we propose the use of virtual queues in the P4 pipeline, investigate the application of virtual queue-based traffic management, and portability of the approach using different P4 programmable targets. Specifically, we focus on virtual queue based Active Queue Management (AQM) for congestion policing and meeting the latency targets of distinct network slices. The solution is compared to P4 built-in functionality for bandwidth management using meters, proving also that the additional dimensions of control are achieved without compromising the processing complexity of the solution.
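As a rough illustration of the virtual-queue idea referred to in the abstract above, the sketch below models a virtual queue as a byte counter that drains at a configured rate and rejects packets once the virtual backlog would exceed a limit. This is a generic textbook-style model written in Python for readability, not the authors' P4 pipeline; the class name, the `drain_rate_bps` and `limit_bytes` parameters, and the drop-on-overflow policy are assumptions made for this example.

```python
class VirtualQueue:
    """Illustrative virtual queue: a counter, not a real packet buffer."""

    def __init__(self, drain_rate_bps, limit_bytes):
        self.drain_rate_bytes = drain_rate_bps / 8   # bytes drained per second
        self.limit_bytes = limit_bytes               # virtual backlog limit
        self.backlog = 0.0                           # current virtual backlog
        self.last_time = 0.0                         # time of the last update

    def admit(self, now, size_bytes):
        """Return True if the packet is admitted, False if it would be policed."""
        # Drain the virtual backlog for the time elapsed since the last packet.
        elapsed = now - self.last_time
        self.backlog = max(0.0, self.backlog - elapsed * self.drain_rate_bytes)
        self.last_time = now
        # Police the packet if it would push the backlog over the limit.
        if self.backlog + size_bytes > self.limit_bytes:
            return False
        self.backlog += size_bytes
        return True


vq = VirtualQueue(drain_rate_bps=8000, limit_bytes=1500)  # drains 1000 bytes/s
print(vq.admit(0.0, 1000))  # admitted: backlog becomes 1000 bytes
print(vq.admit(0.0, 1000))  # policed: 2000 bytes would exceed the limit
print(vq.admit(1.0, 1000))  # admitted: one second later the backlog has drained
```

Because the queue is only a counter, a per-slice instance can be kept in switch state without touching the fixed-function traffic manager, which is the motivation the abstract gives for the approach.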
Article
The 5G network revolution will be enabled by deep integration of Software Defined Networking (SDN) and Network Function Virtualization (NFV) to support multi-tenancy and per-user and per-application quality of service and experience. However, full softwarization and current SDN platforms may not be able to sustain the complexity and the heterogeneity of different requirements, e.g., strict latency, jitter, high-precision traffic and advanced monitoring. For such services, SDN/NFV needs to be boosted by considering not only orchestration and the control plane, but also data plane programmability. In this paper, the potential of the P4 language is illustrated with the aim of showing its disruptive novel functionalities at the data plane level, currently not available in an SDN/NFV network, opening the way to new orchestration frameworks and enabling a novel autonomic and flexible network at the edge. Use cases, assessments and softwarized performance results are proposed and discussed in the edge and IoT scenario, targeting advanced traffic engineering, cyber security, multi-tenancy, 5G offloading, and telemetry, to demonstrate the feasibility of such an approach.
Article
Volumetric distributed Denial-of-Service (DDoS) attacks have become one of the most significant threats to modern telecommunication networks. However, most existing defense systems require that detection software operates from a centralized monitoring collector, leading to increased traffic load and delayed response. The recent advent of Data Plane Programmability (DPP) enables an alternative solution: threshold-based volumetric DDoS detection can be performed directly in programmable switches to skim only potentially hazardous traffic, to be analyzed in depth at the controller. In this paper, we first introduce the BACON data structure based on sketches, to estimate per-destination flow cardinality, and theoretically analyze it. Then we employ it in a simple in-network DDoS victim identification strategy, INDDoS, to detect the destination IPs for which the number of incoming connections exceeds a pre-defined threshold. We describe its hardware implementation on a Tofino-based programmable switch using the domain-specific P4 language, proving that some limitations imposed by real hardware to safeguard processing speed can be overcome to implement relatively complex packet manipulations. Finally, we present some experimental performance measurements, showing that our programmable switch is able to keep processing packets at line-rate while performing volumetric DDoS detection, and also achieves a high F1 score on DDoS victim identification.
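The threshold rule described in the abstract above (flag a destination when its number of incoming connections exceeds a pre-defined threshold) can be illustrated with a short Python sketch. Note the deliberate simplification: this version keeps an exact set of sources per destination, whereas the paper estimates the cardinality with its sketch-based BACON structure so that it fits in switch memory. The function name `find_victims` and the `(src, dst)` packet representation are assumptions made here.

```python
from collections import defaultdict


def find_victims(packets, threshold):
    """Return destinations contacted by more than `threshold` distinct sources.

    `packets` is an iterable of (src_ip, dst_ip) pairs. Exact sets are used
    here for clarity; a switch would use an approximate cardinality estimate.
    """
    sources_per_dst = defaultdict(set)
    for src, dst in packets:
        sources_per_dst[dst].add(src)
    return {dst for dst, sources in sources_per_dst.items()
            if len(sources) > threshold}


trace = [
    ("10.0.0.1", "192.0.2.10"),
    ("10.0.0.2", "192.0.2.10"),
    ("10.0.0.3", "192.0.2.10"),
    ("10.0.0.1", "192.0.2.20"),
    ("10.0.0.1", "192.0.2.20"),  # repeated source does not raise the count
]
print(find_victims(trace, threshold=2))  # {'192.0.2.10'}
```

Only traffic towards the flagged destinations would then need to be forwarded to the controller for in-depth analysis, as the abstract describes.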
Article
Network slicing is considered a key technology in enabling the underlying 5G mobile network infrastructure to meet diverse service requirements. In this article, we demonstrate how transport network slicing accommodates the various network service requirements of Massive IoT (MIoT), Critical IoT (CIoT), and Mobile Broadband (MBB) applications. Given that most of the research conducted previously to measure 5G network slicing is done through simulations, we utilized SimTalk, an IoT application traffic emulator, to emulate large amounts of realistic traffic patterns in order to study the effects of transport network slicing on IoT and MBB applications. Furthermore, we developed several MIoT, CIoT, and MBB applications that operate sustainably on several campuses and directed both real and emulated traffic into a Programming Protocol-Independent Packet Processors (P4)-based 5G testbed. We then examined the performance in terms of throughput, packet loss, and latency. Our study indicates that applications with different traffic characteristics need different corresponding Committed Information Rate (CIR) ratios. The CIR ratio is the CIR setting for a P4 meter in physical switch hardware over the aggregated data rate of applications of the same type. A low CIR ratio adversely affects the application’s performance because P4 switches will dispatch application packets to the low-priority queue if the packet arrival rate exceeds the CIR setting for the same type of applications. In our testbed, both exemplar MBB applications required a CIR ratio of 140% to achieve, respectively, a near 100% throughput percentage with a 0.0035% loss rate and an approximate 100% throughput percentage with a 0.0017% loss rate. However, the exemplar CIoT and MIoT applications required a CIR ratio of 120% and 100%, respectively, to reach a 100% throughput percentage without any packet loss. 
With the proper CIR settings for the P4 meters, the proposed transport network slicing mechanism can enforce the committed rates and fulfill the latency and reliability requirements for 5G MIoT, CIoT, and MBB applications in both TCP and UDP.
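The CIR ratio defined in the abstract above (the meter's CIR setting divided by the aggregated data rate of applications of the same type) is simple arithmetic, sketched here in Python. The function names and the example application rates of 10 Mbps and 15 Mbps are hypothetical; only the 140% ratio for MBB applications comes from the abstract.

```python
def cir_ratio_percent(cir_bps, app_rates_bps):
    """CIR ratio in percent: CIR setting over the aggregated application rate."""
    aggregate = sum(app_rates_bps)
    if aggregate <= 0:
        raise ValueError("aggregated data rate must be positive")
    return 100 * cir_bps / aggregate


def required_cir_bps(target_ratio_percent, app_rates_bps):
    """CIR setting (bits per second) needed to reach a target CIR ratio."""
    return target_ratio_percent * sum(app_rates_bps) // 100


# Hypothetical example: two MBB applications sending 10 Mbps and 15 Mbps.
mbb_rates = [10_000_000, 15_000_000]
cir = required_cir_bps(140, mbb_rates)    # 140% is the MBB ratio reported above
print(cir)                                # 35000000
print(cir_ratio_percent(cir, mbb_rates))  # 140.0
```

A ratio above 100% leaves headroom for bursts, so fewer packets exceed the meter's committed rate and get dispatched to the low-priority queue.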
Article
The past few years have witnessed the compelling applications of the Internet of Things (IoT) in our daily life. Meanwhile, with the explosion of IoT devices and various applications, the expectations for the performance, reliability, and security of networks are greater than ever. The current end-host-based or centralized control framework incurs too much communication and computation overhead and is therefore slow and clumsy in responding to network dynamics. Recently, with the advancement of programmable network hardware, it has become possible to implement network functions inside the network to improve performance. However, current in-network schemes are largely dependent on a manual process, which results in poor scalability and robustness. In this paper, we present a new intelligent network control architecture, in-network intelligence control. We design intelligent in-network devices that can automatically adapt to network dynamics by leveraging the powerful adaptive abilities of machine learning. In addition, to enhance the collaboration among distributed switches, a centralized management plane is introduced to ease the training process of distributed in-network devices. To demonstrate the technical feasibility and performance advantage of our architecture, we present three use cases: in-network load balancing, in-network congestion control, and in-network DDoS detection.
Conference Paper
Bufferbloat and congestion in the Internet call for the application of AQM wherever possible: on backbone routers, on data center switches, and on home gateways. While it is easy to deploy on software switches, implementing and deploying RFC-standardized AQM algorithms on programmable, pipeline-based ASICs is challenging, as the architectural constraints of these ASICs were unknown at the time of standardization. In this work, we call for reigniting the work on AQM algorithms by illustrating the difficulties of implementing the PIE AQM in three fashions on an Intel Tofino switching ASIC. All our implementations come with trade-offs, which, in turn, have a significant impact on their performance. The conceptual challenges further suggest that it is currently not possible to implement a fully RFC-compliant PIE version on Tofino. We find that it is non-trivial to transfer RFC recommendations to the resource-constrained Tofino, operating at hundreds of gigabits per second. We thus argue that there is a need for AQM specifications that acknowledge the omnipresence of congestion and include the architectural constraints of programmable ASICs in their design.
Article
In the last few years, the emergence of Programmable Data Planes and the appearance of programming protocol-independent languages such as P4 have offered powerful tools to define new network protocols, as well as to redesign existing network applications and systems. Network telemetry is one of the main areas of interest identified by the P4 Application Working Group. The collection of network-wide, fine-grained network information in real-time is a critical requirement for the design of useful and adequate monitoring tools that can be integrated into complex Operations, Administration & Maintenance applications. Recent research has focused on the definition and implementation of in-band monitoring systems, where specifically dedicated monitoring packets are not required. Even though the In-Band Network Telemetry specification proposed by the P4 Language Consortium is the starting point of many of the in-band monitoring systems, this is not the only alternative. Therefore, in this work, we will describe and compare other P4-based in-band passive telemetry proposals.
Article
Distributed Denial-of-Service (DDoS) attacks have been steadily escalating in frequency, scale, and disruptiveness, with outbreaks reaching multiple terabits per second and compromising the availability of highly-resilient networked systems. Existing defenses require frequent interaction between forwarding and control planes, making it difficult to reach a satisfactory trade-off between accuracy (higher is better), resource usage, and defense response delay (lower is better). Recently, high-performance programmable data planes have made it possible to develop a new generation of mechanisms to analyze and manage traffic at line rate. In this paper, we explore P4 language constructs and primitives to design EUCLID, a fully in-network, fine-grained, low-footprint, and low-delay traffic analysis mechanism for DDoS attack detection and mitigation. EUCLID utilizes information-theoretic and statistical analysis to detect attacks and classify packets as either legitimate or malicious, thus enabling the enforcement of policies (e.g., discarding, inspection, or throttling) to prevent attack traffic from disrupting the operation of its victims. We experimentally evaluate our proposed mechanism using packet traces from CAIDA. The results indicate that EUCLID can detect attacks with high accuracy (98.2%) and low delay (≈250 ms), and correctly identify most of the attack packets (>96%) without affecting more than 1% of the legitimate traffic. Furthermore, our approach operates under a small resource usage footprint (tens of kilobytes of static random-access memory per 1 Gbps link and a few hundred ternary content-addressable memory entries), thus enabling its deployability on high-throughput, high-volume scenarios.
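To give a concrete sense of the kind of information-theoretic signal such a mechanism can rely on, the Python sketch below computes the Shannon entropy of a window of header values (for example, destination IP addresses). The abstract does not spell out EUCLID's in-switch computations, so this is an offline illustration under our own assumptions; the function name and the sample windows are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy, in bits, of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical windows of destination addresses.
normal_window = ["d1", "d2", "d3", "d4"]   # traffic spread over 4 destinations
attack_window = ["d1", "d1", "d1", "d1"]   # traffic concentrated on one victim
print(shannon_entropy(normal_window))      # 2.0
print(shannon_entropy(attack_window) == 0.0)  # True
```

When traffic concentrates on a single victim, the destination-address entropy of a window drops, which is one way a deviation from normal behavior can be quantified.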
Article
Many promising networking research ideas in programmable networks never see the light of day. Yet, deploying research prototypes in production networks can help validate research ideas, improve them with faster feedback, uncover new research questions, and also ease the subsequent transition to practice. In this paper, we show how researchers can run and validate their research ideas in their own backyards (on their production campus networks), and we have seen that such a demonstrator can expedite the deployment of a research idea in practice to solve real network operation problems. We present P4Campus, a proof-of-concept that encompasses tools, an infrastructure design, strategies, and best practices (both technical and non-technical) that can help researchers run experiments against their programmable network idea in their own network. We use network tapping devices, packet brokers, and commodity programmable switches to enable running experiments to evaluate research ideas on a production campus network. We present several compelling data-plane applications as use cases that run on our campus and solve production network problems. By sharing our experiences and open-sourcing our P4 apps [28], we hope to encourage similar efforts on other campuses.
Article
State-of-the-art mechanisms against eavesdropping first encrypt all packet payloads at the application layer, then split the packets across multiple network paths. However, versatile eavesdroppers could simultaneously tap several paths to intercept all the packets, classify the packets into streams using transport fields, and analyze the streams by brute force. In this paper, we propose a Programming Protocol-independent Packet Processors (P4) based Network Immune Scheme (P4NIS) against such intractable eavesdropping. Specifically, P4NIS is equipped with three lines of defense to provide softwarized network immunity. Packets are successively processed by the third, second, and first lines of defense. The third line encrypts all packet payloads at the application layer using cryptographic mechanisms. The second line re-encrypts all packet headers at the transport layer to distribute the packets of one stream into different streams, which hinders eavesdroppers from classifying the packets correctly. In addition, the second line adopts a programmable design for dynamically changing encryption algorithms. Finally, the first line uses programmable forwarding policies that split all the double-encrypted packets across different network paths in a disordered manner. Using P4, a paradigm of programmable data planes, we implement P4NIS and evaluate its performance. Experimental results show that P4NIS can effectively increase both the difficulty of eavesdropping and the transmission throughput compared with state-of-the-art mechanisms. Moreover, if P4NIS and state-of-the-art mechanisms provide the same level of defense against eavesdropping, P4NIS can decrease the encryption cost by 69.85% to 81.24%.