
Fast Dynamic Control of Optical Data Center Networks Based on Nanoseconds WDM Photonics Integrated Switches

OECC/PSC 2019
©IEICE
Xuwei Xue, Kristif Prifti, Bitao Pan, Fulong Yan, Xiaotao Guo, Nicola Calabretta
IPI-ECO Research Institute, Eindhoven University of Technology, Eindhoven, the Netherlands
x.xue.1@tue.nl
Abstract: We demonstrate fast dynamic control of a photonic-switch-based DCN by a distributed
optical flow control protocol. Results show error-free 10 Gb/s switching with < 2 dB penalty, no packet loss at
the switch, and 260.66 ns end-to-end latency.
Keywords: Data Center Network and Subsystem, Optoelectronic and all-optical switches.
INTRODUCTION
With the emergence of cloud computing, artificial intelligence, and the advent of 5G mobile communications, the
traffic inside data centers (DCs) faces stringent requirements in terms of low latency, high capacity, and high cost-
and power-efficiency [1]. To sustain the scalable growth in both network traffic volume and connected endpoints
while decreasing cost and energy consumption, transparent optical DC networks (DCNs) based on fast optical
switches have been considered, as they feature data-rate and format transparency and eliminate the power-consuming
O/E/O conversions [2].
An optical DCN architecture based on distributed nanoseconds photonic integrated circuit (PIC) switches has been
proposed and numerically investigated in [3]. The statistical multiplexing and the high throughput provided by the
photonic integrated switch allow for efficient resource utilization, low latency, and high capacity and connectivity.
However, despite the nanoseconds reconfiguration time of the photonic switch, a fast dynamic network control
mechanism that enables the full exploitation of statistical multiplexing and improves the network throughput is
essential. Moreover, the lack of optical buffers at the optical switch results in large packet loss as the traffic load
increases. Recently, a dynamic network control mechanism with a packet contention resolution protocol, named
optical Flow Control, has been proposed and numerically assessed to control the DCN and prevent packet loss by
packet retransmission [4]. However, fast dynamic control of the DCN based on the optical Flow Control protocol has
not yet been experimentally implemented and assessed to validate the network performance and to achieve fast
control of the WDM photonic switches.
In this work, FPGAs are utilized to implement the ToRs and the switch controllers with the Flow Control protocol of
the photonic-switch-based DCN, achieving fast dynamic control of the network and thus improving the exploitation
of statistical multiplexing and the network throughput. Moreover, the optical Flow Control protocol is deployed
between the FPGA-based controllers and the ToRs to enhance the fast control mechanism, preventing packet loss due
to the lack of optical buffers. Experimental results confirm the fast dynamic control operation between the ToRs and
the optical switches with no packet loss at the photonic switch, and validate the switching operation in the space,
wavelength, and time domains of the photonic integrated switch. Deploying the fast control, the photonic-switch-based
DCN can error-free switch 10 Gb/s data packets with < 2 dB penalty and 260.66 ns ToR-to-ToR latency.
FAST DYNAMIC CONTROL SYSTEM
The architecture of the proposed DCN is shown in Fig. 1(a) and numerically investigated in [5]. N racks are grouped
into one cluster, and there are N clusters in the proposed DCN. The intra-cluster switch (IS) and inter-cluster switch (ES)
forward the intra-cluster and inter-cluster traffic, respectively. The i-th ToRs of the N clusters are interconnected
by the i-th ES (1≤i≤N). Every photonic switch has a corresponding FPGA-based switch controller to reconfigure the fast
switch and then forward the traffic packets. Each FPGA-based ToR generates data traffic consisting of the optical
Fig. 1: (a) DCN architecture employing the nanoseconds and buffer-less photonics integrated switches. (b) Schematic of the photonics
integrated switch and switch controller. (c) The fabricated photonics integrated switch chip.
OECC/PSC 2019
©IEICE
payloads and optical labels carrying the switch port destination information. One copy of the payloads is stored in the
electrical buffer of the ToR and is released or retransmitted based on the Flow Control feedback from the switch
controller. The optical labels are processed by the switch controller, which extracts the destination information, checks
possible contentions with other packets having the same destination, and accordingly sets the SOA gates inside the
photonic integrated switch to forward the optical payloads. Distributed optical Flow Control is deployed between the
ToRs and the IS/ES switch controllers to resolve packet contention, thus preventing packet loss at the photonic switch.
After the packet contention check by the switch controller, a positive acknowledgement (ACK, successful forwarding)
is sent back to the ToR, and the optical payload is released from the ToR buffer. In response to a negative
acknowledgement (NACK, packet dropped due to contention), the ToR retransmits the optical payload. Note that,
given the proposed DCN architecture, the implemented optical Flow Control operates in a fully distributed way, which
enhances the scalability and decreases the complexity of the DCN control as well as the average latency.
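The buffer-and-retransmit behaviour at the ToR can be sketched as follows. This is a minimal software model of the protocol described above, not the FPGA implementation; the class and signal names are illustrative.

```python
from collections import deque

ACK, NACK = "ACK", "NACK"

class ToR:
    """Toy model of the ToR side of the optical Flow Control protocol:
    a copy of each payload stays in the electrical buffer until the
    switch controller acknowledges its successful forwarding."""

    def __init__(self):
        self.buffer = deque()  # electrical buffer holding payload copies

    def transmit(self, payload):
        # Send the optical payload and keep a copy for possible retransmission.
        self.buffer.append(payload)
        return payload

    def on_feedback(self, feedback):
        # ACK: the switch forwarded the packet, so release the buffered copy.
        # NACK: the packet lost the contention check, so retransmit the copy.
        if feedback == ACK:
            return ("release", self.buffer.popleft())
        return ("retransmit", self.buffer[0])

tor = ToR()
tor.transmit("payload-1")
print(tor.on_feedback(NACK))  # ('retransmit', 'payload-1')
print(tor.on_feedback(ACK))   # ('release', 'payload-1')
```

Because the payload copy is only dropped on an ACK, a contended packet is never lost, matching the no-packet-loss behaviour reported in the experiments.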
The buffer-less photonic integrated switch is schematically shown in Fig. 1(b). An arrayed waveguide grating
(AWG) groups the WDM wavelengths coming from the different ToRs, and each optical module consists of a 1:N
splitter that broadcasts the WDM channels to N wavelength selective switches (WSSs). The outputs of the N WSSs are
connected to the respective N output ports. Each WSS can select one wavelength channel and forward it to the
output port according to the switching control signals: turning the N SOAs on or off determines which wavelength
channel is forwarded to the output and which are blocked. It is worth noting that every optical module can operate as
either an ES or an IS optical switch. As shown in Fig. 1(b), each optical module forwards the input WDM signals from
the ToRs in the cluster, via the WSSs, to the other ToRs residing in the cluster. Specifically, the 6x4 mm2 fabricated
photonic switch chip in Fig. 1(c) integrates 4 optical modules that can be used to implement 4 ESs or 4 ISs for the
intra-cluster or inter-cluster interconnection of 4 ToRs. At the input of each module, an 800 μm booster SOA
compensates the 6 dB loss of the 1:4 splitter and partially the AWG losses at the WSSs. The passive 1:4 splitter is
realized by cascading 1×2 multimode interferometers (MMIs). Each of the four identical modules processes one of the
four WDM inputs and forwards it to the dedicated outputs. Therefore, two photonic integrated switch chips can
implement the 4 ES and 4 IS switches to interconnect 16 ToRs grouped in 4 clusters.
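The broadcast-and-select behaviour of one module can be illustrated with a toy model (the dictionaries and wavelength keys below are illustrative, not the chip's actual control interface): the splitter broadcasts every channel to every WSS, and the single SOA gate turned on per output port selects which channel is forwarded there.

```python
def switch_module(wdm_inputs, soa_gates):
    """wdm_inputs: {wavelength: payload}, the WDM signal entering the module.
    soa_gates: {output_port: wavelength or None}; per WSS, the one SOA gate
    turned on selects which broadcast channel reaches that output port."""
    outputs = {}
    for port, selected in soa_gates.items():
        # Every WSS sees all channels; only the gated one is forwarded,
        # the others are blocked (None means all gates off: port idle).
        outputs[port] = wdm_inputs.get(selected) if selected else None
    return outputs

# Four ToR channels on the transceiver wavelengths used in the experiment.
wdm = {1525.0: "pkt-ToR1", 1528.9: "pkt-ToR2", 1532.9: "pkt-ToR3", 1536.8: "pkt-ToR4"}
gates = {1: 1528.9, 2: 1525.0, 3: None, 4: 1536.8}
print(switch_module(wdm, gates))
# {1: 'pkt-ToR2', 2: 'pkt-ToR1', 3: None, 4: 'pkt-ToR4'}
```

The model makes explicit why at most one SOA per WSS may be on in a time slot: two open gates on the same output would superimpose two wavelength channels at one port.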
EXPERIMENTAL SET-UP AND RESULTS
The experimental set-up to assess the fast dynamic control of the optical DCN based on the nanoseconds photonic
integrated switches is shown in Fig. 2. The optical modules of PIC-1 implement the intra-cluster switches IS-1, IS-2,
IS-3, and IS-4 that interconnect the 4 ToRs of cluster 1, cluster 2, cluster 3, and cluster 4, respectively. The optical
modules of PIC-2 implement the inter-cluster switches ES-1, ES-2, ES-3, and ES-4 that interconnect the i-th ToRs of
these 4 clusters (1≤i≤4). Note that, given the modularity of the architecture (4 ToRs interconnected by the intra-cluster
switch IS, and 4 inter-cluster ToRs connected by the inter-cluster ES), and since all the optical modules of the PICs
are identical copies, the assessment of one optical module (IS-1) connecting the 4 ToRs in cluster 1 is representative
of the operation and performance of all the other intra-cluster and inter-cluster switch modules. Therefore, the
dynamic switching operation with optical flow control among the 4 ToRs of cluster 1 has been investigated for IS-1.
The 4 ToRs are implemented by an FPGA that integrates the packet (label and payload) generation, the electronic
buffers, and the optical Flow Control protocol for each distinct ToR. The FPGA is equipped with 10 Gb/s SFP
transceivers at 1525.0 nm, 1528.9 nm, 1532.9 nm, and 1536.8 nm to generate the distinct optical packets of the four
ToRs, respectively. The optical packet time slot is 600 ns (540 ns payload time and 60 ns guard time). The four ToR
optical channels are amplified by an EDFA and injected into the unpackaged IS-1 PIC switch module, while the
optical label signals are sent to the FPGA-based switch controllers. After detecting the optical labels and resolving the
packet contention, the switch controller sets the SOA gates to dynamically control the PIC switch.
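The slot structure above implies a simple overhead budget. As a back-of-the-envelope sketch (derived from the stated parameters, not a measured result), the 60 ns guard time caps the per-channel goodput:

```python
# Figures implied by the set-up parameters: 10 Gb/s channels and
# 600 ns time slots split into 540 ns payload + 60 ns guard time.
line_rate = 10e9                 # bit/s per channel
slot, payload = 600e-9, 540e-9   # seconds

payload_bits = line_rate * payload   # bits carried in one slot
efficiency = payload / slot          # fraction of the slot carrying data
goodput = line_rate * efficiency     # effective per-channel data rate

print(round(payload_bits))       # 5400 bits (675 bytes) per slot
print(round(efficiency, 3))      # 0.9
print(round(goodput / 1e9, 2))   # 9.0 Gb/s effective
```

So the guard time costs 10% of the raw channel rate, which is the price paid for the SOA reconfiguration and synchronization margin between consecutive slots.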
First, we assess and validate the dynamic switching operation of IS-1, starting with the switching operation of WSS1
of the IS-1 module. The switch enabling signals (label control 1-4), generated by the FPGA switch controller after
processing the optical labels of the 4 ToR channels, and the switched channels at WSS1 (output port 1) are reported in
Fig. 3(a). The enabling signals are synchronized with the optical packets, and a bias current of 40 mA is applied for
the "on" state of the SOA gates. The traces indicate that the optical packets are dynamically switched according
to the FPGA control signals. Bit error rate (BER) curves for the switched packets of the 4 ToR channel inputs are
reported in Fig. 3(b).
Fig. 2: Experimental set-up employed for performance evaluation.
The back-to-back (B2B) curve is included as a reference. Error-free operation with < 1 dB penalty has been measured
for ToR 1 (CH 1) and ToR 3 (CH 3) packets, while for ToR 2 (CH 2) and ToR 4 (CH 4) packets the penalty is around
2 dB, still sufficient for the packets to be correctly detected. Secondly, the dynamic switch control of the packets to
the four WSS output ports has also been validated: packets from ToR 1 have been switched to the 4 output ports by
dynamically controlling the SOA gates of the 4 WSSs. The traces of the label control signals and the switched optical
packets at the four outputs are reported in Fig. 3(c). The switched packets are detected by the FPGA ToRs, and the
measured ToR-to-ToR latency is 260.66 ns. These results confirm the fast dynamic control operation between the
ToRs and the IS, and validate the switching operation in the space, wavelength, and time domains of the
FPGA-controlled PIC switch.
The distributed optical Flow Control protocol utilized to prevent packet loss at the switch is also demonstrated.
A heavy traffic load is generated by the FPGA-based ToRs to induce high packet contention. Fig. 3(d) shows the
optical labels transmitted by ToR 1-4, the RequestMessage_NextDestination (RM) signals, carrying the packet
destination-port information to the switch controller at every time slot. The FPGA switch controller processes the 4
RM signals and runs the contention resolution protocol to generate the RequestResponse_NextDestination (RR)
signals sent back to the ToRs. When there is no packet contention (e.g. RM: 003, 002, 000, 001), the RR signals (003,
002, 000, 001) are sent to the corresponding ToRs. In case of packet contention (e.g. RM: 003, 000, 000, 002, where
the packets from ToR 2 and ToR 3 have the same destination 000), the RR signals sent back to the ToRs will be 003,
000, 001, 002. As shown in Fig. 3(e), when a ToR receives an ACK signal (RR = RM), it releases the optical payload
stored in the buffer and sends a new RM signal for the next payload in the following time slot. When it receives a
NACK signal (RR ≠ RM), the ToR retransmits the same RM signal and the corresponding optical payload until
receiving the ACK signal indicating successful forwarding. The monitored statistics (counts of lost and retransmitted
packets) at the FPGA controller, shown in Fig. 3(f), confirm that the optical Flow Control protocol prevents any
packet loss at the PIC switch.
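The controller-side contention check can be sketched as follows. The fixed ToR-index priority and the NACK encoding (any RR value different from the RM) are assumptions for illustration, since the exact arbitration rule of the FPGA controller is not detailed here.

```python
def resolve_contention(rm):
    """rm: list of requested destination ports, one entry per ToR in ToR order.
    Returns (rr, grants): rr echoes the request (ACK) for the winner of each
    destination port; a loser receives a value != its request, i.e. a NACK,
    and must retransmit the payload in a later slot."""
    winners = {}          # destination port -> ToR index that won it
    rr, grants = [], []
    for tor, dest in enumerate(rm):
        if dest not in winners:   # first requester wins the output port
            winners[dest] = tor
            rr.append(dest)       # RR == RM signals an ACK
            grants.append(True)
        else:                     # contention lost: RR != RM signals a NACK
            rr.append(None)       # placeholder; the FPGA sends some value != RM
            grants.append(False)
    return rr, grants

# No contention: every RR equals its RM, so all four ToRs receive an ACK.
print(resolve_contention(["003", "002", "000", "001"])[1])  # [True, True, True, True]
# ToR 2 and ToR 3 both request port 000: ToR 3 is NACKed and retransmits.
print(resolve_contention(["003", "000", "000", "002"])[1])  # [True, True, False, True]
```

Since each of the four ToRs handles its own RM/RR exchange independently, this check runs per switch controller with no central arbiter, matching the distributed character of the protocol.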
CONCLUSIONS
The fast dynamic and distributed optical flow control of an optical DCN based on distributed nanoseconds photonic
integrated switches has been implemented and experimentally assessed. The proposed photonic-switch-based DCN
can be fast and dynamically controlled by the optical flow control protocol implemented between the FPGA-based
switch controllers and the ToRs. Experimental results indicate that the photonic-switch-based DCN can switch the
10 Gb/s traffic error-free in the space, wavelength, and time domains, with no packet loss, < 2 dB penalty, and
260.66 ns ToR-to-ToR latency.
ACKNOWLEDGMENT
The authors would like to thank the H2020 Passion (780326) and H2020 Qameleon (780354) projects for partially
supporting this work.
REFERENCES
[1] Cisco Global Cloud Index: Forecast and Methodology, 2016-2021, White Paper, USA, 2016.
[2] W. Xia et al., "A survey on data center networking: Infrastructure and operations," IEEE Commun. Surveys & Tutorials, 2017.
[3] N. Calabretta et al., "Monolithically integrated WDM cross-connect switch for high-performance optical data center networks,"
2017 Optical Fiber Communications Conference and Exhibition (OFC), IEEE, 2017, pp. 1-3.
[4] M. Wang et al., "Low latency and efficient optical flow control for intra data center networks," Optics Express, 2014, pp. 427-434.
[5] F. Yan et al., "OPSquare: A flat DCN architecture based on flow-controlled optical packet switches," IEEE/OSA Journal of
Optical Communications and Networking, 2017, 9(4): 291-303.
Fig. 3: (a) Traces for WSS1; (b) BER curves for WDM channels input at 10 Gb/s. CH: channel; (c) Traces for 4 outputs; Label requests and
responses at (d) switch controller and (e) ToR; (f) Statistics monitored at switch controller.