
Zhizhen Zhong
- Doctor of Philosophy
- Postdoc at Massachusetts Institute of Technology
Postdoc at MIT CSAIL
About
Publications: 48
Reads: 6,518
Citations: 477
Introduction
I am a postdoctoral researcher at MIT. My research focuses on realizing the next generation of networked computer systems by engineering the unique properties of light and its fundamental particles, photons. Toward this vision, my work takes an application-centric approach to co-designing multiple layers of the networked computer system stack: from photonic/electronic integrated circuits and devices, to computer hardware architecture, all the way up to control algorithms and software systems.
Additional affiliations
August 2010 - August 2014
December 2019 - October 2020
February 2018 - July 2018
Publications (48)
We demonstrate a practical Bayesian optimization system for wavelength reconfiguration in Facebook's backbone. Our system uses a firewall for safe deployment. It is open-source, vendor-agnostic, and achieves 4.76× faster wavelength reconfiguration.
Fiber cut events reduce the capacity of wide-area networks (WANs) by several Tbps. In this paper, we revive the lost capacity by reconfiguring the wavelengths from cut fibers into healthy fibers. We highlight two challenges that made prior solutions impractical and propose a system called Arrow to address them. First, our measurements show that con...
We present In-network Optical Inference (IOI), a system providing low-latency machine learning inference by leveraging programmable switches and optical matrix multiplication. IOI consists of a novel transceiver module designed specifically to perform linear operations such as matrix multiplication in the optical domain. IOI’s transceivers are plug...
Advanced machine learning models are currently impossible to run on edge devices such as smart sensors and unmanned aerial vehicles owing to constraints on power, processing, and memory. We introduce an approach to machine learning inference based on delocalized analog processing across networks. In this approach, named Netcast, cloud-based "smart...
The massive growth of machine learning-based applications and the end of Moore's law have created a pressing need to redesign computing platforms. We propose Lightning, the first reconfigurable photonic-electronic smartNIC to serve real-time deep neural network inference requests. Lightning uses a fast datapath to feed traffic from the NIC into the...
Mixture-of-Experts (MoE) models outperform conventional models by selectively activating different subnetworks, called experts, on a per-token basis. This gated computation generates dynamic communication patterns that cannot be determined beforehand, challenging existing GPU interconnects, which remain static during the distributed training proc...
This paper analyzes the performance and energy efficiency of Netcast, a recently proposed optical neural-network architecture designed for edge computing. Netcast performs deep neural network inference by dividing the computational task into two steps, which are split between the (cloud) server and (edge) client: (1) the server employs a wavelength...
We demonstrate Lightning, a reconfigurable photonic-electronic deep learning smartNIC that serves real-time inference requests at 4.055 GHz compute frequency. To do so, Lightning uses a novel datapath to feed traffic from the NIC into its photonic computing cores without incurring digital data movement bottlenecks. Lightning achieves this by employ...
The rising demand for WAN capacity driven by the rapid growth of inter-data center traffic poses new challenges for costly optical networks. Today, cloud providers rely on fixed optical backbones, where all hardware devices operate on a rigid spectrum grid, leading to the waste of expensive optical resources and subpar performance in handling failu...
We propose a photonic edge computing architecture based on WDM, broadband modulation, and output-stationary integration. Using this scheme, we demonstrate 98.8%-accurate DNN inference over an 86-km deployed fiber link with 3 THz optical bandwidth.
This paper analyzes the performance and energy efficiency of Netcast, a recently proposed optical neural-network architecture designed for edge computing. Netcast performs deep neural network inference by dividing the computational task into two steps, which are split between the server and (edge) client: (1) the server employs a wavelength-multipl...
Advances in deep neural networks (DNNs) are transforming science and technology. However, the increasing computational demands of the most powerful DNNs limit deployment on low-power devices, such as smartphones and sensors – and this trend is accelerated by the simultaneous move towards Internet-of-Things (IoT) devices. Numerous efforts are underw...
Advances in deep neural networks (DNNs) are transforming science and technology. However, the increasing computational demands of the most powerful DNNs limit deployment on low-power devices, such as smartphones and sensors – and this trend is accelerated by the simultaneous move towards Internet-of-Things (IoT) devices. Numerous efforts are under...
We explore a novel approach for building DNN training clusters using commodity optical devices. Our proposal, called TopoOpt, co-optimizes the distributed training process across three dimensions: computation, communication, and network topology. TopoOpt uses a novel alternating optimization technique and a group theory-inspired algorithm to find t...
We present experimental demonstrations of ultra-low-power edge computing enabled by wavelength-division-multiplexed optical links and time-integrating optical receivers. Initial experimental demonstrations show ≲10 fJ of optical energy per MAC.
As the COVID-19 pandemic reshapes our social landscape, its lessons have far-reaching implications for how online service providers manage their infrastructure to mitigate risks. This paper presents Facebook's risk-driven backbone management strategy to ensure high service performance throughout the COVID-19 pandemic. We describe Risk Simulation Sys...
The “pay-as-you-grow” cloud computing model has become popular for today’s enterprises. Cloud computing not only frees end users from complex operations, but also allows higher resource utilization, lower investment, and increased energy efficiency. However, with some emerging technologies, cloud computing is unable to meet the required latency lev...
Physical layer attacks threaten services transmitted through optical networks. To detect attacks, we present an investigation of optical spectrum feature analysis (OSFA) and recognition. By analyzing the spectral features of optical signals, recognition and detection of unauthorized signals can be realized. In this paper, (1) we theoretically analy...
Transient traffic spikes are becoming a crucial challenge for network operators, from both user-experience and network-maintenance perspectives. Unlike long-term traffic growth, short-term traffic fluctuations are bursty, which makes them difficult to provision for effectively. Luckily, next-generation elastic optical networks (EONs) prov...
In future optical satellite networks with diverse service requirements, a single traffic request occupies only part of the channel capacity of an inter-satellite link (ISL), which creates greater demand for flexible resource allocation. The switching scheme is the most important determinant of flexible resource allocation in optical satel...
We propose, for the first time, a multi-domain routing paradigm that transforms the routing problem from heuristic-algorithm-based computation to artificial-intelligence-based data analytics. Numerical results show that our proposal achieves high routing accuracy and a significant reduction in signaling.
We propose the joint allocation of computation resources and optical transmission time slices to realize ultralow-latency optical interconnection in time-synchronized HPC systems. Results show that buffering time is reduced by over 80% at high load.
We propose a crosstalk tracing method using deep neural networks for weakly coupled MDM optical networks. Results show that a tracing accuracy of over 95% is achieved, and the impact of time consistency in data collection is revealed.
Elastic Optical Networks (EONs) represent a new approach to dealing with the enormous traffic demand in core networks, as they can offer bandwidth granularities closer to those requested by users and hence improve spectral utilization. The current literature lacks dynamic strategies for service degradation, which is a possible measure...
Modal crosstalk is the main bottleneck in MMF-enabled optical datacenter networks with direct detection. A novel time-slicing-based, crosstalk-mitigated MDM scheme is proposed for the first time, then theoretically analyzed and experimentally demonstrated.
We propose an in-service crosstalk monitoring and tracing method using fine-grained monitoring optical time slices for SDM-enabled intra-datacenter and HPC systems. Modal crosstalk below −36.01 dB was successfully monitored and traced in an MMF transmission system.
A flexible time-synchronized TWDM-PON (TS-TWDM-PON) architecture is proposed and implemented for low-latency metro-access communication. Results show that a two-order-of-magnitude reduction in end-to-end delay can be achieved with the new TS-TWDM-PON architecture.
Modal crosstalk is the main bottleneck in MMF-enabled optical datacenter networks with direct detection. A novel time-slicing-based, crosstalk-mitigated MDM scheme is proposed for the first time, then theoretically analyzed and experimentally demonstrated.
In this paper, we propose a novel OTSS-assisted optical network architecture for smart-grid communication networks, which have unique requirements for low-latency connections. Illustrative results show that OTSS provides significantly better latency and blocking probability than conventional flexi-grid optical networks.
In this paper, we propose a novel OTSS-assisted optical network architecture for smart-grid communication networks, which have unique requirements for low-latency connections. Illustrative results show that OTSS provides significantly better latency and blocking probability than conventional flexi-grid optical networks.
Energy-efficient Time- and Wavelength-Division Multiplexed Passive Optical Networks (TWDM-PONs) have been intensively investigated. However, conventional schemes aimed at energy efficiency may cause repeated power-state transitions between sleep mode and active mode, resulting in periodic device-temperature cycling and frequent wavelength reassig...
We propose a Fast-Reconfigurable Optical Interconnect (FROI) architecture enabled by time-synchronized node coordination for high-performance computing. Experimental results show that an ultra-low reconfiguration time of 45.6 μs can be achieved after traffic pattern changes.
The emergence of new network applications is driving network operators not only to fulfill dynamic bandwidth requirements but also to offer various grades of service. Degraded provisioning provides an effective solution to flexibly allocate resources in various dimensions to reduce blocking for differentiated demands when network congestion occurs. In thi...
The emergence of new network applications is driving network operators not only to fulfill dynamic bandwidth requirements but also to offer various grades of service. Degraded provisioning provides an effective solution to flexibly allocate resources in various dimensions to reduce blocking for differentiated demands when network congestion occurs. In thi...
The growing popularity of high-speed mobile communications, cloud computing, and the Internet of Things (IoT) has reinforced the tidal traffic phenomenon, which induces spatio-temporal disequilibrium in the network traffic load. The main reason for tidal traffic is the large-scale population migration between business areas during the day and resid...
We present a software-defined unified control architecture for interconnecting heterogeneous packet-optical networks. This architecture supports hybrid packet- and circuit-switched networks employing various switching technologies and can achieve fast and seamless connection establishment.
Tidal traffic, caused by large-scale population migration between workplaces during the day and residences at night, is becoming a crucial problem for metro network control and management. We introduce an effective tidal traffic dispatching scheme with a novel TIDAL model based on a software-defined architecture. Simulation results show that our pro...
We propose a software-defined unified control architecture for IP over optical transport networks. Using this scheme, we successfully demonstrate end-to-end dynamic connection establishment across both the IP and OTN layers in a network experiment.