Lei Shi

Tsinghua University, Beijing, China

Publications (15) · 3.35 Total impact

  • ABSTRACT: Multipath Switching systems (MPS) are intensively used in state-of-the-art core routers to provide terabit or even petabit switching capacity. One of the most intractable issues in designing MPS is how to load balance traffic across its multiple paths while not disturbing the intraflow packet orders. Previous packet-based solutions either suffer from delay penalties or lead to O(N^2) hardware complexity, and hence do not scale. Flow-based hashing algorithms also perform badly due to the heavy-tailed flow-size distribution. In this paper, we develop a novel scheme, namely, Flow Slice (FS), that cuts each flow into flow slices at every intraflow interval larger than a slicing threshold and balances the load on a finer granularity. Based on the studies of tens of real Internet traces, we show that, with a slicing threshold of 1-4 ms, the FS scheme achieves load-balancing performance comparable to the optimal one. It also limits the probability of out-of-order packets to a negligible level (10^{-6}) on three popular MPSes at the cost of little hardware complexity and an internal speedup of up to two. These results are proven by theoretical analyses and also validated through trace-driven prototype simulations.
    IEEE Transactions on Computers 03/2012; 61(3):350-365. DOI:10.1109/TC.2010.279 · 1.66 Impact Factor
  • Lei Shi · Wenjie Li · Bin Liu
  • Changhua Sun · Bin Liu · Lei Shi
    ABSTRACT: DNS amplification attacks utilize IP address spoofing and large numbers of open recursive DNS servers to perform the bandwidth consumption attack. During an attack, it ceaselessly fabricates DNS queries to the exploited open recursive DNS servers, and all the responses, often with larger size than the query messages, are reflected to the single victim due to the source IP address spoofing. While it is difficult to defend against this attack from the root causes by eliminating the open recursive DNS servers and IP spoofing for the whole Internet, in this paper, we take a different methodology to defend against it at the leaf router of victim's ISP or organization. We propose an efficient and low-cost hardware approach to first detect the DNS amplification attack accurately and responsively. Once the attack is confirmed, our approach is then activated to filter out all the illegitimate DNS responses by using a two-Bloom filter solution. We demonstrate that the memory cost of our approach is feasible for the hardware implementation even up to the OC-768 link. Through trace-driven simulations, it is shown that our approach is effective in both the detecting and filtering phases.
    Global Telecommunications Conference, 2008. IEEE GLOBECOM 2008. IEEE; 01/2009
  • Yue Zhang · Bin Liu · Lei Shi · Jingnan Yao · L. Bhuyan
    ABSTRACT: Efficiency and effectiveness are always the emphases of a scheduler, for both link and processor scheduling. Well-known scheduling algorithms such as surplus round robin (SRR) and elastic round robin (ERR) suffer from twofold shortcomings: 1) additional pre-processing queuing delay and post-processing resequencing delay are incurred due to the lack of short-term load-balancing; 2) bursty scheduling is caused by blind preservation of scheduling history under non-backlogged traffic. In this paper, we propose a quantum-adaptive scheduling (QAS) algorithm, which: 1) synchronizes all the quanta in a fine-grained manner and, 2) adjusts the quanta intelligently based on processor utilization. We theoretically prove that the queuing fairness bound (QFB) for QAS is one third tighter than that of SRR and ERR. This result approaches the optimal value obtained by the shortest queue first (SQF) algorithm, while still maintaining O(1) complexity. Trace-driven simulations show that QAS reduces average packet delay by 18%~24% while cutting down the resequencing buffer size by more than 40% compared to SRR and ERR.
    Distributed Computing Systems, 2008. ICDCS '08. The 28th International Conference on; 07/2008
  • Lei Shi · Changbin Liu · Bin Liu
    ABSTRACT: It is well known that the Next-Generation Network (NGN) will inevitably carry triple-play services (i.e. voice, video and data) simultaneously. However, the traditional strict-priority based scheduling algorithm intensively used in the current Internet cannot maximize the overall network utility for NGN, and instead brings significant global welfare loss. In this paper, we study how to achieve Network Utility Maximization (NUM) in NGN running triple-play services. By investigating the characteristics of most of its traffic classes, we explicitly present their utilities as functions of allocated bandwidth. We further formulate the NUM objective as a nonlinear programming problem with both inequality and equality constraints. A solution using Lagrange multipliers is given for the simplified problem with only equality constraints, which indicates the major distinction from strict-priority based scheduling: the existence of a turning point for IPTV users. Simulations are also carried out using LINGO on the original complicated problem. Several useful results are presented on the new features of the NUM-based scheduling. We also discuss methods to alleviate the impact of the turning point and the consequent unstable bandwidth allocation.
    Computer Communications 06/2008; 31(10-31):2257-2269. DOI:10.1016/j.comcom.2008.02.016 · 1.70 Impact Factor
  • ABSTRACT: Multi-Path Switching systems (MPS) are intensively used in state-of-the-art core routers. One of the most intractable issues is how to load-balance traffic across its multiple paths while not disturbing the intra-flow packet orders. In this paper, based on the studies of tens of real Internet traces, we develop a novel scheme, namely Flow-Slice (FS), which cuts each flow into flow-slices at every intra-flow interval larger than a slicing threshold set to 1 ms to 4 ms and balances the load on a finer granularity. Through theoretical analyses and comprehensive trace-driven simulations, we show that FS achieves impressive load-balancing performance with little hardware cost while limiting the packet out-of-order chances to a negligible level (below 10^{-6}).
    Proceedings of the 2007 ACM/IEEE Symposium on Architecture for Networking and Communications Systems, ANCS 2007, Orlando, Florida, USA, December 3-4, 2007; 12/2007
  • Changhua Sun · Lei Shi · Chengchen Hu · Bin Liu
    ABSTRACT: The short flow first (SFF) scheduling strategy is effective in obtaining more stringent performance bounds for short flows in the Internet. However, previous strict SFF approaches give short flows excessive preference, leading to the starvation of long flows. Moreover, these SFF schemes are hard to deploy due to either extreme complexity or the need to modify the TCP protocol. Inspired by the fairness and practicality of deficit round robin (DRR), we propose a novel scheduling mechanism, namely deficit round robin with short flow first (DRR-SFF), which improves the performance of short flows while only mildly penalizing long flows. DRR-SFF uses weighted DRR to schedule short and long flows respectively and treats long flows more fairly. Through trace-driven simulation, we show that the mean transmission time and loss rate of short flows under DRR-SFF are significantly reduced compared with FIFO scheduling using DropTail. Meanwhile, the performance of long flows under DRR-SFF does not degrade much, remaining close to that of FIFO scheduling. Our results demonstrate that DRR-SFF is superior to strict SFF approaches, as the latter drive long flows into starvation in our simulations. Moreover, DRR-SFF inherits the O(1) computational complexity of DRR, which makes it easy to deploy in edge routers.
    Networking and Services, 2007. ICNS. Third International Conference on; 07/2007
  • Changbin Liu · Lei Shi · Bin Liu
    ABSTRACT: It is expected that, in the near future, the multi-class traffic previously carried in the public switched telephone network (PSTN), cable television network and IP network will be multiplexed at the backbone and carried by the converged next-generation network (NGN). A pragmatic challenge in facilitating NGN is how to schedule traffic and allocate bandwidth among the triple-play services, namely voice (VoIP), video (IPTV) and data. Different from the traditional strict-priority based scheduling intensively used in industry, in this paper we discuss this issue from the objective of network utility maximization (NUM). We first investigate the characteristics of most existing traffic classes and explicitly formulate their utilities as functions of occupied bandwidth. After that, a novel scheduling scheme to achieve NUM is derived using the Lagrange method with KKT conditions. Numerical results under two network scenarios are calculated. Both of them reveal the unique nature of this scheduling compared with strict-priority scheduling: the highest priority is still provided for VoIP traffic, but no strict priority should be given to IPTV traffic, since it would conflict with the NUM objective. We hope our results will shed light on the evolution towards the converged network.
    Universal Multiservice Networks, 2007. ECUMN '07. Fourth European Conference on; 03/2007
  • Lei Shi · Gao Xia · Bin Liu
    ABSTRACT: The flow-mapping parallel packet switch (FM-PPS) is the class of parallel packet switch adopting flow-level load-balancing algorithms. It dispatches packets of a micro-flow to an unchanged parallel switch, and thus natively guarantees intra-flow packet orders. Due to the heavy tail of the flow size distribution, there is concern that FM-PPS may suffer from unpredictable performance, which prevents it from real deployments. Motivated to clarify this issue, in this paper we present an effective analytical model of FM-PPS and carry out intensive performance analysis. We find that under current Internet traffic patterns, both statistical packet delay and backlog bounds can be guaranteed provided that several stability conditions are met. We further validate that an FM-PPS with OC-768c line rate is able to provide such guarantees under state-of-the-art RAM technology. A practical flow-mapping algorithm, namely constrained output round robin (CORR), is also proposed, which is designed to conform to these stability conditions and hence holds the delay and backlog bounds.
    Proceedings of the 26th IEEE International Performance Computing and Communications Conference, IPCCC 2007, April 11-13, 2007, New Orleans, Louisiana, USA; 01/2007
  • Changhua Sun · Jindou Fan · Lei Shi · Bin Liu
  • Lei Shi · Yue Zhang · Jianming Yu · Bo Xu · Bin Liu · Jun Li
    ABSTRACT: Next-generation high-end network processors (NP) must address demands from both diversified applications and ever-increasing traffic pressure. One major challenge is to design an extraordinarily scalable architecture. In this paper, it is argued that such an objective can only be met by introducing a highly parallel structure, namely the paralleled processing-engine cluster (PPC). We demonstrate this point from the trade-off among aspects such as performance, programmability and flexibility. However, PPC natively suffers from several critical issues in load-balancing, intra-flow packet ordering and memory contention. After investigating several existing approaches, we present novel solutions for each issue according to the balance between performance and cost. Through intensive analysis and comprehensive simulations, it is shown that the shortest queue first scheduling with class-based prediction (SQF-C) performs nearly optimally, while the hardware-based per-flow ordering mechanism resolves packet out-of-order independently of the load-balancing issue, inducing little throughput degradation. By implementing the unified solution, it is possible to design a PPC supporting up to OC-768c line rate. A real implementation is also carried out in our THNPU-1 prototype to verify the conclusions.
    INFOCOM 2007. 26th IEEE International Conference on Computer Communications, Joint Conference of the IEEE Computer and Communications Societies, 6-12 May 2007, Anchorage, Alaska, USA; 01/2007
  • Lei Shi · Bin Liu · Wenjie Li · Beibei Wu · Yunhao Liu
    ABSTRACT: The Parallel Packet Switch (PPS) is used intensively in today's terabit routers to construct the switching fabric. The basic PPS treats all traffic equally in order to achieve uniform load-balancing and high throughput, but it fails to support differentiated QoS. With the recent blooming of delay-sensitive Internet traffic, such as peer-to-peer live streaming and IPTV, differentiated QoS is becoming an urgent demand. In this paper, we propose a novel and practical framework, the Differentiated Service Parallel Packet Switch (DS-PPS), which supports three fundamental QoS features: guaranteed-delay (GD), guaranteed-bandwidth (GB) and best-effort (BE). By adaptively adjusting the number of switching planes offered to each QoS class, DS-PPS precisely controls the delay bounds of GD traffic and the drop precedence of GB traffic. We evaluate DS-PPS by extensive theoretical analyses and comprehensive simulations. Experimental results on a prototype implementation of the framework show that DS-PPS outperforms the basic PPS in three main aspects. First, the average delay of TCP short packets under full load is reduced by more than 94%. Second, the average delay of real-time traffic under full load is reduced by more than 82%. Third, GB traffic of low drop precedence is guaranteed nearly three times the throughput of GB traffic of high drop precedence at the hotspots. Significantly, our proposed DS-PPS framework is universal and scalable to support various kinds of emerging QoS-sensitive applications in multi-service terabit routers without any extra overhead.
    INFOCOM 2006. 25th IEEE International Conference on Computer Communications, Joint Conference of the IEEE Computer and Communications Societies, 23-29 April 2006, Barcelona, Catalunya, Spain; 04/2006
  • Lei Shi · Wenjie Li · Bin Liu · Xiaojun Wang
    ABSTRACT: Our work is motivated by the desire to design a switching fabric with very large aggregate capacity. In this paper, we consider a parallel switching architecture consisting of multiple planes, with each plane working independently. Different from previous research into parallel packet switches, this new architecture tries to perform load balancing at the variable-length packet level. Furthermore, to guarantee the performance of end-to-end flow and to provide differentiated QoS using a traffic manager which works slower than the line rate, we designed a packet load balancing algorithm in the flow mapping manner. We show that such a flow mapping is practical, and can be implemented using state-of-the-art commercial SRAM.
    High Performance Switching and Routing, 2005. HPSR. 2005 Workshop on; 06/2005
  • Wenjie Li · Bin Liu · Lei Shi · Yang Xu · Dapeng Wu
    ABSTRACT: Recent Internet traffic measurements show that 60% of the total packets are short packets, which include TCP acknowledgment and control segments. These short packets make a great impact on the performance of TCP. Unfortunately, short packets suffer from large delay due to serving long data packets in switches running in the packet mode, i.e. a packet is switched in its entirety. To optimize TCP performance, we apply a cross-layer approach to the design of switching architectures and scheduling algorithms. Specifically, we propose a preemptive packet-mode scheduling architecture and an algorithm called preemptive short packets first (P-SPF). Analysis and simulation results demonstrate that compared to existing packet-mode schedulers, P-SPF significantly reduces the waiting time for short packets while achieving a high overall throughput when the traffic load is heavy. Moreover, with a relatively low speedup, P-SPF performs better than existing packet-mode schedulers under any traffic load.
    Quality of Service - IWQoS 2005: 13th International Workshop, IWQoS 2005, Passau, Germany, June 21-23, 2005, Proceedings; 01/2005
  • Wenjie Li · Lei Shi · Yang Xu · Bin Liu
    ABSTRACT: Variable-size IP packets are generally segmented into fixed-size cells for switching and scheduling in scalable input queueing switches. However, switch bandwidth is lost when packet sizes are not integer multiples of the cell size, and a speedup of at least two is required to achieve full line rate. This paper proposes a framing approach called Bit Map Packet Framing (BMPF) to merge and segment IP packets efficiently. In BMPF, a partially filled cell can carry some bytes from the following packet. Thus switch bandwidth loss is avoided and the required speedup is greatly lowered to 1.17. BMPF is superior to other conventional framing methods, such as PPP, HDLC and COBS. Furthermore, BMPF can also be deployed to merge IP packets in optical packet switches.
    Networking and Mobile Computing, Third International Conference, ICCNMC 2005, Zhangjiajie, China, August 2-4, 2005, Proceedings; 01/2005
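
Illustrative sketch (not from the publications): the Flow Slice (FS) idea summarized in the IEEE Transactions on Computers and ANCS 2007 abstracts above can be pictured roughly as follows. A flow stays on its current path while packets arrive close together, and a new slice may move to another path once the gap between packets exceeds the slicing threshold. The class name, the method name and the least-loaded path choice are assumptions made for this example only; the abstracts do not specify how a path is picked for a new slice.

```python
class FlowSlicer:
    """Hypothetical flow-slice dispatcher; all names here are illustrative."""

    def __init__(self, num_paths, threshold):
        self.threshold = threshold      # slicing threshold in seconds (e.g. 0.001 to 0.004)
        self.load = [0] * num_paths     # bytes dispatched to each path so far
        self.state = {}                 # flow_id -> (last_arrival_time, current_path)

    def dispatch(self, flow_id, arrival, size):
        """Return the path index chosen for a packet of `size` bytes."""
        entry = self.state.get(flow_id)
        if entry is None or arrival - entry[0] > self.threshold:
            # New flow, or the intra-flow gap exceeds the threshold:
            # start a new slice on the least-loaded path (assumed policy).
            path = min(range(len(self.load)), key=self.load.__getitem__)
        else:
            # Same slice: keep the path so packets are not reordered.
            path = entry[1]
        self.state[flow_id] = (arrival, path)
        self.load[path] += size
        return path
```

For instance, with two paths and a 2 ms threshold, packets of one flow arriving 1 ms apart stay on the same path, while a packet arriving after a 9 ms gap starts a new slice and can be moved to the less loaded path.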