David E. Taylor

Washington University in St. Louis, St. Louis, Missouri, United States

Publications (19) · 14.36 Total impact

  • Source
    David E. Taylor · Jonathan S. Turner
    ABSTRACT: Packet classification is an enabling technology for next generation network services and often a performance bottleneck in high-performance routers. The performance and capacity of many classification algorithms and devices, including TCAMs, depend upon properties of filter sets and query patterns. Despite the pressing need, no standard performance evaluation tools or filter sets are publicly available. In response to this problem, we present ClassBench, a suite of tools for benchmarking packet classification algorithms and devices. ClassBench includes a filter set generator that produces synthetic filter sets that accurately model the characteristics of real filter sets. Along with varying the size of the filter sets, we provide high-level control over the composition of the filters in the resulting filter set. The tool suite also includes a trace generator that produces a sequence of packet headers to exercise packet classification algorithms with respect to a given filter set. Along with specifying the relative size of the trace, we provide a simple mechanism for controlling locality of reference. While we have already found ClassBench to be very useful in our own research, we seek to eliminate the significant access barriers to realistic test vectors for researchers and initiate a broader discussion to guide the refinement of the tools and codification of a formal benchmarking methodology. (The ClassBench tools are publicly available at the following site: http://www.arl.wustl.edu/~det3/ClassBench/.)
    Full-text · Article · Jul 2007 · IEEE/ACM Transactions on Networking
  • Source
    ABSTRACT: We introduce the first algorithm that we are aware of to employ Bloom filters for longest prefix matching (LPM). The algorithm performs parallel queries on Bloom filters, an efficient data structure for membership queries, in order to determine address prefix membership in sets of prefixes sorted by prefix length. We show that use of this algorithm for Internet Protocol (IP) routing lookups results in a search engine providing better performance and scalability than TCAM-based approaches. The key feature of our technique is that the performance, as determined by the number of dependent memory accesses per lookup, can be held constant for longer address lengths or additional unique address prefix lengths in the forwarding table given that memory resources scale linearly with the number of prefixes in the forwarding table. Our approach is equally attractive for Internet Protocol Version 6 (IPv6) which uses 128-bit destination addresses, four times longer than IPv4. We present a basic version of our approach along with optimizations leveraging previous advances in LPM algorithms. We also report results of performance simulations of our system using snapshots of IPv4 BGP tables and extend the results to IPv6. Using less than 2 Mb of embedded RAM and a commodity SRAM device, our technique achieves average performance of one hash probe per lookup and a worst case of two hash probes and one array access per lookup.
    Preview · Article · May 2006 · IEEE/ACM Transactions on Networking
  • Source
    David E. Taylor · Andreas Herkersdorf · A. Doring · Gero Dittmann
    ABSTRACT: Robust Header Compression (ROHC) provides for more efficient use of radio links for wireless communication in a packet switched network. Due to its potential advantages in the wireless access area and the proliferation of network processors in access infrastructure, there exists a need to understand the resource requirements and architectural implications of implementing ROHC in this environment. We present an analysis of the primary functional blocks of ROHC and extract the architectural implications on next-generation network processor design for wireless access. The discussion focuses on memory space and bandwidth dimensioning as well as processing resource budgets. We conclude with an examination of resource consumption and potential performance gains achievable by offloading computationally intensive ROHC functions to application specific hardware assists. We explore the design tradeoffs for hardware assists in the form of reconfigurable hardware, Application-Specific Instruction-set Processors (ASIPs), and Application-Specific Integrated Circuits (ASICs).
    Preview · Article · Sep 2005 · IEEE/ACM Transactions on Networking
  • Source
    David E. Taylor · Jonathan S. Turner
    ABSTRACT: Packet classification is the enabling technology for next generation network services and often the primary bottleneck in high-performance routers. Due to the importance and complexity of the problem, a myriad of algorithms and resulting implementations exist. The performance and capacity of many algorithms and classification devices, including TCAMs, depend upon properties of the filter set and query patterns. Unlike microprocessors in the field of computer architecture, there are no standard performance evaluation tools or techniques available to evaluate packet classification algorithms and products. Network service providers are reluctant to distribute copies of real filter databases for security and confidentiality reasons, hence realistic test vectors are a scarce commodity. The small subset of the research community who obtain real databases either limit performance evaluation to the small sample space or employ ad hoc methods of modifying those databases. We present a tool for creating synthetic filter databases that retain characteristics of a seed database and provide systematic mechanisms for varying the number and composition of the filters. We propose a benchmarking methodology based on this tool that provides a mechanism for evaluating packet classification performance on a uniform scale. We seek to initiate a broader discussion within the community that will result in a standard packet classification benchmark.
    Full-text · Article · Sep 2004
  • Source
    David E. Taylor
    ABSTRACT: Packet classification is an enabling function for a variety of Internet applications including Quality of Service, security, monitoring, and multimedia communications. In order to classify a packet as belonging to a particular flow or set of flows, network nodes must perform a search over a set of filters using multiple fields of the packet as the search key. In general, there have been two major threads of research addressing packet classification: algorithmic and architectural. A few pioneering groups of researchers posed the problem, provided complexity bounds, and offered a collection of algorithmic solutions. Subsequently, the design space has been vigorously explored by many offering new algorithms and improvements upon existing algorithms. Given the inability of early algorithms to meet performance constraints imposed by high speed links, researchers in industry and academia devised architectural solutions to the problem. This thread of research produced the most widely-used packet classification device technology, Ternary Content Addressable Memory (TCAM). New architectural research combines intelligent algorithms and novel architectures to eliminate many of the unfavorable characteristics of current TCAMs. We observe that the community appears to be converging on a combined algorithmic and architectural approach to the problem. Using a taxonomy based on the high-level approach to the problem and a minimal set of running examples, we provide a survey of the seminal and recent solutions to the problem. It is our hope to foster a deeper understanding of the various packet classification techniques while providing a useful framework for discerning relationships and distinctions.
    Preview · Article · Sep 2004 · ACM Computing Surveys
  • Source
    David E. Taylor · Jonathan S. Turner
    ABSTRACT: A wide variety of packet classification algorithms and devices exist in the research literature and commercial market. The existing solutions exploit various design tradeoffs to provide high search rates, power and space efficiency, fast incremental updates, and the ability to scale to large numbers of filters. There remains a need for techniques that achieve a favorable balance among these tradeoffs and scale to support classification on additional fields beyond the standard 5-tuple. We introduce Distributed Crossproducting of Field Labels (DCFL), a novel combination of new and existing packet classification techniques that leverages key observations of the structure of real filter sets and takes advantage of the capabilities of modern hardware technology. Using a collection of real and synthetic filter sets, we provide analyses of DCFL performance and resource requirements on filter sets of various sizes and compositions. An optimized implementation of DCFL can provide over 100 million searches per second and storage for over 200 thousand filters in a current generation FPGA or ASIC without the need for external memory devices.
    Full-text · Article · Jan 2004
  • Source
    ABSTRACT: IP address lookup is a central processing function of Internet routers. While a wide range of solutions to this problem have been devised, very few simultaneously achieve high lookup rates, good update performance, high memory efficiency, and low hardware cost. High-performance solutions using Content Addressable Memory (CAM) devices are popular but costly, particularly when applied to large databases. We present an efficient hardware implementation of a previously unpublished IP address lookup architecture, invented by Eatherton and Dittia. Our experimental implementation uses a single commodity SRAM chip and less than 10% of the logic resources of a commercial configurable logic device, operating at 100 MHz. With these quite modest resources, it can perform over 9 million lookups per second, while simultaneously processing thousands of updates per second, on databases with over 100,000 entries. The lookup structure requires only about 10 bytes per address prefix, less than half that required by other methods. The architecture allows performance to be scaled up by using parallel Fast IP Lookup (FIPL) engines, which interleave accesses to a common memory interface. This architecture allows performance to scale up directly with available memory bandwidth. We describe the tree bitmap algorithm, our implementation of it in a dynamically extensible gigabit router being developed at Washington University, and the results of performance experiments designed to assess its performance under realistic operating conditions.
    Full-text · Article · Oct 2003 · IEEE Journal on Selected Areas in Communications
  • Edson L. Horta · John W. Lockwood · David E. Taylor · David Parlour
    ABSTRACT: Tools and a design methodology have been developed to support partial run-time reconfiguration of FPGA logic on the Field Programmable Port Extender. High-speed Internet packet processing circuits on this platform are implemented as Dynamic Hardware Plugin (DHP) modules that fit within a specific region of an FPGA device. The PARBIT tool has been developed to transform and restructure bitfiles created by standard computer-aided design tools into partial bitstreams that program DHPs. The methodology allows the platform to hot-swap application-specific DHP modules without disturbing the operation of the rest of the system.
    No preview · Article · Jun 2002
  • Source
    ABSTRACT: Continuing growth in optical link speeds places increasing demands on the performance of Internet routers, while deployment of embedded and distributed network services imposes new demands for flexibility and programmability. IP address lookup has become a significant performance bottleneck for the highest performance routers. Amid the vast array of academic and commercial solutions to the problem, few achieve a favorable balance of performance, efficiency, and cost. New commercial products utilize Content Addressable Memory (CAM) devices to achieve high lookup speeds at an exorbitantly high hardware cost with limited flexibility. In contrast, this paper describes an efficient, scalable lookup engine design, able to achieve high performance with the use of a small portion of a reconfigurable logic device and a commodity Random Access Memory (RAM) device. The Fast Internet Protocol Lookup (FIPL) engine is an implementation of Eatherton and Dittia's previously unpublished Tree Bitmap algorithm [1] targeted to an open-platform research router. FIPL can be scaled to achieve guaranteed worst-case performance of over 9 million lookups per second with a single SRAM operating at the fairly modest clock speed of 100 MHz. Experimental evaluation of FIPL throughput, latency, and update performance is provided using a sample routing table from Mae West [2].
    Full-text · Article · Apr 2002 · Proceedings - IEEE INFOCOM
  • Source
    ABSTRACT: This paper presents the dynamic hardware plugins (DHP) architecture for implementing multiple networking applications in hardware at programmable routers. By enabling multiple applications to be dynamically loaded into a single hardware device, the DHP architecture provides a scalable mechanism for implementing high-performance programmable routers. The DHP architecture is presented within the context of a programmable router architecture which processes flows in both software and hardware. Implementation options are described as well as the prototype testbed at Washington University in Saint Louis which utilizes the partial reconfiguration capability of modern field programmable gate arrays.
    Full-text · Article · Feb 2002 · Computer Networks
  • Source
    ABSTRACT: Tools and a design methodology have been developed to support partial run-time reconfiguration of FPGA logic on the Field Programmable Port Extender. High-speed Internet packet processing circuits on this platform are implemented as Dynamic Hardware Plugin (DHP) modules that fit within a specific region of an FPGA device. The PARBIT tool has been developed to transform and restructure bitfiles created by standard computer-aided design tools into partial bitstreams that program DHPs. The methodology allows the platform to hot-swap application-specific DHP modules without disturbing the operation of the rest of the system.
    Full-text · Conference Paper · Jan 2002
  • Source

    Full-text · Conference Paper · Jan 2002
  • Source
    ABSTRACT: A prototype platform has been developed that allows processing of packets at the edge of a multi-gigabit-per-second network switch. This system, the Field Programmable Port Extender (FPX), enables packet processing functions to be implemented as modular components in reprogrammable hardware. All logic on the FPX is implemented in two Field Programmable Gate Arrays (FPGAs). Packet processing functions in the system are implemented as dynamically loadable modules. Core functionality of the FPX is implemented on an FPGA called the Networking Interface Device (NID). The NID contains the logic to transmit and receive packets over a network, dynamically reprogram hardware modules, and route individual traffic flows. A full, non-blocking switch is implemented on the NID to route packets between the networking interfaces and the modular components. Modular components of the FPX are implemented on a second FPGA called the Reprogrammable Application Device (RAD). Modules are loaded onto t...
    Full-text · Conference Paper · Jan 2001
  • John W. Lockwood · Jon S. Turner · David E. Taylor
    No preview · Article · Apr 2000
  • John W. Lockwood · Jonathan S. Turner · David E. Taylor
    ABSTRACT: Field Programmable Gate Arrays (FPGAs) are being used to provide fast Internet Protocol (IP) packet routing and advanced queuing in a highly scalable network switch. A new module, called the Field-programmable Port Extender (FPX), is being built to augment the Washington University Gigabit Switch (WUGS) with reprogrammable logic. FPX modules reside at the edge of the WUGS switching fabric. Physically, the module is inserted between an optical line card and the WUGS gigabit switch backplane. The hardware used for this project allows ports of the switch populated with an FPX to operate at rates up to 2.4 Gigabits/second. The aggregate throughput of the system scales with the number of switch ports. Logic on the FPX module is implemented with two FPGA devices. The first device is used to interface between the switch and the line card, while the second is used to prototype new networking functions and protocols. The logic on the second FPGA can be reprogrammed dynamically via control cells sent over the network. The flexibility of the FPX has made the card of interest for several networking applications. This year, fifty FPX hardware modules will be fabricated and distributed to researchers at eight universities around the country who are interested in experimenting with reprogrammable networks and per-flow queuing mechanisms. The FPX hardware will first be used to implement fast IP lookup algorithms and distributed input queueing.
    No preview · Conference Paper · Jan 2000
  • William D. Richard · David E. Taylor · David M. Zar
    ABSTRACT: This paper describes the senior computer engineering capstone design course at Washington University in St. Louis. As part of this course, three-student teams develop a complete 8-bit microprocessor using a hardware description language (VHDL) and implement their designs in a small FPGA. Programmed FPGAs are “booted” at the end of the course and tested for accuracy. Students also write an assembler or a simple calculator for their microprocessor. This paper describes the microprocessor architecture and tool flow used in the course.
    No preview · Article · Dec 1999 · IEEE Transactions on Education
  • Source
    David E. Taylor · Edward W. Spitznagel
    ABSTRACT: Packet switched networks such as the Internet require packet classification at every hop in order to apply services and security policies to traffic flows. The relentless increase in link speeds and traffic volume imposes stringent constraints on packet classification solutions. Ternary Content Addressable Memory (TCAM) devices are favored by most network component and equipment vendors due to the fast and deterministic lookup performance afforded by their use of massive parallelism. While able to keep up with high speed links, TCAMs suffer from exorbitant power consumption, poor scalability to longer search keys and larger filter sets, and inefficient support of multiple matches. The research community has responded with algorithms that seek to meet the lookup rate constraint with greater efficiency through the use of commodity Random Access Memory (RAM) technology. The most promising algorithms efficiently achieve high lookup rates by leveraging the statistical structure of real filter sets. Due to their dependence on filter set characteristics, it is difficult to provision processing and memory resources for implementations that support a wide variety of filter sets. We show how several algorithmic advances may be leveraged to improve the efficiency, scalability, incremental update and multiple match performance of CAM-based packet classification techniques without degrading the lookup performance. Our approach, Label Encoded Content Addressable Memory (LECAM), represents a hybrid technique that utilizes decomposition, label encoding, and a novel Content Addressable Memory (CAM) architecture. By reducing the number of implementation parameters, LECAM provides a vehicle to carry several of the recent algorithmic advances into practice. We provide a thorough overview of CAM technologies and packet classification algorithms, along with a detailed discussion of the scaling issues that arise with longer search keys and larger filter sets. We also provide a comparative analysis of LECAM and standard TCAM using a collection of real and synthetic filter sets of various sizes and compositions.
    Preview · Article ·
  • Source
    ABSTRACT: This paper describes the architecture of the Smart Port Card (SPC) designed for use with the Washington University Gigabit Switch. The SPC uses an embedded Intel Pentium processor running open-source NetBSD to support network management and active networking applications. The SPC physically connects between a switch port and a normal link adapter, allowing cell streams to be processed as they enter or leave the switch. In addition to the hardware architecture, this paper describes current and future applications for the SPC.
    Preview · Article ·
  • Source
    ABSTRACT: This document describes the design and functionality of the hardware components implemented in the Field-programmable Port eXtender (FPX) to support the Washington University Network Services Platform (NSP). This includes support for the Multi-Service Router (MSR) and Extreme Networking projects. The functionality of each component is described along with supporting top-level entity diagrams, block dia-
    Full-text · Article ·