Greg J. Regnier

Intel, Santa Clara, California, United States

Publications (14) · 6.45 Total Impact

  • ABSTRACT: Intel Labs has continued development of the Embedded Transport Acceleration (ETA) software prototype, which uses one of the Intel® Xeon™ processors in a multi-processor server as a packet processing engine (PPE) closely tied to the server's core CPU and memory complex. We have extended the prototype to provide a user-level, asynchronous socket interface. The Direct User Socket Interface (DUSI) allows user-level applications to interface directly with the PPE using familiar socket commands and semantics. The prototype runs in an asymmetric multiprocessing mode, in that the PPE does not run as a general computing resource for the host operating system. We describe the prototype software architecture and the DUSI application interface, and detail our measurement and analysis of several micro-benchmarks. In particular, we measure transaction throughput and end-to-end latency as the key metrics for the analysis.
    Parallel and Distributed Processing Symposium, International. 01/2005; 10:210a.
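The descriptor-queue style of interface that DUSI describes — applications post operations to a submission queue and reap results from a completion queue, rather than making per-operation blocking calls — can be sketched roughly as below. This is an illustrative model only; the class and method names (`AsyncSocketEngine`, `post_send`, `poll_completion`) are invented for the sketch and are not the actual DUSI API, and a thread stands in for the dedicated packet processing engine.

```python
import queue
import threading

class AsyncSocketEngine:
    """Toy model of a DUSI-style interface: the application posts
    operation descriptors to a submission queue and reaps results from
    a completion queue instead of making blocking socket calls.
    (Illustrative only; not the real DUSI API.)"""

    def __init__(self):
        self.submissions = queue.Queue()   # application -> packet engine
        self.completions = queue.Queue()   # packet engine -> application
        self._engine = threading.Thread(target=self._run, daemon=True)
        self._engine.start()

    def post_send(self, conn_id, payload):
        """Asynchronously queue a send; returns immediately."""
        self.submissions.put(("send", conn_id, payload))

    def poll_completion(self, timeout=1.0):
        """Reap one completion event: (op, conn_id, status, nbytes)."""
        return self.completions.get(timeout=timeout)

    def _run(self):
        # Stand-in for the dedicated engine: it "transmits" each
        # descriptor and reports how many bytes were handled.
        while True:
            op, conn_id, payload = self.submissions.get()
            if op == "send":
                self.completions.put(("send", conn_id, "ok", len(payload)))

engine = AsyncSocketEngine()
engine.post_send(conn_id=7, payload=b"hello")
event = engine.poll_completion()
```

The point of the shape is that `post_send` never blocks the application thread; completions arrive asynchronously, which is what makes transaction throughput and end-to-end latency the natural metrics to measure.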
  • 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), CD-ROM / Abstracts Proceedings, 4-8 April 2005, Denver, CO, USA; 01/2005
  • ABSTRACT: To achieve IP-converged cluster deployments, the performance and scalability of iSCSI must approach that of FC SANs. We recognize and quantify that the major overhead of iSCSI comes from TCP/IP processing. Industry has largely responded with TCP offload engines (TOEs) and iSCSI storage adapters. As an alternative, this paper shows a software implementation of iSCSI on generic OSes and processors. The trend towards chip multiprocessing (CMP) and integrated memory controllers (MCH) largely motivated our direction. With CMP, increased processing power is delivered through multiple cores per processor; an on-die MCH allows memory bandwidth to scale better with processor speeds. Our approach and analysis show the effectiveness of partitioning the workload to suit a CMP system, allowing iSCSI to scale with the increasing processing power and memory bandwidth of servers over time.
    19th International Parallel and Distributed Processing Symposium (IPDPS 2005), CD-ROM / Abstracts Proceedings, 4-8 April 2005, Denver, CO, USA; 01/2005
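A common way to partition TCP work across CMP cores — and one consistent with the scaling idea in the abstract above, though the paper's actual scheme is not specified here — is to steer each connection to a core by hashing its 4-tuple, so per-connection TCP state stays core-local. A minimal sketch, with the field encoding as an assumption:

```python
import struct
import zlib

def core_for_flow(src_ip, dst_ip, src_port, dst_port, n_cores):
    """Steer a TCP flow to a core by hashing its 4-tuple, so every
    packet of one iSCSI/TCP connection is processed on the same core
    and its connection state never migrates.  (A sketch of the general
    partitioning idea, not the paper's actual scheme.)"""
    key = struct.pack("!4s4sHH",
                      bytes(map(int, src_ip.split("."))),
                      bytes(map(int, dst_ip.split("."))),
                      src_port, dst_port)
    return zlib.crc32(key) % n_cores

# Deterministic mapping: both lookups for the same flow agree.
core_a = core_for_flow("10.0.0.1", "10.0.0.2", 40000, 3260, 4)
core_b = core_for_flow("10.0.0.1", "10.0.0.2", 40000, 3260, 4)
```

Because the mapping is stateless and stable, adding cores only changes `n_cores`; no shared locks are needed on the per-connection fast path, which is what lets throughput scale with core count.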
  • ABSTRACT: Supporting multiple gigabits per second of iSCSI over TCP can quickly saturate the processing capacity of an SMP server today. Legacy OS designs and APIs were not designed for multi-gigabit I/O speeds. Most of industry's efforts have focused on offloading the extra processing and memory load to the network adapter (NIC). As an alternative, this paper shows a software implementation of iSCSI on generic OSes and processors. We discuss an asymmetric multiprocessing (AMP) architecture, in which one of the processors is dedicated to serve as a TCP engine. The original purpose of our prototype was to leverage the flexibility and tools available in generic systems for extensive analyses of iSCSI. As work proceeded, we quickly realized the viability of generic processors to meet iSCSI requirements. Looking ahead to chip multiprocessing, where multiple cores reside on each processor, understanding how to partition work and scale to cores will be important in future server platforms.
    NETWORKING 2005: Networking Technologies, Services, and Protocols; Performance of Computer and Communication Networks; Mobile and Wireless Communication Systems, 4th International IFIP-TC6 Networking Conference, Waterloo, Canada, May 2-6, 2005, Proceedings; 01/2005
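The dedicated TCP engine in an AMP design is essentially a core that spins in its own event loop servicing network work, outside the host OS's per-packet scheduling. The toy loop below shows that shape using Python's `selectors` over a local socket pair standing in for the NIC-to-engine path; the uppercasing is a placeholder for real protocol processing, and none of this is the paper's actual engine code.

```python
import selectors
import socket

def run_engine_once(sel):
    """One iteration of a dedicated engine's event loop: poll for ready
    sockets and service them inline.  (Toy stand-in for the AMP TCP
    engine described above; uppercasing = fake protocol processing.)"""
    for key, _ in sel.select(timeout=1.0):
        conn = key.fileobj
        data = conn.recv(4096)
        if data:
            conn.sendall(data.upper())

# Local socket pair stands in for the NIC <-> engine data path.
left, right = socket.socketpair()
sel = selectors.DefaultSelector()
sel.register(right, selectors.EVENT_READ)

left.sendall(b"iscsi pdu")
run_engine_once(sel)          # the engine services the ready socket
reply = left.recv(4096)       # -> b"ISCSI PDU"
```

In a real AMP deployment the loop would be pinned to its own processor and would poll the NIC directly; the key property the sketch preserves is that the engine never blocks in the host OS between operations.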
  • ABSTRACT: To meet the increasing networking needs of server workloads, servers are starting to offload packet processing to peripheral devices to achieve TCP/IP acceleration. Researchers at Intel Labs have experimented with alternative solutions that improve the server's ability to process TCP/IP packets efficiently and at very high rates.
    Computer 12/2004. · 1.68 Impact Factor
  • ABSTRACT: We propose a technology strategy for enabling applications to scale to next-generation levels of I/O scalability and communication performance on industry-standard platforms. The strategy combines efficient packet processing and scalable I/O concurrency, potentially enabling Ethernet and TCP to approach the latency and throughput performance offered by today's System Area Networks. We target the performance of communication-centric applications, initially using a web server as the application for concept validation.
    07/2004;
  • ABSTRACT: We propose a technology strategy for enabling applications to scale to next-generation levels of I/O scalability and communication performance on industry-standard platforms. The strategy combines efficient packet processing and scalable I/O concurrency, potentially enabling Ethernet and TCP to approach the latency and throughput performance offered by today's System Area Networks. We target the performance of communication-centric applications, initially using a web server as the application for concept validation.
    03/2004;
  • ABSTRACT: Server-based networks have well-documented performance limitations. These limitations motivate a major goal of Intel's Embedded Transport Acceleration (ETA) project: the ability to deliver high-performance server communication and I/O over standard Ethernet and Transmission Control Protocol/Internet Protocol (TCP/IP) networks. By developing this capability, Intel hopes to take advantage of the large knowledge base and ubiquity of these standard technologies. With the advent of 10 Gigabit Ethernet, these standards promise to provide the bandwidth required by the most demanding server applications. We use the term packet processing engine (PPE) as a generic term for the computing and memory resources necessary for communication-centric processing. Such PPEs have certain desirable attributes, including scalability, extensibility, and programmability; the ETA project focuses on developing PPEs with these attributes. General-purpose processors, such as the Intel Xeon in our prototype, are extensible and programmable by definition. Our results show that software partitioning can significantly increase the overall communication performance of a standard multiprocessor server. Specifically, partitioning the packet processing onto a dedicated set of compute resources allows optimizations that are otherwise impossible when time-sharing the same compute resources with the operating system and applications.
    IEEE Micro 02/2004; 24(1):24-31. · 2.39 Impact Factor
  • ABSTRACT: The ETA project at Intel Research and Development has developed a software prototype that uses one of the Intel® Xeon™ processors in a multi-processor server as a packet processing engine. The prototype is used as a vehicle for empirical measurement and analysis of a highly programmable packet processing engine that is closely tied to the server's core CPU and memory complex. The usage model for the prototype is the acceleration of server TCP/IP networking. The ETA prototype runs in an asymmetric multiprocessing mode, in that the packet processing engine does not run as a general computing resource for the host operating system. We show an effective method of interfacing the packet processing engine to the host processors using efficient asynchronous queuing mechanisms. This paper describes the ETA software architecture and the ETA prototype, and details the measurement and analysis performed to date. Test results include running the packet processing engine in single-threaded mode, as well as in multi-threaded mode using Intel's Hyper-Threading Technology (HT). Performance data gathered for network throughput and host CPU utilization show a significant improvement over the standard TCP/IP networking stack.
    IEEE Micro. 01/2004; 24:24-31.
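The "efficient asynchronous queuing mechanisms" that the ETA abstracts describe for host-to-engine interfacing are typically built as single-producer/single-consumer rings: the host only advances the head index, the engine only advances the tail, so descriptors cross without locks or interrupts. The sketch below shows the shape of such a ring; the layout and descriptor format are illustrative assumptions, and a real shared-memory version would also need memory barriers, which plain Python does not express.

```python
class SpscRing:
    """Single-producer/single-consumer descriptor ring, the shape of
    queue used between host CPUs and a packet engine: the producer only
    writes `head`, the consumer only writes `tail`, so no locking is
    needed.  (Illustrative sketch, not ETA's actual descriptor format;
    a real shared-memory ring also needs memory barriers.)"""

    def __init__(self, size):
        assert size & (size - 1) == 0, "size must be a power of two"
        self.slots = [None] * size
        self.mask = size - 1
        self.head = 0   # advanced only by the producer (host)
        self.tail = 0   # advanced only by the consumer (engine)

    def push(self, desc):
        if self.head - self.tail == len(self.slots):
            return False                     # ring full: producer backs off
        self.slots[self.head & self.mask] = desc
        self.head += 1                       # publish after the slot write
        return True

    def pop(self):
        if self.head == self.tail:
            return None                      # ring empty
        desc = self.slots[self.tail & self.mask]
        self.tail += 1
        return desc

ring = SpscRing(4)
for i in range(4):
    assert ring.push(("tx", i))
assert not ring.push(("tx", 4))              # fifth push fails: ring is full
drained = [ring.pop() for _ in range(4)]
```

Because each side writes only its own index, neither host nor engine ever takes an interrupt or lock on the fast path, which is where the throughput and CPU-utilization gains reported above come from.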
  • ABSTRACT: The ETA (Embedded Transport Acceleration) project at Intel Research and Development has developed a software prototype that uses one of the Intel® Xeon™ processors in a multi-processor server as a packet processing engine. The prototype is used as a vehicle for empirical measurement and analysis of a highly programmable packet processing engine that is closely tied to the server's core CPU and memory complex. The usage model for the prototype is the acceleration of server TCP/IP networking. The ETA prototype runs in an asymmetric multiprocessing mode, in that the packet processing engine does not run as a general computing resource for the host operating system. We show an effective method of interfacing the packet processing engine to the host processors using efficient asynchronous queuing mechanisms. This paper describes the ETA software architecture and the ETA prototype, and details the measurement and analysis performed to date. Test results include running the packet processing engine in single-threaded mode, as well as in multi-threaded mode using Intel's Hyper-Threading Technology (HT). Performance data gathered for network throughput and host CPU utilization show a significant improvement over the standard TCP/IP networking stack.
    High Performance Interconnects, 2003. Proceedings. 11th Symposium on; 09/2003
  • ABSTRACT: Detailed measurements and analyses of the Linux 2.4 TCP stack on current adapters and processors are presented. We describe the impact of CPU scaling and memory bus loading on TCP performance. As CPU speeds outstrip I/O and memory speeds, many generally accepted notions of TCP performance begin to unravel. In-depth examinations and explanations of previously held TCP performance truths are provided, and we expose cases where these assumptions and rules of thumb no longer hold in modern-day implementations. We conclude that unless major architectural changes are adopted, we would be hard-pressed to continue relying on the 1 GHz/1 Gbps rule of thumb.
    Performance Analysis of Systems and Software, 2003. ISPASS. 2003 IEEE International Symposium on; 04/2003
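The 1 GHz/1 Gbps rule of thumb questioned above amounts to a cycles-per-byte budget: 1 GHz of CPU per 1 Gbps of TCP is 1 cycle per bit, i.e. 8 cycles per byte. A quick back-of-the-envelope calculation (the specific clock and link rates below are illustrative, not figures from the paper) shows why the rule strains at higher link speeds:

```python
def cycles_per_byte(cpu_hz, link_bps):
    """CPU cycles available per byte of wire traffic if one CPU must
    keep pace with the link.  The 1 GHz / 1 Gbps rule of thumb is
    1 cycle per bit, i.e. 8 cycles per byte."""
    return cpu_hz / (link_bps / 8)

budget_1g = cycles_per_byte(1e9, 1e9)     # classic rule: 8.0 cycles/byte
budget_10g = cycles_per_byte(3e9, 10e9)   # 3 GHz core vs 10 GbE: 2.4 cycles/byte
```

Even a hypothetical 3 GHz core facing 10 GbE has under a third of the classic per-byte budget, and the paper's point is sharper still: once memory and I/O bus stalls are counted, the cycles that actually reach TCP processing shrink further.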
  • ABSTRACT: The boundary between the network edge and the front-end servers of the data center is blurring. Appliance vendors are flooding the market with new capabilities, while switch/router vendors scramble to add these services to their traditional transport services. The result of this competition is a set of ad hoc technologies and capabilities for providing services at the network edge. This paper describes the Comm Services Platform (CSP), a system architecture for this new 'communication services tier' of the data center. CSP enumerates a set of architectural components that provide scalable communication services built from standard building blocks using emerging server, I/O, and network technologies. The building blocks of CSP include a System Area Network, the Virtual Interface Architecture, programmable network processors, and standard high-density servers.
    03/2001;
  • 3rd USENIX Symposium on Internet Technologies and Systems, USITS 2001, March 26-28, 2001, San Francisco, California, USA; 01/2001
  • ABSTRACT: This protected, zero-copy, user-level network interface architecture reduces the system overhead for sending and receiving messages between high-performance CPU/memory subsystems and networks to less than 10 microseconds when integrated in silicon.
    IEEE Micro 04/1998. · 2.39 Impact Factor
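The zero-copy idea in this last abstract — parsing headers and handing payloads to the application without intermediate copies — can be loosely illustrated with Python's buffer protocol, where views into a receive buffer carve out regions without duplicating bytes. This is an analogy to the registered-buffer style of user-level NICs, not the actual interface from the paper:

```python
# Stand-in for a registered, DMA-able receive buffer owned by the app.
buf = bytearray(64)
view = memoryview(buf)

# Slicing a memoryview creates no copy: header and payload are just
# windows onto the same underlying memory.
header, payload = view[:16], view[16:]

# Simulate the NIC DMA-ing payload bytes directly into the buffer...
payload[:5] = b"hello"

# ...and the application reading them in place, with zero copies made.
received = bytes(buf[16:21])
```

The performance claim in the abstract rests on exactly this property: because the NIC writes into application-visible memory and the interface is mapped into user space, no kernel transition or data copy sits on the per-message path.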