• Home
  • IBM
  • Hardware Development
  • Dieter Wendel
Dieter Wendel

Dieter Wendel
  • IBM

About

58
Publications
5,545
Reads
How we measure 'reads'
A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text. Learn more
1,434
Citations
Current institution
IBM

Publications

Publications (58)
Article
The two chips at the heart of the IBM z13™ system include a processor chip (referred to as the CP or Central Processor chip) and an L4 (Level 4) cache chip (referred to as the SC or System Controller chip), each 678 mm 2 in area. The CP and SC chips were implemented with approximately 4 billion (4 × 109) and 7.1 billion transistors, respectively, i...
Conference Paper
The next-generation System z design introduces a new microprocessor chip (CP) and a system controller chip (SC) aimed at providing a substantial boost to maximum system capacity and performance compared to the previous zEC12 design in 32nm [1,2]. As shown in the die photo, the CP chip includes 8 high-frequency processor cores, 64MB of eDRAM L3 cach...
Patent
A voltage selection mechanism is provided for switching between multiple voltages without causing a direct current (DC) that may further stress storage elements due to excessive power consumption and electro-migration effects. The voltage selection mechanism comprises cross-coupled circuitry, which comprises a first positive-channel field effect tr...
Article
The IBM POWER8™ processor is a 649-mm , 4.2-billion transistor, high-frequency microprocessor fabricated in the IBM 22-nm silicon on insulator (SOI) technology with embedded dynamic random access memory (eDRAM) and 15 layers of metal. With its twelve architecturally enhanced, eight-way multithreaded cores, 96-MB high-bandwidth shared third-level ca...
Conference Paper
POWER8™ delivers a data-optimized design suited for analytics, cognitive workloads, and today's exploding data sizes. The design point results in a 2.5x performance gain over its predecessor, POWER7+™, for many workloads. In addition, POWER8 delivers the efficiency demanded by cloud computing models and also represents a first step toward creating...
Conference Paper
The 12-core 649mm2 POWER8™ leverages IBM's 22nm eDRAM SOI technology [1], and microarchitectural enhancements to deliver up to 2.5× the socket performance [2] of its 32nm predecessor, POWER7+™ [3]. POWER8 contains 4.2B transistors and 31.5μF of deep-trench decoupling capacitance. Three thin-oxide transistor Vts are used for power/performance tuning...
Conference Paper
This paper presents the successful use of the novel inline product-like logic vehicle (PATO) during the last technology development phases of IBM's 22nm SOI technology node. It provides information on the sequential PATO inline test flow, commonality analysis procedure, and commonality signature trending. The paper presents examples of systematic d...
Patent
Full-text available
General Scan Design (GSD) Local clock buffers which can also provide LSSD (Level Sensitive Scan Design) clocks
Article
Full-text available
Dual read port 6-transistor (6T) SRAMs play a critical role in high-performance cache designs thanks to doubling of access bandwidth, but stability and sensing challenges typically limit the low-voltage operation. We report a high-performance dual read port 8-way set associative 6T SRAM with a one clock cycle access latency, in a 32 nm metal-gate p...
Article
The IBM POWER7® processor contains many innovative circuit ideas that enable advanced architectural features. A high-density embedded dynamic random access memory is used to provide 32 MB of level-3 cache. Improved input/output (I/O) links provide up to 50 GB/s of I/O bandwidth. An innovative phase-locked loop design allows for dynamic, per-core fr...
Patent
Full-text available
Scan chain disable function to block transitions on the scan chain while in functional mode.
Patent
Full-text available
A method of hardening circuits, fabricated in SOI technology, against Single Error Upsets
Conference Paper
,, , In multi-ported register files, memory cell size grows quadratically with the total number of ports due to wordline and bitline wiring. Reducing the number of physical access ports in a memory cell can thus lead to significant area and power savings as well as latency improvement. Double-pumped register files operate access ports twice in a si...
Article
This paper gives an overview of the latest member of the POWER™ processor family, POWER7™. Eight quad-threaded cores, operating at frequencies up to 4.14 GHz, are integrated together with two memory controllers and high speed system links on a 567 mm die, employing 1.2B transistors in a 45 nm CMOS SOI technology with 11 layers of low-k copper wirin...
Conference Paper
Dual read port SRAMs play a critical role in high performance cache designs, but stability and sensing challenges typically limit the low voltage operation. We report a high-performance dual read port 8-way set associative 6T SRAM with a one clock cycle access latency, in a 32nm metal-gate partially depleted (PD) SOI technology, for low-voltage app...
Conference Paper
Introducing POWER7™ the latest member of the IBM POWER™ processor family. A 567 mm<sup>2</sup> chip implemented in 45nm SOI technology, holding eight quad threaded cores, a 32MB shared eDRAM L3, two memory controllers and high bandwidth SMP interfaces. The new out of order, shallow pipeline core with 12 execution units, multiport L1 caches and a pr...
Patent
Full-text available
Local Clock Generation for functional and test clocks, with control inputs for programming
Conference Paper
The POWER7<sup>TM</sup> microprocessor features a 32 kB L1 data cache with a 2R and banked-1W functionality using a 6T-SRAM cell. Read/write collision is intercepted inside the array with write-over-read priority. The array-specific power supply improves SRAM cell stability and performance while reducing the logic voltage level. The macro is fabric...
Conference Paper
The design of the clocked storage elements (CSEs) and associated local clocking circuitry is a critical consideration for modern microprocessor projects[1], and the POWER7™ chip[2], designed in a 45nm silicon-on-insulator (SOI) technology, was no exception. The digital logic contained over 2M CSEs, and the design of these elements had a major impac...
Conference Paper
POWER7<sup>TM</sup> the next generation processor of the POWER<sup>TM</sup> family is introduced. The 8-core chip, supporting 32 threads, is implemented in 45 nm 11 M CMOS SOI. The 32 kB L1 caches feature 1 read port banked write for the l-cache and 2 read ports banked write for the Dcache. The on-chip cache hierarchy consists of a 256 kB fast, pri...
Patent
Full-text available
A ring oscillator made from pulsed-clock LCBs, for characterization purposes.
Patent
Full-text available
A scannable latch that incorporates a dynamic logic front end, which includes a scan pull-down tree.
Article
Full-text available
The 65 nm cell broadband enginetrade (cell BE) is a multi-core SoC, implemented in a high performance SOI technology featuring a separate dual power supply for SRAM arrays to improve stability and performance using an elevated voltage. A new method is shown to analyze the SRAM cell under application conditions which was used to tune the cell for st...
Article
The Cell Broadband Engine™ (Cell/B.E.) processor was developed by Sony, Toshiba, and IBM engineers to deliver a high-speed, high-performance, multicore processor that brings supercomputer performance via a custom system-on-a-chip (SoC) implementation. To achieve its goals, the Cell/B.E. processor uses an innovative architecture, new circuit design...
Patent
Full-text available
A simplified method of timing a design containing transparent latches.
Article
The 65nm CELL Broadband Enginetrade design features a dual power supply, which enhances SRAM stability and performance using an elevated array-specific power supply, while reducing the logic power consumption. Hardware measurements demonstrate low-voltage operation and reduced scatter of the minimum operating voltage. The chip operates at 6GHz at 1...
Patent
Full-text available
Power saving method to insert logic in the scan chain to keep it from toggling during functional operation.
Article
The Cell Broadband Engine (Cell BE) is a multicore system-on-chip (SoC), implemented in a 90-nm high-performance silicon-on-insulator (SOI) technology, and optimized, within the triple constraints of area, power, and performance, to run at frequencies in excess of 3 GHz. The large scale of the design (~75 million logic transistors, and about 750 00...
Patent
Full-text available
An improved method of modeling latch transparency
Conference Paper
This paper reviews the design challenges that current and future processors must face, with stringent power limits and high frequency targets, and the design methods required to overcome the above challenges and address the continuing Giga-scale system integration trend. This paper then describes the details behind the design methodology that was u...
Conference Paper
This paper reviews the design challenges that current and future processors must face, with stringent power limits and high frequency targets, and the design methods required to overcome the above challenges and address the continuing Giga-scale system integration trend. This paper then describes the details behind the design methodology that was u...
Conference Paper
A 4-way SIMD streaming processor of a cell processor is developed in a 90nm SOI technology. CMOS static gates implement the majority of the logic. Dynamic circuits are used in critical areas, occupying 19% of the non-SRAM area. ISA, microarchitecture, and physical implementation are co-optimized to achieve a compact and power efficient design
Article
Power consumption is a major challenge in VLSI design. Power-constrained designs must attack power reduction with many techniques and require tools to accurately predict the power consumption. These tools give designers feedback on the efficiency of the power management logic. We present the basic methodology behind cycle-accurate power estimation....
Conference Paper
This paper reviews the design challenges that current and future processors must face with stringent power limits and high frequency targets, and the design methods required to address the continuing system integration trends. This paper then describes the implementation of a first-generation CELL processor and the design methods used to overcome t...
Conference Paper
We present the double precision multiplier for a 90nm PD-SOI first-generation CELL processor. Dynamic booth logic is designed for scalability and with noise, leakage, and pulse width variation tolerance. Static partial product compression is implemented with replicated bits for performance. The design supports fine-grained clock gating domains for...
Conference Paper
The implementation of a first-generation CELL processor that supports multiple operating systems including Linux consists of a 64 bit power processor element (PPE) and its L2 cache, multiple synergistic processor elements (SPE) (B. Flachs et al.) each with its own local memory (LS) (T. Asano et al.), a high bandwidth internal element interconnect b...
Conference Paper
The feasibility of a fully integrated RF front-end using an above-IC BAW integration technique is demonstrated for WCDMA applications. The circuit has a voltage gain of 31.3dB, a noise figure of 5.3dB, an in-band IIP3 of -8dBm and IIP2 of 38dBm, with a total power consumption of 36mW. The BAW filter area is 0.45mm<sup>2</sup> and the total circuit...
Conference Paper
A CELL processor is a multi-core chip consisting of a 64b power architecture processor, multiple streaming processors, a flexible IO interface, and a memory interface controller. This SoC is implemented in 90nm SOI technology. The chip is designed with a high degree of modularity and reuse to maximize the custom circuit content and achieve a high-f...
Patent
Full-text available
Methods of generating local clocks, allowing for test sequences and for local clock gating.
Patent
Full-text available
A method for operating master-slave latches providing two modes of operation, pulsed mode (master latch is held open while clock to slave is pulsed), and normal master-slave operation with half-cycle clocks.
Patent
Full-text available
Circuit structure with static logic interface, dynamic logic gate, and a static latch, which can be used with surrounding static logic.
Article
To address the challenges in microprocessor designs beyond a gigahertz, an instruction window buffer (IWB) was designed. The IWB implements the processor parts for renaming, reservation station, and reorder buffer as a unified buffer. Measured results on an experimental chip demonstrated operation of the IWB macros supporting 1.8 GHz, with the chip...
Conference Paper
An Instruction Window Buffer (IWB) addresses the challenges in microprocessor designs beyond a GHz. The IWB implements the processor parts for renaming, reservation station and reorder buffer as a unified buffer. Measured results on an experimental chip demonstrate operation of the IWB macros at 1.8 GHz, with the chip at the fast end of the process...
Article
This paper presents a survey of some of the most aggressive custom designs for CMOS processor products and prototypes in IBM. We argue that microprocessor performance growth, which has traditionally been driven primarily by CMOS technology and microarchitectural improvements, can receive a substantial contribution from improvements in circuit desig...
Article
A L2 cache chip design for high performance /390-mu-processors in a multiprocessor environment is presented. Design objectives, logic and technological implementation as well as measured results are discussed. The design features a 96 k Bytes L2 cache-, its associated directory- and LRU arrays, dataflow and control logic integrated on one CMOS stan...
Article
Advanced charge-buffered-logic (CBL) circuits featuring double-poly self-alignment, a 'free' epi-base lateral p-n-p (cutoff frequency=300 MHz only), and deep trench isolation are discussed. Using 1.2- mu m design rules and a modified push-pull output stage, a gate delay (fan-in=3) of 278 ps was obtained at a DC current of 30 mu A/gate. The low powe...

Network

Cited By