IEEE Micro

Published by the Institute of Electrical and Electronics Engineers
ISSN: 0272-1732
Publications
Article
One of the primary concerns for microprocessor designers has always been balancing power and thermal management while minimizing performance loss. Rather than generating solutions to this dilemma, the advent of multicore chips has raised a host of new challenges. This discussion with Pradip Bose and Kanad Ghose, excerpted from a 2007 Card Workshop Panel, explores the future of low-power design and temperature management.
 
Article
Distributed embedded computer systems are the key enablers of X-by-wire systems and control system functions. While developers can validate the correct operation of the communication and operating systems and the silicon implementations - the basis of embedded computer systems - once and for all, they cannot validate the application-dependent software and data structures in these systems in the same manner. The developer must configure the communication system for the respective application, create middleware code to access the communication system, and, last but not least, implement the application software. Because this is necessary for every new application, we need a well-defined process and a complementary set of tools to minimize error and support a high-quality development life cycle. We propose a model-based process - the "A" process. It consists of a sequence of models, each of which serves a specific purpose and hence contains only those pieces of information it requires for this purpose. The models are linked to each other by process transitions that either add information to or extract information from their predecessors. The A process guides the developer from one model to the next and is supported by a set of tools. In this article, we discuss development process models in general, and our model-based process in particular.
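Purely as an illustration of the idea of models chained by information-adding or information-extracting transitions, here is a minimal C sketch; the model names, fields, and transitions are hypothetical and are not taken from the A process or its tool set.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical, simplified illustration of a model chain: each transition
 * takes a predecessor model and derives the next one, either adding
 * information (e.g., a communication configuration, middleware bindings)
 * or extracting information (e.g., a schedule) from it. */
typedef struct {
    const char *name;      /* which model in the chain this is        */
    const char *content;   /* stand-in for the model's information    */
} Model;

/* A process transition: derive the next model from its predecessor. */
typedef Model (*Transition)(const Model *predecessor);

static Model add_communication_config(const Model *m) {
    (void)m;   /* a real transition would read the predecessor's content */
    return (Model){ "network model", "frames, slots, schedule" };
}

static Model generate_middleware(const Model *m) {
    (void)m;
    return (Model){ "middleware model", "generated access-code stubs" };
}

int main(void) {
    Model current = { "application model", "signals and functions" };
    Transition steps[] = { add_communication_config, generate_middleware };

    /* Walk the chain of transitions, printing each derived model. */
    for (size_t i = 0; i < sizeof steps / sizeof steps[0]; i++) {
        current = steps[i](&current);
        printf("-> %s: %s\n", current.name, current.content);
    }
    return 0;
}
```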
 
Article
As part of MEDEA's T505 project, an attempt has been made to study ways to develop and characterize 0.18-micron CMOS technology. The most important part of the study concerned the front-end architecture. Different process experiments on the transistor architecture have been explored, as well as some new, specific process modules needed to reach the performance targeted for this CMOS process. The data obtained indicate that, when going from one process generation to the next, it is important to identify the parameters that will need specific new developments to reach the goals assigned to the overall process.
 
Article
This low-cost auxiliary processor is based on commercially available microprocessors. Its parallel capabilities enhance the performance of small computer systems in vector and associative operations.
 
Article
Massively parallel image processing with as many processing elements as pixels can achieve real-time vision with rates as high as 1,000 frames per second. The authors implemented a QVGA-size pixel-parallel image processor for object identification and pose estimation on a single chip. Bit-serial operation and dynamic logic reduce the circuit area, and pipelining enables high processing speed.
 
Article
P1000 specifies a simple, 8-bit, Eurocard-based backplane. Designed to be processor-independent, it can be used in stand-alone configurations or as a high-performance I/O channel in distributed systems.
 
Article
Entry time into the extended writable control store on the VAX-11/780 can be reduced to 200 nanoseconds by means of this simple hardware modification.
 
Article
The synergistic processor element is a new architecture oriented toward multimedia and streaming processing. In this architecture, the memory is not a cache but a private, or scratch-pad, memory. Such a memory is simple, but it must provide high frequency and large capacity at low power. This design uses an 11-fan-out-of-four (11FO4), six-cycle, fully pipelined, embedded 256-Kbyte SRAM for this purpose. The design's memory is not one hard macro, but a group of custom macros physically distributed to optimize the pipeline.
 
Article
Discusses the history and development of IEEE Std 1284-1994, the standard signaling method for a bidirectional parallel peripheral interface for personal computers. This new PC parallel interface standard solves printer problems and illustrates effective standards building.
 
Article
The statistics collector and analyzer (SCA) records, displays, and analyzes performance measurements from an active IEEE-1394 bus in real time. An empirical analysis using SCA exposes the unique, complex arbitration mechanisms used by IEEE-1394 nodes and their effect on the performance of higher-level protocols.
 
Article
In December 1995, the final public review of the High Performance Serial Bus draft standard concluded, and the IEEE ratified it as IEEE Std 1394-1995. As 1394 working-group chair, I extend my appreciation to all those who took the time to review, vote, and comment upon the draft standard. It is also gratifying to the entire working group to observe the various computer and digital audiovisual (AV) products now incorporating support for the 1394 interface. This column's purpose is to review technical issues raised by the IEEE study group exploring the development of a second-generation standard based on 1394. This second-generation activity is likely to consist of several 1394-related working groups, each expanding a different attribute or capability of the standard. To set a proper foundation for the discussion of 1394's future, I first summarize the concepts and facilities of the present standard and review its current status in the computer and digital AV industries.
 
Article
The design of a high-capacity Θ-search associative memory (Θ ∈ {<, >, ≤, ≥, =, ≠}) is presented. PSPICE simulation and layouts show that the proposed Θ-search associative memory chip, consisting of 256 words, each 64-b long, can fit on a 13.5-mm × 9.5-mm die. It can perform maskable Θ-search operations over its contents in 110 ns.
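As a behavioral reference for what a maskable Θ-search does (not a model of the chip's circuits), the following C sketch evaluates one relation over masked 64-bit words; the function names and the direction of the comparison are assumptions.

```c
#include <stdint.h>
#include <stdio.h>

/* Relations supported by the Θ-search: Θ ∈ {<, >, <=, >=, =, !=}. */
typedef enum { LT, GT, LE, GE, EQ, NE } Relation;

/* Behavioral model of a maskable Θ-search: for each stored 64-bit word,
 * report whether (word & mask) Θ (key & mask) holds.  The chip evaluates
 * all 256 words in parallel; this loop is a sequential stand-in, and the
 * direction of the comparison is an assumption made for illustration. */
static void theta_search(const uint64_t *words, int n,
                         uint64_t key, uint64_t mask, Relation rel, int *match)
{
    uint64_t k = key & mask;
    for (int i = 0; i < n; i++) {
        uint64_t w = words[i] & mask;
        switch (rel) {
        case LT: match[i] = (w <  k); break;
        case GT: match[i] = (w >  k); break;
        case LE: match[i] = (w <= k); break;
        case GE: match[i] = (w >= k); break;
        case EQ: match[i] = (w == k); break;
        case NE: match[i] = (w != k); break;
        }
    }
}

int main(void)
{
    uint64_t mem[4] = { 5, 42, 42, 1000 };
    int hit[4];
    theta_search(mem, 4, 42, ~0ULL, GE, hit);   /* find words >= 42 */
    for (int i = 0; i < 4; i++)
        printf("word %d: %s\n", i, hit[i] ? "match" : "no match");
    return 0;
}
```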
 
Article
An FFT algorithm operating on a 16-bit microcomputer can calculate a 256-point transform as much as 10 times faster than a similar algorithm on an eight-bit microcomputer.
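For readers unfamiliar with the algorithm being benchmarked, here is a standard iterative radix-2 decimation-in-time FFT in C. It uses floating point for clarity, whereas the article's 16-bit implementation would use scaled integer arithmetic and a precomputed twiddle-factor table; nothing here is taken from the article's code.

```c
#include <math.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

#define N 256   /* transform length; must be a power of two */

/* In-place iterative radix-2 decimation-in-time FFT.  re[] and im[] hold
 * the real and imaginary parts of the N input samples and are overwritten
 * with the transform. */
void fft(double re[N], double im[N])
{
    /* Bit-reversal permutation of the input order. */
    for (int i = 1, j = 0; i < N; i++) {
        int bit = N >> 1;
        for (; j & bit; bit >>= 1)
            j ^= bit;
        j ^= bit;
        if (i < j) {
            double t;
            t = re[i]; re[i] = re[j]; re[j] = t;
            t = im[i]; im[i] = im[j]; im[j] = t;
        }
    }
    /* log2(N) butterfly stages. */
    for (int len = 2; len <= N; len <<= 1) {
        double ang = -2.0 * M_PI / len;
        for (int i = 0; i < N; i += len) {
            for (int k = 0; k < len / 2; k++) {
                double wr = cos(ang * k), wi = sin(ang * k);
                double ar = re[i + k],           ai = im[i + k];
                double br = re[i + k + len / 2], bi = im[i + k + len / 2];
                double tr = br * wr - bi * wi;   /* twiddle * second input */
                double ti = br * wi + bi * wr;
                re[i + k] = ar + tr;             im[i + k] = ai + ti;
                re[i + k + len / 2] = ar - tr;   im[i + k + len / 2] = ai - ti;
            }
        }
    }
}
```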
 
Article
The recent advent of 16-bit microcomputers with built-in multiplication hardware has created a new option for implementing digital filters. Procedures for the implementation of digital filters by cascaded and parallel second-order modules are described. A typical 16-bit microcomputer, the Intel 8086, is used; the results, however, are applicable to other microcomputers. The mathematical description of the cascaded and parallel second-order modules, six structures for second-order modules, program flowcharts, and implementation results for both the parallel and cascade cases are presented. The notation and symbols used are also presented. The subroutines for implementation of the second-order digital filter modules are furnished.
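As a concrete reference for one common second-order structure (direct form II) and its cascade realization, here is a hedged C sketch; the article implements these modules in fixed-point 8086 assembly, and the names and floating-point arithmetic below are illustrative only.

```c
/* One second-order (biquad) module in direct form II:
 *   w[n] = x[n] - a1*w[n-1] - a2*w[n-2]
 *   y[n] = b0*w[n] + b1*w[n-1] + b2*w[n-2]
 */
typedef struct {
    double b0, b1, b2, a1, a2;   /* coefficients (a0 normalized to 1)        */
    double w1, w2;               /* delayed internal states w[n-1], w[n-2]   */
} Biquad;

double biquad_step(Biquad *s, double x)
{
    double w = x - s->a1 * s->w1 - s->a2 * s->w2;
    double y = s->b0 * w + s->b1 * s->w1 + s->b2 * s->w2;
    s->w2 = s->w1;
    s->w1 = w;
    return y;
}

/* Cascade realization: the output of each module feeds the next.
 * (The parallel form would instead sum the outputs of all modules.) */
double cascade_step(Biquad *sections, int n, double x)
{
    for (int i = 0; i < n; i++)
        x = biquad_step(&sections[i], x);
    return x;
}
```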
 
Article
The performance of two addressing mechanisms on three different microprocessors is examined. One of the mechanisms, and one of the micros, provided superior performance.
 
Article
Today's microprocessors exhibit powerful computing capabilities. Their characteristic differences favor each machine for a distinct portion of the applications spectrum.
 
Article
A cyclic redundancy code, or CRC, is often used to ensure the integrity of messages in data communications. This program for generating the CRC is faster than its bit-wise counterpart.
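A byte-at-a-time, table-driven CRC of the kind described trades a 256-entry lookup table for speed. The sketch below uses the CRC-16/CCITT polynomial 0x1021 and a 0xFFFF initial value as illustrative choices, which may differ from the article's conventions.

```c
#include <stdint.h>
#include <stddef.h>

static uint16_t crc_table[256];

/* Precompute, for every byte value, the remainder of (byte << 8) divided by
 * the generator polynomial. */
void crc16_init(void)
{
    for (int b = 0; b < 256; b++) {
        uint16_t r = (uint16_t)(b << 8);
        for (int i = 0; i < 8; i++)
            r = (r & 0x8000) ? (uint16_t)((r << 1) ^ 0x1021) : (uint16_t)(r << 1);
        crc_table[b] = r;
    }
}

/* Byte-at-a-time CRC: one table lookup and one XOR per message byte instead
 * of eight shift-and-XOR steps in the bit-wise version. */
uint16_t crc16(const uint8_t *msg, size_t len)
{
    uint16_t crc = 0xFFFF;   /* a common initial value; conventions vary */
    for (size_t i = 0; i < len; i++)
        crc = (uint16_t)((crc << 8) ^ crc_table[((crc >> 8) ^ msg[i]) & 0xFF]);
    return crc;
}
```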
 
Article
The architecture and characteristics of a fully functional 40-MHz device that performs the 8×8 inverse discrete cosine transform (IDCT) for digital HDTV decoders are presented. The IDCT chip converts four 14-b DCT coefficients into four 11-b pixel values each cycle. Fixed-coefficient multiplier Wallace trees, in which partial products are rounded before summation, help compute the inner products. The 31,000-gate device was implemented on a 10.5-mm die using a 1-μm CMOS array-based process.
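As a software reference for the arithmetic (not the chip's datapath), a fixed-point 1D 8-point IDCT built from fixed-coefficient inner products might look like the following; the Q12 scaling and the placement of the rounding step are assumptions, and the chip instead rounds partial products inside hardwired Wallace-tree multipliers.

```c
#include <math.h>
#include <stdint.h>

#define QBITS 12   /* Q12 fixed-point coefficient precision (arbitrary choice here) */

static int32_t coef[8][8];   /* coef[n][k] ~ (1/2) c(k) cos((2n+1)k*pi/16), scaled by 2^QBITS */

/* Precompute the fixed coefficients of the 8-point 1D IDCT. */
void idct_init(void)
{
    const double PI = 3.14159265358979323846;
    for (int n = 0; n < 8; n++)
        for (int k = 0; k < 8; k++) {
            double ck = (k == 0) ? 1.0 / sqrt(2.0) : 1.0;
            double v  = 0.5 * ck * cos((2 * n + 1) * k * PI / 16.0);
            coef[n][k] = (int32_t)lround(v * (1 << QBITS));
        }
}

/* 1D 8-point IDCT of one row (or column) of DCT coefficients, computed as
 * eight fixed-coefficient inner products with rounding before the final
 * shift.  An 8x8 2D IDCT applies this 1D transform to all rows and then to
 * all columns. */
void idct_1d(const int16_t in[8], int16_t out[8])
{
    for (int n = 0; n < 8; n++) {
        int32_t acc = 1 << (QBITS - 1);   /* rounding offset */
        for (int k = 0; k < 8; k++)
            acc += coef[n][k] * in[k];
        out[n] = (int16_t)(acc >> QBITS);
    }
}
```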
 
Article
Guest Editors John Sell and Alan Jay Smith talk about what went into the presentation of IEEE Micro's Hot Chips 17 issue.
 
Article
Over its 18-year history, the Hot Chips conference has become a leading forum for the latest computing, communications, and networking chips. The conference covers a variety of technical details about these chips, including technology, fundamental algorithms, packaging techniques, architecture, and circuit details. This issue features six of the best presentations from Hot Chips 18, expanded to full articles. Topics include multiprocessor/multicore systems, embedded processing, and low-power processing.
 
Article
This special issue showcases articles written from five of the best presentations at the Hot Chips 19 conference, held in August 2007. The guest editors give highlights of the conference and introduce the articles, which cover the mobile-optimized northbridge of AMD's Griffin microprocessor family; the IBM z10 next-generation mainframe microprocessor; fault tolerance in IBM's Power6 microprocessor; NVIDIA's Tesla unified graphics and computing architecture; and SiBEAM's 4-Gbps 1080p-capable uncompressed HD A/V wireless 60-GHz transceiver chipset.
 
Article
GaAs technology has matured sufficiently to allow fabrication of an entire RISC on one chip. GaAs also supports 200-MHz clock rates and 100-MIPS instruction rates.
 
Article
The Gmicro/200, a microprocessor that has been developed as part of Japan's TRON (The Real-Time Operating Nucleus) project, is described. This microprogram-based processor, with a six-stage pipeline, 730,000 transistors, and on-chip caches, will serve in an engineering workstation or a high-speed graphics accelerator system. The authors discuss features of the instruction set; memory management; handling of exceptions, interrupts, and traps; and the implementation of the Gmicro/200.
 
Article
The pace of advances in semiconductor technology over the last four decades is examined, and projections are made for the decade ahead. Predictions are made in the areas of memory, logic chips, wafer-scale integration, digital signal processing, communication chips, and neural networks, and problems are identified. Also discussed are GaAs, superconducting devices, optical computing, molecular computing, mass storage technologies, and handwriting and speech recognition. It is noted that maintaining the pace of technology development during the next ten years will take huge investments and that such funds must ultimately come from the marketplace. It is concluded that the major challenge is to expand the microcomputer market through new applications and more intensive consumer use of existing products.
 
Article
Sure System 2000, a fault-tolerant computer that couples multiprocessors to offer low-priced, high-performance systems that deal effectively with faults and failures, is presented. The architecture is based on the local redundancy technique, ensuring that no hardware or software fault can cause a system crash. Software errors can be fixed, and hardware can be replaced, upgraded, or added dynamically. Existing fault-tolerant computers are briefly reviewed, and the logic hardware system configuration of the Sure System 2000 is described. The multiprocessor and I/O architecture are examined. The SXO Sure System 2000 expandable operating system is described.
 
Article
We propose Intelligent Watcher (iWatcher), a combination of hardware and software support that can detect a wide variety of software bugs with only modest hardware changes to current processor implementations. iWatcher lets programmers associate specified monitoring functions with "watched" memory locations or objects. Access to any such location automatically triggers the monitoring function in the hardware. Relative to other approaches, iWatcher detects many real bugs at a fraction of the execution-time overhead.
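To illustrate the programming model described, here is a hedged sketch only: the names, flags, and signatures below are hypothetical, not the interface defined in the article, and the hardware calls are stubbed in software so the example compiles.

```c
#include <stdio.h>
#include <stdlib.h>

/* A monitoring function invoked (on iWatcher hardware, automatically) when
 * a watched memory range is accessed. */
typedef void (*monitor_fn)(void *addr, size_t size, void *arg);

enum { WATCH_READ = 1, WATCH_WRITE = 2 };   /* hypothetical flag values */

/* Software stubs standing in for the hardware support, so the sketch
 * compiles; on real iWatcher hardware these would map to the proposed
 * architectural watch operations. */
int iwatcher_on(void *addr, size_t size, int flags, monitor_fn fn, void *arg)
{ (void)addr; (void)size; (void)flags; (void)fn; (void)arg; return 0; }

int iwatcher_off(void *addr, size_t size)
{ (void)addr; (void)size; return 0; }

/* Example monitor: flag any write to a buffer that should stay constant. */
static void report_unexpected_write(void *addr, size_t size, void *arg)
{
    fprintf(stderr, "unexpected write to %p (%zu bytes): %s\n",
            addr, size, (const char *)arg);
}

int main(void)
{
    char *config = malloc(64);
    if (!config) return 1;

    /* Watch the object; the monitor would run automatically on every write. */
    iwatcher_on(config, 64, WATCH_WRITE, report_unexpected_write,
                "config must not change after startup");

    /* ... run the program; a buggy store into config triggers the monitor ... */

    iwatcher_off(config, 64);
    free(config);
    return 0;
}
```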
 
Article
With the natural trend toward integration, microprocessors are increasingly supporting multiple cores on a single chip. To keep design effort and costs down, designers of these multicore microprocessors frequently target an entire product range, from mobile laptops to high-end servers. This article discusses a continual flow pipeline (CFP) processor. Such a processor architecture can sustain a large number of in-flight instructions (commonly referred to as the instruction window and comprising all instructions renamed but not retired) without requiring the cycle-critical structures to scale up. By keeping these structures small and making the processor core tolerant of memory latencies, a CFP mechanism enables the new core to achieve high single-thread performance, and many of these new cores can be placed on a chip for high throughput. The resulting large instruction window reveals substantial instruction-level parallelism and achieves memory latency tolerance, while the small size of cycle-critical resources permits a high clock frequency.
 
Article
This special issue represents the fifth anniversary of IEEE Micro's Top Picks from the Computer Architecture Conferences. A program committee of 33 highly respected architects from industry and academia selected 10 of 70 submissions for inclusion in the issue. The guest editors describe the rigorous selection process and introduce the selected articles.
 
Article
The columnist reviews Devices of the Soul by Steve Talbott (O'Reilly, 2007). Since 1995, Talbott has edited an online newsletter, NetFuture: Technology and Human Responsibility (http://www.netfuture.org). This book, which includes material first published in the newsletter, interweaves many themes that reinforce one basic message: Talbott believes that we are not analyzing and mitigating the risks of technology. We are letting the repeated assurances of progress through technology lull us into self-forgetfulness.
 
Article
The author of this book on global energy systems is a foreign affairs columnist with the New York Times. The book deals with global systems and focuses on three themes the author says are pushing the world into what he calls the energy-climate era: global climate change; the emergence of new, globally connected middle classes in China, India, Russia, and elsewhere; and worldwide population growth. The book is recommended if you want to understand the workings of the worldwide energy system and how it interacts with population growth and change.
 
Article
The 12 articles in this special issue highlight the following key trends in current computer architecture design and research: the increased importance of thread-level parallelism; the increasing complexity of software and the need for architectures that improve programmability, analyzability, and correctness; the increasing urgency of controlling chip-wide power; and the emerging problem of process variation.
 
Article
Richard Mateosian reviews this book that offers nuggets of information to anyone who designs software. Nearly 50 people contribute their ideas. Each entry fills two facing pages and includes a contributor photo and brief biography. Most of the ideas fall into the following themes: communication; requirements; pitfalls; design and development approaches; performance and maintenance; tools and techniques; and the necessary quality or skills of software architects.
 
Article
To be a successful technical writer today, you must understand the technical and business aspects of the work you do and how it affects the organizations that pay for that work. Richard Hamilton's book, Managing Writers: A Real World Guide to Managing Technical Documentation, covers all of the bases you need to touch. Hamilton provides information that both managers and writers need to know about managing people, projects, and technology.
 
Article
An overview is given of Pygmalion, which aims to promote European industry's application of neural networks and develop `standard' computational tools for their programming and simulation. A complete environment for developing algorithms and applications will demonstrate the network capabilities expected from their properties of massive parallelism, fault tolerance, adaptivity, and learning. Key real-world applications in image processing and speech processing and a small application in acoustic signals were selected to demonstrate the potential of neural networks for various industrial problems. In image processing, remote data sensing and factory inspection were investigated. In speech processing, the foundations were laid for an automatic speech recognition system by developing efficient learning algorithms for the basic building blocks
 
Article
The 2100 accesses external memory efficiently and devotes its silicon area to providing greater functionality and processing throughput.
 
Article
The Alpha AXP 64-b architecture, which forms the basis for a series of high-performance computer systems, is described. The implementation of this architecture in the 21064 microprocessor is discussed. This 1.4-cm×1.7-cm CMOS chip incorporates 1.68 million transistors using a 0.75-μm, three-metal process. Performance measurement results for a variety of commonly used benchmarks under both OpenVMS AXP V1 and DEC OSF/1 V1.2 are presented.
 
Article
The 21164 is a new quad-issue, superscalar Alpha microprocessor that executes 1.2 billion instructions per second. The 300-MHz, 0.5-μm CMOS chip delivers an estimated 345/505 SPECint92/SPECfp92 performance. The design's high clock rate, low operational latency, and high-throughput/nonblocking memory systems contribute to this performance.
 
Article
Alpha microprocessors have been performance leaders since their introduction in 1992. The first-generation 21064 and the later 21164 raised expectations for the newest generation; performance leadership was again a goal of the 21264 design team. Benchmark scores of 30+ SPECint95 and 58+ SPECfp95 offer convincing evidence thus far that the 21264 achieves this goal and will continue to set a high performance standard. A unique combination of high clock speeds and advanced microarchitectural techniques, including many forms of out-of-order and speculative execution, provides exceptional core computational performance in the 21264. The processor also features a high-bandwidth memory system that can quickly deliver data values to the execution core, providing robust performance for a wide range of applications, including those without cache locality. The advanced performance levels are attained while maintaining an installed application base. All Alpha generations are upward-compatible. Database, real-time visual computing, data mining, medical imaging, scientific/technical, and many other applications can utilize the outstanding performance available with the 21264.
 
Article
As interconnection networks proliferate to many new applications, a low-latency, high-throughput fabric is no longer sufficient. Applications are becoming power-constrained. We propose an architectural-level power model for interconnection network routers that will allow researchers and designers to easily factor in power when exploring architectural trade-offs. We applied our model to two commercial routers - the integrated Alpha 21364 router and the IBM 8-port 12X InfiniBand router - and show that the different microarchitectures lead to vastly different power consumption and distribution estimates.
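As a rough illustration of what an architecture-level router power model computes (not the article's model or its calibrated values), the following C sketch applies the standard dynamic-power relation P = a·C·V²·f to a router's main components, with placeholder capacitances.

```c
/* Minimal, illustrative architecture-level router power estimate: the
 * dynamic power of each component follows P = alpha * C * V^2 * f, where
 * alpha is the switching activity implied by the traffic (flits per cycle).
 * All capacitance values are placeholders, not numbers from the article. */
typedef struct {
    double c_buffer;    /* effective switched capacitance per buffer access (F) */
    double c_crossbar;  /* per crossbar traversal (F)                           */
    double c_arbiter;   /* per arbitration (F)                                  */
    double vdd;         /* supply voltage (V)                                   */
    double freq;        /* clock frequency (Hz)                                 */
} RouterParams;

/* flits_per_cycle: average flits handled per cycle (the activity factor). */
double router_dynamic_power(const RouterParams *p, double flits_per_cycle)
{
    double c_per_flit = p->c_buffer      /* write into the input buffer  */
                      + p->c_buffer      /* read out of the input buffer */
                      + p->c_crossbar    /* traverse the crossbar        */
                      + p->c_arbiter;    /* switch/VC arbitration        */
    return flits_per_cycle * c_per_flit * p->vdd * p->vdd * p->freq;
}
```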
 
Article
What's the difference between ignorance and indifference? If you are one of the rare few who don't know and don't care, then you have the kind of open-mindedness that may allow you to rethink entrepreneurial inventing for the 21st century.
 
Article
Posix standards represent approximately $80 billion of the $250+ billion Unix market over the last decade; they also provide the basis for significant growth in the future. This picture of the substantial success of Posix standards (and the business impact of standards in general) differs from that presented in Micro's May/June 1998 issue. The overall Unix market boasts billions of dollars in revenues and is projected to continue growing into the future. Major areas for this growth are high-end systems (data warehousing, servers, and supercomputing) rather than desktop applications.
 
[Figure captions: quaternary n-trees of dimensions 1 (a), 2 (b), and 3 (c); virtual address translation, in which a segment of memory from one address space maps onto an equally sized segment of memory in another address space.]
Article
The Quadrics network extends the native operating system in processing nodes with a network operating system and specialized hardware support in the network interface. Doing so integrates an individual node's address spaces into a single, global, virtual address space and provides network fault tolerance.
 
Article
An experimental 4-bit Josephson processor that demonstrates the possibility of a Josephson computer system with a gigahertz clock is presented. Constructed from 2066 gates on a 5-mm×5-mm die, it implements an eight-instruction set to enable the basic operations of digital signal processing. The basic structure and operating principles of the Josephson device are reviewed, and the design of and operating results for the processor are presented. The focus is mainly on the circuit techniques required to realize a gigahertz clock in a Josephson processor. Circuit techniques to suppress the crosstalk from the AC power to the I/O lines and ways to improve the clock frequency are introduced. The problems remaining in the effort to enhance performance are discussed.
 
Article
High-performance 3D graphics accelerators traditionally require multiple chips on multiple boards. In contrast, Neon, a single chip, performs like a multichip design, accelerating OpenGL 3D rendering and X11 and Windows/NT 2D rendering. Neon extracts competitive performance from limited memory bandwidth by using a greater percentage of peak memory bandwidth than competing chip sets and by reducing bandwidth requirements wherever possible. Neon fits on a single die because resources are extensively shared among similar functions, which made the side effects of performance-tuning efforts more beneficial.
 
Top-cited authors
S. Borkar
  • Qualcomm
William J. Dally
Hadi Esmaeilzadeh
  • Georgia Institute of Technology
Anant Agarwal
  • Massachusetts Institute of Technology
David E. Culler
  • University of California, Berkeley