Book

Advanced Computer Architectures

Chapter
It is important to point out that quantum computing makes it possible to obtain results previously unachievable by traditional computing. Based on this, this paper aims to give an overview and a glimpse of the potential horizon of this area of study, which has become increasingly robust in computing over the recent past. It is beyond our scope to address the deep concepts of quantum mechanics on which quantum computing is based; the aim is rather to form a simple, comprehensive, and consistent intellectual overview, targeted not only at specialists but at the general interested reader. Thus, this research was built on the latest studies on the subject recognized within the scientific and academic fields.
Chapter
Full-text available
The biological immune system is a robust, complex, adaptive system that defends the body from foreign pathogens. It is able to categorize all cells (or molecules) within the body as self or nonself substances. It does this with the help of a distributed task force that has the intelligence to take action from a local and also a global perspective using its network of chemical messengers for communication. There are two major branches of the immune system. The innate immune system is an unchanging mechanism that detects and destroys certain invading organisms, whilst the adaptive immune system responds to previously unknown foreign cells and builds a response to them that can remain in the body over a long period of time. This remarkable information-processing biological system has caught the attention of computer science in recent years.
Article
Full-text available
This chapter describes a use of recurrent neural networks (i.e., feedback is incorporated in the computation) as an acoustic model for continuous speech recognition. The form of the recurrent neural network is described along with an appropriate parameter estimation procedure. For each frame of acoustic data, the recurrent network generates an estimate of the posterior probabilities of the possible phones given the observed acoustic signal. The posteriors are then converted into scaled likelihoods and used as the observation probabilities within a conventional decoding paradigm (e.g., Viterbi decoding). The advantages of using recurrent networks are that they require a small number of parameters and provide a fast decoding capability (relative to conventional, large-vocabulary, HMM systems).
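The posterior-to-likelihood conversion described above can be sketched in a few lines: dividing each posterior P(phone | acoustics) by the phone's prior yields a quantity proportional to P(acoustics | phone), which the Viterbi decoder can use as an observation score. The priors and posterior values below are illustrative, not taken from the paper.

```python
# Sketch of converting network posteriors into scaled likelihoods for
# Viterbi decoding in a hybrid RNN/HMM system. All numbers are made up.
import numpy as np

posteriors = np.array([[0.7, 0.2, 0.1],    # one row per acoustic frame
                       [0.1, 0.6, 0.3]])   # columns: posterior per phone
priors = np.array([0.5, 0.3, 0.2])         # phone frequencies in training data

scaled_likelihoods = posteriors / priors   # proportional to P(acoustics | phone)
log_obs = np.log(scaled_likelihoods)       # observation scores for the decoder

print(scaled_likelihoods[0])               # first frame: [1.4, 0.667, 0.5] up to rounding
```

The common scaling factor P(acoustics) cancels along any decoding path, which is why the scaled likelihood can replace the true likelihood inside the Viterbi recursion.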
Article
Full-text available
In real systems, fault diagnosis is performed by a human diagnostician, and it encounters complex knowledge associations, both for normal and faulty behaviour of the target system. The human diagnostician relies on deep knowledge about the structure and the behaviour of the system, along with shallow knowledge on fault-to-manifestation patterns acquired from practice. This paper proposes a general approach to embed deep and shallow knowledge in neural network models for fault diagnosis by abduction, using neural sites for logical aggregation of manifestations and faults. All types of abduction problems were considered. The abduction proceeds by plausibility and relevance criteria multiply applied. The neural network implements plausibility by feed-forward links between manifestations and faults, and relevance by competition links between faults. Abduction by plausibility and relevance is also used for decision on the next best test along the diagnostic refinement. A case study on an installation in a rolling mill plant is presented.
Article
Full-text available
The fabrication and characteristics of organic smart pixels are described. The smart pixel reported in this letter consists of a single organic thin-film field effect transistor (FET) monolithically integrated with an organic light-emitting diode. The FET active material is a regioregular polythiophene. The maximum optical power emitted by the smart pixel is about 300 nW/cm2 corresponding to a luminance of ∼ 2300 cd/m2. © 1998 American Institute of Physics.
Article
Full-text available
In this paper, we review an emerging engineering discipline to program cell behaviors by embedding synthetic gene networks that perform computation, communications, and signal processing. To accomplish this goal, we begin with a genetic component library and a biocircuit design methodology for assembling these components into compound circuits. The main challenge in biocircuit design lies in selecting well-matched genetic components that, when coupled, reliably produce the desired behavior. We use simulation tools to guide circuit design, a process that consists of selecting the appropriate components and genetically modifying existing components until the desired behavior is achieved. In addition to such rational design, we also employ directed evolution to optimize genetic circuit behavior. Building on Nature's fundamental principle of evolution, this unique process directs cells to mutate their own DNA until they find gene network configurations that exhibit the desired system characteristics. The integration of all the above capabilities in future synthetic gene networks will enable cells to perform sophisticated digital and analog computation, both as individual entities and as part of larger cell communities. This engineering discipline and its associated tools will advance the capabilities of genetic engineering, and allow us to harness cells for a myriad of applications not previously achievable.
Conference Paper
Full-text available
Active memory systems help processors overcome the memory wall when applications exhibit poor cache behavior. They consist of either active memory elements that perform data parallel computations in the memory system itself, or an active memory controller that supports address re-mapping techniques that improve data locality. Both active memory approaches create coherence problems---even on uniprocessor systems---since there are either additional processors operating on the data directly, or the processor is allowed to refer to the same data via more than one address. While most active memory implementations require cache flushes, we propose a new technique to solve the coherence problem by extending the coherence protocol. Our active memory controller leverages and extends the coherence mechanism, so that re-mapping techniques work transparently on both uniprocessor and multiprocessor systems. We present a microarchitecture for an active memory controller with a programmable core and specialized hardware that accelerates cache line assembly and disassembly. We present detailed simulation results that show uniprocessor speedup from 1.3 to 7.6 on a range of applications and microbenchmarks. In addition to uniprocessor speedup, we show single-node multiprocessor speedup for parallel active memory applications and discuss how the same controller architecture supports coherent multi-node systems called active memory clusters.
Conference Paper
Full-text available
This paper presents a cache coherence solution for multiprocessors organized around a single time-shared bus. The solution aims at reducing bus traffic and hence bus wait time. This in turn increases the overall processor utilization. Unlike most traditional high-performance coherence solutions, this solution does not use any global tables. Furthermore, this coherence scheme is modular and easily extensible, requiring no modification of cache modules to add more processors to a system. The performance of this scheme is evaluated by using an approximate analysis method. It is shown that the performance of this scheme is closely tied with the miss ratio and the amount of sharing between processors.
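A bus-based coherence scheme of the kind evaluated above can be sketched as a toy write-invalidate simulation. The MSI states and bus operations below are a generic textbook model chosen for illustration, not the paper's exact scheme, and the traffic counter simply tallies bus transactions.

```python
# Toy snoopy write-invalidate (MSI) coherence model on a shared bus.
# Illustrative sketch only; real protocols add many transient states.
from enum import Enum

class State(Enum):
    INVALID = 0
    SHARED = 1
    MODIFIED = 2

class Cache:
    def __init__(self):
        self.state = {}                          # address -> State

    def read(self, addr, bus):
        if self.state.get(addr, State.INVALID) is State.INVALID:
            bus.broadcast("BusRd", addr, self)   # miss: fetch block via bus
            self.state[addr] = State.SHARED
        return self.state[addr]

    def write(self, addr, bus):
        if self.state.get(addr, State.INVALID) is not State.MODIFIED:
            bus.broadcast("BusRdX", addr, self)  # invalidate other copies
        self.state[addr] = State.MODIFIED

    def snoop(self, op, addr):
        st = self.state.get(addr, State.INVALID)
        if st is State.INVALID:
            return
        if op == "BusRdX":
            self.state[addr] = State.INVALID     # another writer: invalidate
        elif op == "BusRd" and st is State.MODIFIED:
            self.state[addr] = State.SHARED      # supply data, downgrade

class Bus:
    def __init__(self, caches):
        self.caches = caches
        self.traffic = 0                         # count of bus transactions

    def broadcast(self, op, addr, origin):
        self.traffic += 1
        for c in self.caches:
            if c is not origin:
                c.snoop(op, addr)

caches = [Cache(), Cache()]
bus = Bus(caches)
caches[0].write(0x40, bus)           # P0 takes exclusive ownership
caches[1].read(0x40, bus)            # P1's BusRd downgrades P0 to SHARED
print(caches[0].state[0x40].name)    # prints SHARED
```

Note that no global table is consulted: each cache decides its next state purely from what it snoops on the bus, which is the modularity property the abstract emphasizes.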
Article
"Grid" computing has emerged as an important new field, distinguished from conventional distributed computing by its focus on large-scale resource sharing, innovative applications, and, in some cases, high-performance orientation. In this article, we define this new field. First, we review the "Grid problem," which we define as flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources-what we refer to as virtual organizations. In such settings, we encounter unique authentication, authorization, resource access, resource discovery, and other challenges. It is this class of problem that is addressed by Grid technologies. Next, we present an extensible and open Grid architecture, in which protocols, services, application programming interfaces, and software development kits are categorized according to their roles in enabling resource sharing. We describe requirements that we believe any such mechanisms must satisfy, and we discuss the central role played by the intergrid protocols that enable interoperability among different Grid systems. Finally, we discuss how Grid technologies relate to other contemporary technologies, including enterprise integration, application service provider, storage service provider, and peer-to-peer computing. We maintain that Grid concepts and technologies complement and have much to contribute to these other approaches.
Article
Most computers have a single processing unit. In this new parallel computer 65,536 processors work on a problem at once. The resulting speed may transform several fields, including artificial intelligence. Salient features of this parallel computer are discussed.
Article
The paper discusses issues pertinent to performance analysis of massively parallel systems. A model of parallel execution based on threads of control and events is then introduced. The key ingredient of this model is a measure of the communication complexity, which gives the number of events E as a function of the number of threads of control P and provides a signature of a parallel computation. Various consequences for speedup and load balancing are presented.
Article
In this paper three models of parallel speedup are studied. They are fixed-size speedup, fixed-time speedup, and memory-bounded speedup. The latter two consider the relationship between speedup and problem scalability. Two sets of speedup formulations are derived for these three models. One set considers uneven workload allocation and communication overhead and gives more accurate estimation. Another set considers a simplified case and provides a clear picture on the impact of the sequential portion of an application on the possible performance gain from parallel processing. The simplified fixed-size speedup is Amdahl's law. The simplified fixed-time speedup is Gustafson's scaled speedup. The simplified memory-bounded speedup contains both Amdahl's law and Gustafson's scaled speedup as special cases. This study leads to a better understanding of parallel processing.
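The three simplified speedup models can be written out directly. In the sketch below, f is the sequential fraction of the work, p the number of processors, and G(p) models how the parallel workload grows in the memory-bounded case; the particular numbers and choices of G are illustrative.

```python
# The three simplified speedup models: f = sequential fraction,
# p = processor count, G(p) = workload growth allowed by memory.
def amdahl(f, p):                  # fixed-size speedup (Amdahl's law)
    return 1.0 / (f + (1.0 - f) / p)

def gustafson(f, p):               # fixed-time (scaled) speedup
    return f + (1.0 - f) * p

def memory_bounded(f, p, G):       # workload scaled by G(p)
    return (f + (1.0 - f) * G(p)) / (f + (1.0 - f) * G(p) / p)

p, f = 16, 0.05
print(amdahl(f, p))                                # about 9.14
print(gustafson(f, p))                             # 15.25
print(memory_bounded(f, p, lambda q: q))           # G(p) = p recovers Gustafson
print(memory_bounded(f, p, lambda q: 1.0))         # G(p) = 1 recovers Amdahl
```

The last two lines show the containment claim from the abstract: with G(p) = p the memory-bounded formula reduces to Gustafson's scaled speedup, and with G(p) = 1 it reduces to Amdahl's law.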
Article
The performance limits of optical, electro-optical, and electronic artificial neural systems (ANS) processors (also known as neurocomputers) are discussed. After a brief introduction, an overview is provided of the recently revived field of ANS. Next, ANS performance measures are defined and a neurocomputer taxonomy is presented. Finally, the designs and performance limits of the various types of neurocomputers are discussed.
Article
Optical logic gates for OR, AND, NOT, NOR, and exclusive-NOR (XNOR) operations in waveguides consisting of nonlinear material are numerically investigated by means of the finite-difference beam propagation method (FD-BPM). The proposed devices are designed utilizing the self-routing characteristics of nonlinear X-crossing and Y-branching structures when they are operated with one input beam or two. The numerical simulations show that the proposed structures can favorably be applied to optical data processing and computing as fundamental logic gates.
Article
The AP3000 is a distributed-memory parallel server consisting of multiple workstations connected via a high-speed communication network. Each workstation (node) uses the advanced UltraSPARC CPU and the Solaris operating system. By combining a remote memory copy function with inter-node communication over Fujitsu's newly developed AP-Net high-speed message communication network, the AP3000 can be used as both a high-performance parallel computer and a workstation cluster. The system has hardware to support single-system-image operation of a multi-node system.
Article
The CRAY T3D system is the first massively parallel processor from Cray Research. The implementation entailed the design of system software, hardware, languages, and tools. A study of representative applications influenced these designs. The paper focuses on the programming model, the physically distributed, logically shared memory interconnect, and the integration of Digital's DECchip 21064 Alpha AXP microprocessor in this interconnect. Additional topics include latency-hiding and synchronization hardware, libraries, operating system, and tools.
Book
This book describes parallel programming and all the basic concepts illustrated by examples in a simplified FORTRAN. Concepts covered include the parallel programming model; the creation of multiple processes; memory sharing; scheduling; and data dependencies. In addition, a number of parallelized applications are presented, including a discrete-time, discrete-event simulator, numerical integration, Gaussian elimination, and parallelized versions of the traveling salesman problem and the exploration of a maze.
Article
An abstract is not available.
Article
If the unique information-processing capabilities of protein enzymes could be adapted for computers, then evolvable, more efficient systems for such applications as pattern recognition and process control would, in principle, be possible.
Article
The Manchester project has developed a powerful dataflow processor based on dynamic tagging. This processor is large enough to tackle realistic applications and exhibits impressive speedup for programs with sufficient parallelism.
Conference Paper
A neural network model was formulated to describe the spatiotemporal properties of the photoreceptor and horizontal cell responses to light in the vertebrate retina. The model consists of two layers, one of which represents the photoreceptor syncytium and the other the horizontal cell syncytium. Each syncytium is an array of elemental neurons, each of which is expressed by the coupling resistance between neighboring cells and membrane impedance. Corresponding elements of each syncytium interact reciprocally with synaptic weighting functions. The analytical solutions of the voltage responses to extrinsically applied current, which corresponds to the light-induced current, were obtained in the frequency domain. The model provides a computational framework for early visual processing in the vertebrate retina.
Conference Paper
An electronic neurocomputer, Neuro Turbo, has been implemented using the recently developed general-purpose, 24-bit floating-point digital signal processor (DSP), the MB86220. The Neuro Turbo is a MIMD (multiple-instruction, multiple-data) parallel processor with four ring-coupled DSPs and four dual-port memories. The performance of the Neuro Turbo has been evaluated for several kinds of three-layer neural networks. An operational speed of 2 MCPS for the learning process and 11 MCPS for the recall process has been achieved.
Article
This paper introduces a general model of parallel performance. With the goal of developing conceptual and empirical methods for characterizing and understanding parallel algorithms, new definitions of speedup and efficiency have been formulated. These definitions take into account the effects that problem size and the number of processors have on efficiency and speedup and provide a natural and quantifiable measure of parallel performance. The terms introduced in the definitions provide new and improved interpretations of the “serial” and “parallel” fraction parameters commonly used in the literature (i.e., Amdahl's model) to explain the behavior of parallel algorithms. The model provides a more complete characterization of parallel algorithm behavior and is used to correct apparent deficiencies in the formulation of speedup as expressed by Amdahl's model.
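One quantity in the spirit of this model is an experimentally determined serial fraction computed from measured speedup (widely known as the Karp-Flatt metric). The sketch below assumes a measured speedup s on p processors; the worked numbers are illustrative.

```python
# Experimentally determined serial fraction e from measured speedup s on
# p processors: e = (1/s - 1/p) / (1 - 1/p). Roughly constant e across
# processor counts suggests Amdahl-style behavior; growing e points to
# overheads such as communication or load imbalance.
def serial_fraction(s, p):
    return (1.0 / s - 1.0 / p) / (1.0 - 1.0 / p)

# Consistency check: a program that is 5% serial in Amdahl's model
# should yield e = 0.05 when measured on any processor count.
s = 1.0 / (0.05 + 0.95 / 16)          # Amdahl speedup on 16 processors
print(round(serial_fraction(s, 16), 6))   # prints 0.05
```

Inverting the speedup formula this way turns the "serial fraction" from a modeling assumption into a diagnostic that can be estimated from timing measurements alone.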
Conference Paper
This paper briefly examines certain of the Intelligent Information Retrieval (IIR) mechanisms used in the RESEDA system, a system equipped with “reasoning” capabilities in the field of complex biographical data management. Particular attention is paid to a description of the different “levels” of inference procedure which can be executed by the system. The intention is to show that the technical solutions to IIR problems implemented in RESEDA are of an equivalent level to those now proposed in the same field by the Japanese project for Fifth Generation Computer Systems.
Conference Paper
Cache is a fast buffer memory between the processor and the main memory and has been extensively used in larger computer systems. The principle of operation and the various designs of the cache in the uniprocessor system are well documented. The memory system of multiprocessors has also received much attention recently; however, such studies are limited to systems without a cache. Little if any information exists in the literature addressing the principle and design considerations of the cache system in the tightly coupled multiprocessor environment. This paper describes such a cache design. System requirements in the multiprocessor environment as well as the cost-performance trade-offs of the cache system design are given in detail. The possibility of sharing the cache system hardware with other multiprocessing facilities (such as dynamic address translation, storage protection, locks, serialization, and the system clocks) is also discussed.
Conference Paper
Past work on studying cache coherence in shared-memory symmetric multiprocessors (SMPs) concentrates on studying aggregate events, often from an architecture point of view. However, this approach provides insufficient information about the exact sources of inefficiencies in parallel applications. For SMPs in contemporary clusters, application performance is impacted by the pattern of shared memory usage, and it becomes essential to understand coherence behavior in terms of the application program constructs -- such as data structures and source code lines. The technical contributions of this work are as follows. We introduce ccSIM, a cache-coherent memory simulator fed by data traces obtained through on-the-fly dynamic binary rewriting of OpenMP benchmarks executing on a Power3 SMP node. We explore the degrees of freedom in interleaving data traces from the different processors and assess the simulation accuracy by comparing with hardware performance counters. The novelty of ccSIM lies in its ability to relate coherence traffic -- specifically coherence misses as well as their progenitor invalidations -- to data structures and to their reference locations in the source program, thereby facilitating the detection of inefficiencies. Our experiments demonstrate that (a) cache coherence traffic is simulated accurately for SPMD programming styles as its invalidation traffic closely matches the corresponding hardware performance counters, (b) we derive detailed coherence information indicating the location of invalidations in the application code, i.e., source line and data structures, and (c) we illustrate opportunities for optimizations from these details. By exploiting these unique features of ccSIM, we were able to identify and locate opportunities for program transformations, including interactions with OpenMP constructs, resulting in both significantly decreased coherence misses and savings of up to 73% in wall-clock execution time for several real-world benchmarks.
Conference Paper
The importance of reducing processor-memory bandwidth is recognized in two distinct situations: single board computer systems and microprocessors of the future. Cache memory is investigated as a way to reduce the memory-processor traffic. We show that traditional caches which depend heavily on spatial locality (look-ahead) for their performance are inappropriate in these environments because they generate large bursts of bus traffic. A cache exploiting primarily temporal locality (look-behind) is then proposed and demonstrated to be effective in an environment where process switches are infrequent. We argue that such an environment is possible if the traffic to backing store is small enough that many processors can share a common memory and if the cache data consistency problem is solved. We demonstrate that such a cache can indeed reduce traffic to memory greatly, and introduce an elegant solution to the cache coherency problem.
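The burst-traffic argument above can be illustrated with a toy count of words crossing the bus for a given reference string: a look-ahead cache moves a whole block on every miss, so when spatial locality is poor it pays the block-transfer cost for data it never uses. The trace and block sizes below are invented for illustration.

```python
# Toy model: count words moved over the bus for a reference trace,
# assuming an unbounded cache so only cold misses generate traffic.
def bus_words(trace, block_size):
    cached_blocks, traffic = set(), 0
    for addr in trace:
        block = addr // block_size
        if block not in cached_blocks:
            cached_blocks.add(block)
            traffic += block_size      # the whole block crosses the bus
    return traffic

# Strong temporal reuse (address 0 revisited), little spatial locality:
trace = [0, 100, 0, 200, 0, 100]
print(bus_words(trace, block_size=8))  # 24 words with look-ahead blocks
print(bus_words(trace, block_size=1))  # 3 words fetching single words
```

The same trace generates eight times the traffic with multi-word blocks, which is the kind of behavior that motivates a look-behind cache exploiting temporal locality instead.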
Book
While there are several studies of computer systems modeling and performance evaluation where models of multiprocessor systems can be found as examples of applications of general modeling techniques, this is the first to focus entirely on the problem of modeling and performance evaluation of multiprocessor systems using analytical methods. Increasingly sophisticated and fast-moving technologies require models that can estimate the performance of a computer system without having actually to build and test it, models that can help designers make the correct architectural choices. The area of distributed computer architectures, or multiprocessor systems, has numerous such choices and can greatly benefit from an extensive use of performance evaluation techniques in the system design stage. The multiprocessor features that are studied here focus on contention for physical system resources, such as shared devices and inter-connection networks. A brief overview covers the modeling of other important system characteristics, such as failures of components and synchronizations at the software level.
Article
Novel data-driven and demand-driven computer architectures are under development in a large number of laboratories in the United States, Japan, and Europe. These computers are not based on the traditional von Neumann organization; instead, they are attempts to identify the next generation of computer. Basically, in data-driven (e.g., data-flow) computers the availability of operands triggers the execution of the operation to be performed on them, whereas in demand-driven (e.g., reduction) computers the requirement for a result triggers the operation that will generate it. Although there are these two distinct areas of research, each laboratory has developed its own individual model of computation, stored program representation, and machine organization. Across this spectrum of designs there is, however, a significant sharing of concepts. The aim of this paper is to identify the concepts and relationships that exist both within and between the two areas of research. It does this by examining data-driven and demand-driven architecture at three levels: computation organization, (stored) program organization, and machine organization. Finally, a survey of various novel computer architectures under development is given.
Article
A performance evaluation of the Symmetry multiprocessor system revealed that the synchronization mechanism did not perform well for highly contested locks, like those found in certain parallel applications. Several software synchronization mechanisms designed to reduce contention for the lock were developed and evaluated, using a hardware monitor, on the Symmetry multiprocessor system. These mechanisms remain valuable even when changes are made to the hardware synchronization mechanism to improve support for highly contested locks. The Symmetry architecture is described, and a number of lock algorithms and their use of hardware resources are examined. The performance of each lock is observed from the perspective of both the program itself and the total system performance.