Chapter

Trends in Computing and Memory Technologies

Abstract

The current decade is poised to see a clear transition away from the de-facto standard technologies. After supporting tremendous growth in speed, density and energy efficiency, newer CMOS technology nodes provide diminishing returns, thereby paving the way for newer, non-CMOS technologies. Multiple such technologies are already commercially available to satisfy the requirements of specific market segments. Additionally, researchers have demonstrated multiple system prototypes built out of these technologies, which co-exist with CMOS technologies. Apart from pushing the limits of performance and energy efficiency, the new technologies present opportunities to extend architectural limits, e.g., through in-memory computing, and computing limits, e.g., through quantum computing. The eventual adoption of these technologies is dependent on various challenges at the device, circuit, architecture and system levels, as well as on robust design automation flows. In this chapter, a perspective on these emerging trends is presented across manufacturing technologies, memory technologies and computing technologies. The chapter concludes with a study of the limits of these technologies.

References
Article
Full-text available
Moore’s Law is a techno-economic model that has enabled the information technology industry to double the performance and functionality of digital electronics roughly every 2 years within a fixed cost, power and area. Advances in silicon lithography have enabled this exponential miniaturization of electronics, but, as transistors reach atomic scale and fabrication costs continue to rise, the classical technological driver that has underpinned Moore’s Law for 50 years is failing and is anticipated to flatten by 2025. This article provides an updated view of what a post-exascale system will look like and the challenges ahead, based on our most recent understanding of technology roadmaps. It also discusses the tapering of historical improvements, and how it affects options available to continue scaling of successors to the first exascale machine. Lastly, this article covers the many different opportunities and strategies available to continue computing performance improvements in the absence of historical technology drivers. This article is part of a discussion meeting issue ‘Numerical algorithms for high-performance computational science’.
Article
Full-text available
Deep neural networks (DNNs) are currently widely used for many artificial intelligence (AI) applications including computer vision, speech recognition, and robotics. While DNNs deliver state-of-the-art accuracy on many AI tasks, this accuracy comes at the cost of high computational complexity. Accordingly, techniques that enable efficient processing of deep neural networks to improve energy efficiency and throughput without sacrificing accuracy or increasing hardware cost are critical to enabling the wide deployment of DNNs in AI systems. This article aims to provide a comprehensive tutorial and survey of the recent advances towards the goal of enabling efficient processing of DNNs. Specifically, it will provide an overview of DNNs, discuss various platforms and architectures that support DNNs, and highlight key trends in recent efficient processing techniques that reduce the computation cost of DNNs either solely via hardware design changes or via joint hardware design and network algorithm changes. It will also summarize various development resources that can enable researchers and practitioners to quickly get started on DNN design, and highlight important benchmarking metrics and design considerations that should be used for evaluating the rapidly growing number of DNN hardware designs, optionally including algorithmic co-design, being proposed in academia and industry. The reader will take away the following concepts from this article: understand the key design considerations for DNNs; be able to evaluate different DNN hardware implementations with benchmarks and comparison metrics; understand trade-offs between various architectures and platforms; be able to evaluate the utility of various DNN design techniques for efficient processing; and understand recent implementation trends and opportunities.
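As a rough, hedged illustration of the computational complexity referred to above, the following sketch counts the multiply-accumulate (MAC) operations and weights of a single convolutional layer; the layer dimensions are hypothetical and not taken from the article.

```python
# Rough cost model for one convolutional layer (hypothetical dimensions).
# MACs = H_out * W_out * C_out * (C_in * K * K); weights = C_out * C_in * K * K.

def conv_layer_cost(h_out, w_out, c_in, c_out, k):
    weights = c_out * c_in * k * k
    macs = h_out * w_out * weights
    return macs, weights

# Example: a 3x3 convolution producing a 56x56x256 output from a 256-channel input.
macs, weights = conv_layer_cost(h_out=56, w_out=56, c_in=256, c_out=256, k=3)
print(f"{macs/1e9:.2f} GMACs and {weights/1e6:.2f} M weights for this single layer")
```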
Article
Full-text available
Approximate computing trades off computation quality against the effort expended, and as rising performance demands confront plateauing resource budgets, approximate computing has become not merely attractive but imperative. In this paper, we present a survey of techniques for approximate computing (AC). We discuss strategies for finding approximable program portions and monitoring output quality, techniques for using AC in different processing units (e.g., CPU, GPU and FPGA), processor components, memory technologies, etc., and programming frameworks for AC. We classify these techniques based on several key characteristics to emphasize their similarities and differences. The aim of this paper is to provide insights to researchers into the working of AC techniques and inspire more efforts in this area to make AC a mainstream computing approach in future systems.
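A minimal sketch of one common approximation strategy surveyed in such work, loop perforation: part of the input is skipped to save effort, and the resulting quality loss is measured. The sampling step and error metric are illustrative assumptions.

```python
import random

def exact_mean(xs):
    return sum(xs) / len(xs)

def perforated_mean(xs, step=4):
    # Process only every `step`-th element; the skipped work is the saved effort.
    sampled = xs[::step]
    return sum(sampled) / len(sampled)

random.seed(0)
data = [random.gauss(10.0, 2.0) for _ in range(100_000)]
exact = exact_mean(data)
approx = perforated_mean(data, step=4)
print(f"exact={exact:.4f} approx={approx:.4f} rel_error={abs(approx - exact)/exact:.2%}")
```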
Article
Full-text available
Daniel A. Reed and Jack Dongarra state that scientific discovery and engineering innovation require unifying traditionally separated high-performance computing and big data analytics. Big data machine learning and predictive data analytics have been considered the fourth paradigm of science, allowing researchers to extract insights from both scientific instruments and computational simulations. A rich ecosystem of hardware and software has emerged for big-data analytics, similar to that of high-performance computing.
Article
Full-text available
Natural language processing employs computational techniques for the purpose of learning, understanding, and producing human language content. Early computational approaches to language research focused on automating the analysis of the linguistic structure of language and developing basic technologies such as machine translation, speech recognition, and speech synthesis. Today's researchers refine and make use of such tools in real-world applications, creating spoken dialogue systems and speech-to-speech translation engines, mining social media for information about health or finance, and identifying sentiment and emotion toward products and services. We describe successes and challenges in this rapidly advancing area.
Article
Full-text available
Deep learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction. These methods have dramatically improved the state-of-the-art in speech recognition, visual object recognition, object detection and many other domains such as drug discovery and genomics. Deep learning discovers intricate structure in large data sets by using the backpropagation algorithm to indicate how a machine should change its internal parameters that are used to compute the representation in each layer from the representation in the previous layer. Deep convolutional nets have brought about breakthroughs in processing images, video, speech and audio, whereas recurrent nets have shone light on sequential data such as text and speech.
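A minimal NumPy sketch of the mechanism described above: backpropagation propagates the loss gradient through the layers and gradient descent adjusts the internal parameters. The tiny network, data and learning rate are illustrative choices, not taken from the article.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: y = sin(x), learned by a one-hidden-layer network.
X = rng.uniform(-3, 3, size=(256, 1))
Y = np.sin(X)

W1, b1 = rng.normal(0, 0.5, (1, 16)), np.zeros(16)
W2, b2 = rng.normal(0, 0.5, (16, 1)), np.zeros(1)
lr = 0.05

for step in range(2000):
    # Forward pass: each layer transforms the previous layer's representation.
    h = np.tanh(X @ W1 + b1)
    pred = h @ W2 + b2
    loss = np.mean((pred - Y) ** 2)

    # Backward pass: propagate the loss gradient layer by layer.
    d_pred = 2 * (pred - Y) / len(X)
    dW2, db2 = h.T @ d_pred, d_pred.sum(0)
    d_h = (d_pred @ W2.T) * (1 - h ** 2)      # tanh'(z) = 1 - tanh(z)^2
    dW1, db1 = X.T @ d_h, d_h.sum(0)

    # Gradient-descent update of the internal parameters.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(f"final mean-squared error: {loss:.4f}")
```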
Conference Paper
Full-text available
Graph algorithms are becoming increasingly important for analyzing large datasets in many fields. Real-world graph data follows a pattern of sparsity that is not uniform but highly skewed towards a few items. Implementing graph traversal, statistics and machine learning algorithms on such data in a scalable manner is quite challenging. As a result, several graph analytics frameworks (GraphLab, CombBLAS, Giraph, SociaLite and Galois among others) have been developed, each offering a solution with different programming models and targeted at different users. Unfortunately, the "Ninja performance gap" between optimized code and most of these frameworks is very large (2-30X for most frameworks and up to 560X for Giraph) for common graph algorithms, and moreover varies widely with algorithms. This makes the end-users' choice of graph framework dependent not only on ease of use but also on performance. In this work, we offer a quantitative roadmap for improving the performance of all these frameworks and bridging the "ninja gap". We first present hand-optimized baselines that get performance close to hardware limits and higher than any published performance figure for these graph algorithms. We characterize the performance of both this native implementation as well as popular graph frameworks on a variety of algorithms. This study helps end-users delineate bottlenecks arising from the algorithms themselves vs. programming model abstractions vs. the framework implementations. Further, by analyzing the system-level behavior of these frameworks, we obtain bottlenecks that are agnostic to specific algorithms. We recommend changes to alleviate these bottlenecks (and implement some of them) and reduce the performance gap with respect to native code. These changes will enable end-users to choose frameworks based mostly on ease of use.
Article
Full-text available
Neuronal computation is energetically expensive. Consequently, the brain's limited energy supply imposes constraints on its information processing capability. Most brain energy is used on synaptic transmission, making it important to understand how energy is provided to and used by synapses. We describe how information transmission through presynaptic terminals and postsynaptic spines is related to their energy consumption, assess which mechanisms normally ensure an adequate supply of ATP to these structures, consider the influence of synaptic plasticity and changing brain state on synaptic energy use, and explain how disruption of the energy supply to synapses leads to neuropathology.
Conference Paper
Full-text available
We present several modifications of the original recurrent neural network language model (RNN LM). While this model has been shown to significantly outperform many competitive language modeling techniques in terms of accuracy, the remaining problem is its computational complexity. In this work, we show approaches that lead to more than a 15-fold speedup for both the training and testing phases. Next, we show the importance of using the backpropagation through time algorithm. An empirical comparison with feedforward networks is also provided. In the end, we discuss possibilities for reducing the number of parameters in the model. The resulting RNN model can thus be smaller, faster during both training and testing, and more accurate than the basic one.
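The sketch below illustrates one step of a plain RNN language model and why the output softmax over the vocabulary dominates the per-word cost, which is the complexity the speedups above target. The vocabulary and hidden-layer sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, hidden = 10_000, 200                   # hypothetical sizes

W_xh = rng.normal(0, 0.1, (vocab, hidden))    # input (one-hot word) -> hidden
W_hh = rng.normal(0, 0.1, (hidden, hidden))   # hidden -> hidden (recurrence)
W_hy = rng.normal(0, 0.1, (hidden, vocab))    # hidden -> output distribution

def rnn_step(word_id, h_prev):
    # Recurrent state update: cost ~ hidden * hidden MACs.
    h = np.tanh(W_xh[word_id] + h_prev @ W_hh)
    # Output softmax over the full vocabulary: cost ~ hidden * vocab MACs (the bottleneck).
    logits = h @ W_hy
    probs = np.exp(logits - logits.max())
    return h, probs / probs.sum()

h = np.zeros(hidden)
h, p = rnn_step(word_id=42, h_prev=h)
print("hidden cost ~", hidden * hidden, "MACs; output cost ~", hidden * vocab, "MACs")
```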
Article
Full-text available
Power has become the primary design constraint for chip designers today. While Moore's law continues to provide additional transistors, power budgets have begun to prohibit those devices from actually being used. To reduce energy consumption, voltage scaling techniques have proved a popular technique with subthreshold design representing the endpoint of voltage scaling. Although it is extremely energy efficient, subthreshold design has been relegated to niche markets due to its major performance penalties. This paper defines and explores near-threshold computing (NTC), a design space where the supply voltage is approximately equal to the threshold voltage of the transistors. This region retains much of the energy savings of subthreshold operation with more favorable performance and variability characteristics. This makes it applicable to a broad range of power-constrained computing segments from sensors to high performance servers. This paper explores the barriers to the widespread adoption of NTC and describes current work aimed at overcoming these obstacles.
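A back-of-the-envelope illustration of the energy argument above: dynamic switching energy scales roughly as C*Vdd^2, so lowering the supply from a nominal value toward the threshold voltage cuts energy quadratically, at the cost of frequency. The capacitance and voltages below are illustrative assumptions.

```python
# Dynamic switching energy per node: E ~ alpha * C * Vdd^2 (alpha = activity factor).
def dynamic_energy(c_farads, vdd, alpha=1.0):
    return alpha * c_farads * vdd ** 2

C = 1e-15                              # 1 fF of switched capacitance (illustrative)
nominal, near_threshold = 1.0, 0.5     # volts (illustrative supply values)

e_nom = dynamic_energy(C, nominal)
e_ntc = dynamic_energy(C, near_threshold)
print(f"energy ratio nominal/NTC = {e_nom / e_ntc:.1f}x")   # ~4x just from Vdd^2
```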
Article
Full-text available
We present an experiment in which a one-bit memory is constructed using a system of a single colloidal particle trapped in a modulated double-well potential. We measure the amount of heat dissipated to erase a bit and we establish that, in the limit of long erasure cycles, the mean dissipated heat saturates at the Landauer bound, i.e. the minimal quantity of heat necessarily produced to delete a classical bit of information. This result demonstrates the intimate link between information theory and thermodynamics. To stress this connection we also show that a detailed Jarzynski equality is verified, retrieving Landauer's bound independently of the work done on the system. The experimental details are presented and the experimental errors carefully discussed.
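A one-line check of the bound referenced above: the minimum heat required to erase one bit is k_B*T*ln 2, a few zeptojoules at room temperature.

```python
import math

k_B = 1.380649e-23      # Boltzmann constant, J/K
T = 300.0               # room temperature, K

E_min = k_B * T * math.log(2)
print(f"Landauer limit at {T:.0f} K: {E_min:.3e} J per bit "
      f"({E_min / 1.602176634e-19 * 1000:.1f} meV)")
```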
Conference Paper
Full-text available
Many practical computing problems concern large graphs. Standard examples include the Web graph and various social networks. The scale of these graphs - in some cases billions of vertices, trillions of edges - poses challenges to their efficient processing. In this paper we present a computational model suitable for this task. Programs are expressed as a sequence of iterations, in each of which a vertex can receive messages sent in the previous iteration, send messages to other vertices, and modify its own state and that of its outgoing edges or mutate graph topology. This vertex-centric approach is flexible enough to express a broad set of algorithms. The model has been designed for efficient, scalable and fault-tolerant implementation on clusters of thousands of commodity computers, and its implied synchronicity makes reasoning about programs easier. Distribution-related details are hidden behind an abstract API. The result is a framework for processing large graphs that is expressive and easy to program.
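A toy, single-machine sketch of the vertex-centric, superstep-based model described above (written in its spirit, not against the actual API): in each superstep a vertex combines the messages received in the previous superstep, updates its value, and sends messages along its outgoing edges. The PageRank constants are the usual illustrative choices.

```python
# Toy vertex-centric PageRank in the superstep style described above.
graph = {0: [1, 2], 1: [2], 2: [0], 3: [2]}      # vertex -> outgoing edges
rank = {v: 1.0 / len(graph) for v in graph}
inbox = {v: [] for v in graph}

for superstep in range(30):
    outbox = {v: [] for v in graph}
    for v in graph:
        # "Compute" phase: combine incoming messages into the new vertex value.
        if superstep > 0:
            rank[v] = 0.15 / len(graph) + 0.85 * sum(inbox[v])
        # "Send" phase: message each out-neighbour with our rank share.
        for dst in graph[v]:
            outbox[dst].append(rank[v] / len(graph[v]))
    inbox = outbox      # barrier between supersteps

print({v: round(r, 3) for v, r in rank.items()})
```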
Conference Paper
Full-text available
Since 2005, processor designers have increased core counts to exploit Moore's Law scaling, rather than focusing on single-core performance. The failure of Dennard scaling, to which the shift to multicore parts is partially a response, may soon limit multicore scaling just as single-core scaling has been curtailed. This paper models multicore scaling limits by combining device scaling, single-core scaling, and multicore scaling to measure the speedup potential for a set of parallel workloads for the next five technology generations. For device scaling, we use both the ITRS projections and a set of more conservative device scaling parameters. To model single-core scaling, we combine measurements from over 150 processors to derive Pareto-optimal frontiers for area/performance and power/performance. Finally, to model multicore scaling, we build a detailed performance model of upper-bound performance and lower-bound core power. The multicore designs we study include single-threaded CPU-like and massively threaded GPU-like multicore chip organizations with symmetric, asymmetric, dynamic, and composed topologies. The study shows that regardless of chip organization and topology, multicore scaling is power limited to a degree not widely appreciated by the computing community. Even at 22 nm (just one year from now), 21% of a fixed-size chip must be powered off, and at 8 nm, this number grows to more than 50%. Through 2024, only 7.9x average speedup is possible across commonly used parallel workloads, leaving a nearly 24-fold gap from a target of doubled performance per generation.
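A deliberately simplified sketch in the spirit of the study above (not its actual model): an Amdahl-style speedup combined with a fixed chip power budget, so only the cores the budget can feed are powered on and the rest remain dark. All numbers are illustrative.

```python
def power_limited_speedup(n_cores, parallel_frac, core_power_w, power_budget_w):
    # Only as many cores as the budget allows can be active simultaneously.
    active = min(n_cores, int(power_budget_w // core_power_w))
    dark = n_cores - active
    serial = 1.0 - parallel_frac
    speedup = 1.0 / (serial + parallel_frac / max(active, 1))
    return active, dark, speedup

# Illustrative numbers only: 64 cores of 2.5 W each under a 100 W budget.
active, dark, s = power_limited_speedup(64, parallel_frac=0.95, core_power_w=2.5,
                                        power_budget_w=100.0)
print(f"active={active} dark={dark} speedup={s:.1f}x over one core")
```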
Article
Full-text available
IBM Research undertook a challenge to build a computer system that could compete at the human champion level in real time on the American TV quiz show, Jeopardy. The extent of the challenge includes fielding a real-time automatic contestant on the show, not merely a laboratory exercise. The Jeopardy Challenge helped us address requirements that led to the design of the DeepQA architecture and the implementation of Watson. After three years of intense research and development by a core team of about 20 researchers, Watson is performing at human expert levels in terms of precision, confidence, and speed at the Jeopardy quiz show. Our results strongly suggest that DeepQA is an effective and extensible architecture that can be used as a foundation for combining, deploying, evaluating, and advancing a wide range of algorithmic techniques to rapidly advance the field of question answering (QA).
Article
Full-text available
3-D stacked systems reduce communication delay in multiprocessor system-on-chips (MPSoCs) and enable heterogeneous integration of cores, memories, sensors, and RF devices. However, vertical integration of layers exacerbates temperature-induced problems such as reliability degradation. Liquid cooling is a highly efficient solution to overcome the accelerated thermal problems in 3-D architectures; however, it brings new challenges in modeling and run-time management for such 3-D MPSoCs with multitier liquid cooling. This paper proposes a novel design-time/run-time thermal management strategy. The design-time phase involves a rigorous thermal impact analysis of various thermal control variables. We then utilize this analysis to design a run-time fuzzy controller for improving energy efficiency in 3-D MPSoCs through liquid cooling management and dynamic voltage and frequency scaling (DVFS). The fuzzy controller adjusts the liquid flow rate dynamically to match the cooling demand of the chip, preventing overcooling and maintaining a stable thermal profile. The DVFS decisions increase chip-level energy savings and help balance the temperature across the system. Our controller is used in conjunction with temperature-aware load balancing and dynamic power management strategies. Experimental results on 2-tier and 4-tier 3-D MPSoCs show that our strategy prevents the system from exceeding the given threshold temperature. At the same time, we reduce cooling energy by up to 63% and system-level energy by up to 21% in comparison to statically setting the flow rate to handle worst-case temperatures.
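A toy control loop in the spirit of the strategy above, using a simple proportional policy rather than the paper's fuzzy controller: the coolant flow rate tracks a target temperature of a first-order thermal model, and a DVFS level is dropped when flow alone cannot keep up. All constants are illustrative assumptions.

```python
# Toy first-order thermal model with flow-rate and DVFS control (illustrative constants).
T_target, T_amb = 70.0, 45.0                  # deg C
flow, flow_min, flow_max = 0.2, 0.1, 1.0
dvfs_levels = [1.0, 0.8, 0.6]                 # relative power at each DVFS level
level, temp = 0, 60.0

for step in range(200):
    power = 100.0 * dvfs_levels[level]        # watts dissipated by the tier
    cooling = 2.5 * flow * (temp - T_amb)     # heat removed, grows with flow
    temp += 0.05 * (power - cooling)          # first-order temperature update

    # Proportional flow control toward the target temperature.
    flow = min(flow_max, max(flow_min, flow + 0.01 * (temp - T_target)))
    # If flow is saturated and we are still too hot, drop a DVFS level.
    if flow >= flow_max and temp > T_target + 2.0 and level < len(dvfs_levels) - 1:
        level += 1

print(f"temperature={temp:.1f} C, flow={flow:.2f}, DVFS level={level}")
```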
Article
Full-text available
This paper presents the current state of understanding of the factors that limit the continued scaling of Si complementary metal-oxide-semiconductor (CMOS) technology and provides an analysis of the ways in which application-related considerations enter into the determination of these limits. The physical origins of these limits are primarily in the tunneling currents, which leak through the various barriers in a MOS field-effect transistor (MOSFET) when it becomes very small, and in the thermally generated subthreshold currents. The dependence of these leakages on MOSFET geometry and structure is discussed along with design criteria for minimizing short-channel effects and other issues related to scaling. Scaling limits due to these leakage currents arise from application constraints related to power consumption and circuit functionality. We describe how these constraints work out for some of the most important application classes: dynamic random access memory (DRAM), static random access memory (SRAM), low-power portable devices, and moderate and high-performance CMOS logic. As a summary, we provide a table of our estimates of the scaling limits for various applications and device types. The end result is that there is no single end point for scaling, but that instead there are many end points, each optimally adapted to its particular applications
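A quick illustration of the subthreshold-leakage constraint discussed above: below threshold, drain current falls exponentially with gate voltage at a rate set by the subthreshold swing S, so every S millivolts of threshold-voltage reduction multiplies off-state leakage by ten. The swing values and Vth shift below are illustrative.

```python
# Subthreshold leakage grows by 10x for every S millivolts of Vth reduction,
# where S is the subthreshold swing (no better than ~60 mV/decade at room temperature).
def leakage_increase(delta_vth_mv, swing_mv_per_decade):
    return 10 ** (delta_vth_mv / swing_mv_per_decade)

for swing in (60, 80, 100):                     # mV/decade (illustrative values)
    factor = leakage_increase(delta_vth_mv=100, swing_mv_per_decade=swing)
    print(f"S = {swing} mV/dec: 100 mV lower Vth -> ~{factor:.0f}x more leakage")
```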
Article
Full-text available
A hardware prefetching mechanism for cache memories named Speculative Prefetching is proposed. This scheme detects regular accesses issued by a load/store instruction and prefetches the corresponding data. The scheme requires no software add-on, and in some cases it is more powerful than software techniques for identifying regular accesses. The tradeoffs related to its hardware implementation are extensively discussed in order to finely tune the mechanism. Experiments show that the average memory access time of regular codes is brought within 10% of optimum for processors with usual issue rates, while the performance of irregular codes is only slightly affected and never degraded. The scheme's performance is discussed over a wide range of parameters.
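A toy software model of the kind of hardware mechanism described above: a small table indexed by the load instruction's address remembers the last data address and stride, and issues a prefetch for the next predicted address once the stride repeats. The table organization and confirmation policy are illustrative assumptions.

```python
# Toy reference-prediction table for stride prefetching (illustrative policy).
class StridePrefetcher:
    def __init__(self):
        self.table = {}           # pc -> (last_addr, last_stride, confirmed)

    def access(self, pc, addr):
        prefetch = None
        if pc in self.table:
            last_addr, last_stride, confirmed = self.table[pc]
            stride = addr - last_addr
            if stride == last_stride and stride != 0:
                # Regular access confirmed: prefetch the next predicted address.
                prefetch = addr + stride
                confirmed = True
            else:
                confirmed = False
            self.table[pc] = (addr, stride, confirmed)
        else:
            self.table[pc] = (addr, 0, False)
        return prefetch

pf = StridePrefetcher()
for i in range(6):                # a load walking an array of 8-byte elements
    addr = 0x1000 + 8 * i
    issued = pf.access(pc=0x400100, addr=addr)
    print(f"access {hex(addr)} -> prefetch {hex(issued) if issued else None}")
```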
Article
Full-text available
A large number of memory accesses in memory-bound applications are irregular, such as pointer dereferences, and can be effectively targeted by thread-based prefetching techniques like Speculative Precomputation. These techniques execute instructions, for example on an available SMT thread context, that have been extracted directly from the program they are trying to accelerate. Proposed techniques typically require manual user intervention to extract and optimize instruction sequences. This paper proposes Dynamic Speculative Precomputation, which performs all necessary instruction analysis, extraction, and optimization through the use of back-end instruction analysis hardware, located off the processor's critical path. For a set of memory-limited benchmarks an average speedup of 14% is achieved when constructing simple p-slices, and this gain grows to 33% when making use of aggressive optimizations.
Article
The world’s appetite for analyzing massive amounts of structured and unstructured data has grown dramatically. The computational demands of these abundant-data applications, such as deep learning, far exceed the capabilities of today’s computing systems and are unlikely to be met with isolated improvements in transistor or memory technologies, or integrated circuit architectures alone. To achieve unprecedented functionality, speed, and energy efficiency, one must create transformative nanosystems whose architectures are based on the salient properties of the underlying nanotechnologies. Our Nano-Engineered Computing Systems Technology (N3XT) approach makes such nanosystems possible through new computing system architectures leveraging emerging device (logic and memory) nanotechnologies and their dense 3-D integration with fine-grained connectivity to immerse computing in memory and new logic devices (such as carbon nanotube field-effect transistors for implementing high-speed and low-energy logic circuits) as well as high-density nonvolatile memory (such as resistive memory), and amenable to ultradense (monolithic) 3-D integration of thin layers of logic and memory devices that are fabricated at low temperature. In addition, we explore the use of several device and integration technologies in the N3XT beyond the specific ones mentioned earlier that are also used in our main nanosystem prototypes. We also present an efficient resiliency technique to overcome endurance challenges in certain resistive memory technologies. N3XT hardware prototypes demonstrate the practicality of our architectures. We evaluate the benefits of the N3XT using a simulation framework calibrated using experimental measurements. System-level energy-delay product of common implementations of abundant-data workloads improves by three orders of magnitude in the N3XT compared with conventional architectures. These improvements impact a broad range of application workloads and architecture configurations, from embedded systems to the cloud.
Article
Data movement between processing and memory is the root cause of the limited performance and energy efficiency in modern von Neumann systems. To overcome the data-movement bottleneck, we present the memristive Memory Processing Unit (mMPU), a real processing-in-memory system in which the computation is done directly in the memory cells, thus eliminating the necessity for data transfer. Furthermore, with its enormous inner parallelism, this system is ideal for data-intensive applications that are based on single instruction, multiple data (SIMD), providing high throughput and energy efficiency.
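A conceptual sketch of the bulk bitwise, SIMD-style operation such processing-in-memory systems expose, simulated here with NumPy rather than modeling the mMPU circuits: one logic operation is applied across two entire memory rows at once, so throughput scales with the row width rather than with a CPU word size.

```python
import numpy as np

# Two "memory rows" of 4096 bits each, modelled as packed 64-bit words.
rng = np.random.default_rng(0)
row_a = rng.integers(0, 2**63, size=64, dtype=np.uint64)
row_b = rng.integers(0, 2**63, size=64, dtype=np.uint64)

# One in-array operation produces a whole result row: bitwise NOR of the two rows
# (NOR is a common primitive in memristive logic families).
result_row = ~(row_a | row_b)

bits = 64 * row_a.dtype.itemsize * 8
print(f"computed a {bits}-bit NOR in one row-wide operation")
```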
Article
Cyber Physical Systems (CPS) are networked systems of cyber (computation and communication) and physical (sensors and actuators) components that interact in a feedback loop with the possible help of human intervention, interaction and utilization. These systems will empower our critical infrastructure and have the potential to significantly impact our daily lives as they form the basis for emerging and future smart services. On the other hand, the increased use of CPS brings more threats that could have major consequences for users. Security problems in this area have become a global issue, thus, designing robust, secure and efficient CPS is an active area of research. Security issues are not new, but advances in technology make it necessary to develop new approaches to protect data against undesired consequences. New threats will continue to be exploited and cyber-attacks will continue to emerge, hence the need for new methods to protect CPS. This paper presents an analysis of the security issues at the various layers of CPS architecture, risk assessment and techniques for securing CPS. Finally, challenges, areas for future research and possible solutions are presented and discussed.
Article
The insights contained in Gordon Moore's now famous 1965 and 1975 papers have broadly guided the development of semiconductor electronics for over 50 years. However, the field-effect transistor is approaching some physical limits to further miniaturization, and the associated rising costs and reduced return on investment appear to be slowing the pace of development. Far from signaling an end to progress, this gradual "end of Moore's law" will open a new era in information technology as the focus of research and development shifts from miniaturization of long-established technologies to the coordinated introduction of new devices, new integration technologies, and new architectures for computing.
Conference Paper
This paper proposes a novel approach to modeling of gate level timing errors during high-level instruction set simulation. In contrast to conventional, purely random fault injection, our physically motivated approach directly relates to the underlying circuit structure, hence allowing for a significantly more detailed characterization of application performance under scaled frequency/voltage (including supply noise). The model uses gate level timing statistics extracted by dynamic timing analysis from the post place & route netlist of a general-purpose processor to perform instruction-aware fault injections. We employ a 28 nm OpenRISC core as a case study, to demonstrate how statistical fault injection provides a more accurate and realistic analysis of power vs. error performance.
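A toy sketch of the statistical, instruction-aware fault-injection idea described above, not the paper's actual flow: each instruction class is given an assumed timing-error probability that grows as the supply voltage is scaled down, and an executed instruction's result is bit-flipped with that probability.

```python
import random

# Illustrative per-instruction-class timing-error probabilities at nominal voltage.
BASE_ERROR_PROB = {"add": 1e-6, "mul": 1e-4, "load": 5e-5}

def error_prob(op, voltage_scale):
    # Assumed model: error probability grows as the voltage margin shrinks.
    return min(1.0, BASE_ERROR_PROB[op] / voltage_scale ** 8)

def execute(op, a, b, voltage_scale, rng):
    result = {"add": a + b, "mul": a * b, "load": a}[op]
    if rng.random() < error_prob(op, voltage_scale):
        result ^= 1 << rng.randrange(32)      # inject a single-bit timing error
    return result

rng = random.Random(0)
faults = sum(execute("mul", 3, 5, voltage_scale=0.85, rng=rng) != 15
             for _ in range(100_000))
print(f"observed {faults} faulty multiplications out of 100000 at 0.85x Vdd")
```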
Article
Biological information-processing systems operate on completely different principles from those with which most engineers are familiar. For many problems, particularly those in which the input data are ill-conditioned and the computation can be specified in a relative manner, biological solutions are many orders of magnitude more effective than those we have been able to implement using digital methods. This advantage can be attributed principally to the use of elementary physical phenomena as computational primitives, and to the representation of information by the relative values of analog signals, rather than by the absolute values of digital signals. This approach requires adaptive techniques to mitigate the effects of component differences. This kind of adaptation leads naturally to systems that learn about their environment. Large-scale adaptive analog systems are more robust to component degradation and failure than are more conventional systems, and they use far less power. For this reason, adaptive analog technology can be expected to utilize the full potential of wafer-scale silicon fabrication.
Article
Exploring the inherent technical challenges in realizing the potential of Big Data.
Conference Paper
Our challenge is clear: The drive for performance and the end of voltage scaling have made power, and not the number of transistors, the principal factor limiting further improvements in computing performance. Continuing to scale compute performance will require the creation and effective use of new specialized compute engines, and will require the participation of application experts to be successful. If we play our cards right, and develop the tools that allow our customers to become part of the design process, we will create a new wave of innovative and efficient computing devices.
Conference Paper
Technology scaling has resulted in smaller and faster transistors in successive technology generations. However, transistor power consumption no longer scales commensurately with integration density and, consequently, it is projected that in future technology nodes it will only be possible to simultaneously power on a fraction of cores on a multi-core chip in order to stay within the power budget. The part of the chip that is powered off is referred to as dark silicon and brings new challenges as well as opportunities for the design community, particularly in the context of the interaction of dark silicon with thermal, reliability and variability concerns. In this perspectives paper we describe these new challenges and opportunities, and provide preliminary experimental evidence in their support.
Article
It is argued that computing machines inevitably involve devices which perform logical functions that do not have a single-valued inverse. This logical irreversibility is associated with physical irreversibility and requires a minimal heat generation, per machine cycle, typically of the order of kT for each irreversible function. This dissipation serves the purpose of standardizing signals and making them independent of their exact logical history. Two simple, but representative, models of bistable devices are subjected to a more detailed analysis of switching kinetics to yield the relationship between speed and energy dissipation, and to estimate the effects of errors induced by thermal fluctuations.
Article
The dream of replacing rotating mechanical storage, the disk drive, with solid-state, nonvolatile RAM may become a reality in the near future. Approximately ten new technologies—collectively called storage-class memory (SCM)—are currently under development and promise to be fast, inexpensive, and power efficient. Using SCM as a disk drive replacement, storage system products will have random and sequential I/O performance that is orders of magnitude better than that of comparable disk-based systems and require much less space and power in the data center. In this paper, we extrapolate disk and SCM technology trends to 2020 and analyze the impact on storage systems. The result is a 100- to 1,000-fold advantage for SCM in terms of the data center space and power required.
Article
The future of integrated electronics is the future of electronics itself. Integrated circuits will lead to such wonders as home computers, automatic controls for automobiles, and personal portable communications equipment. But the biggest potential lies in the production of large systems. In telephone communications, integrated circuits in digital filters will separate channels on multiplex equipment. Integrated circuits will also switch telephone circuits and perform data processing. In addition, the improved reliability made possible by integrated circuits will allow the construction of larger processing units. Machines similar to those in existence today will be built at lower costs and with faster turnaround.
Article
It is shown that for many problems, particularly those in which the input data are ill-conditioned and the computation can be specified in a relative manner, biological solutions are many orders of magnitude more effective than those using digital methods. This advantage can be attributed principally to the use of elementary physical phenomena as computational primitives, and to the representation of information by the relative values of analog signals rather than by the absolute values of digital signals. This approach requires adaptive techniques to mitigate the effects of component differences. This kind of adaptation leads naturally to systems that learn about their environment. Large-scale adaptive analog systems are more robust to component degradation and failure than are more conventional systems, and they use far less power. For this reason, adaptive analog technology can be expected to utilize the full potential of wafer-scale silicon fabrication
Article
The tools of molecular biology were used to solve an instance of the directed Hamiltonian path problem. A small graph was encoded in molecules of DNA, and the "operations" of the computation were performed with standard protocols and enzymes. This experiment demonstrates the feasibility of carrying out computations at the molecular level.
Article
This paper explores Speculative Precomputation, a technique that uses idle thread contexts in a multithreaded architecture to improve performance of single-threaded applications. It attacks program stalls from data cache misses by pre-computing future memory accesses in available thread contexts, and prefetching these data. This technique is evaluated by simulating the performance of a research processor based on the Itanium ISA supporting Simultaneous Multithreading. Two primary forms of Speculative Precomputation are evaluated. If only the non-speculative thread spawns speculative threads, performance gains of up to 30% are achieved when assuming ideal hardware. However, this speedup drops considerably with more realistic hardware assumptions. Permitting speculative threads to directly spawn additional speculative threads reduces the overhead associated with spawning threads and enables significantly more aggressive speculation, overcoming this limitation. Even with realistic costs for spawning threads, speedups as high as 169% are achieved, with an average speedup of 76%.
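A conceptual sketch of the helper-thread idea described above, simulated sequentially rather than on real SMT hardware (so the prefetches always arrive in time): a precomputation slice containing only the pointer-chasing instructions runs ahead of the main loop over a linked list and warms a software stand-in for the cache.

```python
import random

# Build a randomly-ordered linked list (pointer chasing defeats stride prefetchers).
random.seed(0)
order = list(range(10_000))
random.shuffle(order)
next_of = {order[i]: order[i + 1] for i in range(len(order) - 1)}

def traverse(start, run_ahead=0):
    cache, misses, node = set(), 0, start
    while node is not None:
        if node not in cache:
            misses += 1
            cache.add(node)
        # Precomputation slice: chase pointers ahead of the main loop and
        # "prefetch" the nodes it visits (simulated, so prefetches always arrive in time).
        ahead = next_of.get(node)
        for _ in range(run_ahead):
            if ahead is None:
                break
            cache.add(ahead)
            ahead = next_of.get(ahead)
        node = next_of.get(node)
    return misses

print("misses without helper slice:", traverse(order[0], run_ahead=0))
print("misses with helper slice   :", traverse(order[0], run_ahead=8))
```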
P. Gaillardon, L. Amar, A. Siemon, E. Linn, R. Waser, A. Chattopadhyay, G. De Micheli, The programmable logic-in-memory (PLIM) computer, in Design, Automation & Test in Europe Conference & Exhibition (DATE)
K. Korgaonkar, R. Ronen, A. Chattopadhyay, S. Kvatinsky, The bitlet model: defining a litmus test for the bitwise processing-in-memory paradigm
M. Horowitz, 1.1 Computing's energy problem (and what we can do about it), in 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC)
M. Shafique, S. Garg, J. Henkel, D. Marculescu, The EDA challenges in the dark silicon era: temperature, reliability, and variability perspectives