Intel
  • Santa Clara, California, United States
Recent publications
This work presents a fully integrated 140-GHz transmitter (TX) achieving a data rate of 160 Gb/s with ~1-pJ/b efficiency in the 22-nm Intel FinFET technology. The TX leverages a wideband radio-frequency digital-to-analog converter (RF-DAC) architecture with an embedded 4:1 multiplexer, and it is integrated with a sub-sampling quadrature phase-locked loop (PLL), frequency tripler, local oscillator (LO) buffers, a wideband two-stage power amplifier (PA), and on-chip SRAM/pseudorandom binary sequence (PRBS) for high-speed data generation. The TX achieves 120/160-Gb/s 16 quadrature amplitude modulation (QAM) with −19/−17-dB error vector magnitude (EVM) at an output power of +1.5/+0.8 dBm.
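The headline figures in this abstract imply a few derived quantities that can be checked with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the derived symbol rates and power are back-of-the-envelope values, not numbers reported by the authors. It assumes 4 bits per 16-QAM symbol and treats the approximately 1-pJ/b figure as exact.

```python
# Back-of-the-envelope values derived from the abstract's figures.
# Function names are hypothetical; derived results are not reported values.

BITS_PER_SYMBOL_16QAM = 4  # log2(16)


def symbol_rate_gbaud(data_rate_gbps: float,
                      bits_per_symbol: int = BITS_PER_SYMBOL_16QAM) -> float:
    """Symbol rate in GBaud for a given bit rate in Gb/s."""
    return data_rate_gbps / bits_per_symbol


def power_mw(data_rate_gbps: float, energy_pj_per_bit: float) -> float:
    """Power in mW: (Gb/s) * (pJ/b) = 1e9 * 1e-12 W = 1e-3 W."""
    return data_rate_gbps * energy_pj_per_bit


def dbm_to_mw(p_dbm: float) -> float:
    """Convert a power level in dBm to milliwatts."""
    return 10 ** (p_dbm / 10)


if __name__ == "__main__":
    print(symbol_rate_gbaud(160))   # symbol rate at 160 Gb/s
    print(symbol_rate_gbaud(120))   # symbol rate at 120 Gb/s
    print(power_mw(160, 1.0))       # implied power at ~1 pJ/b
    print(dbm_to_mw(0.8))           # output power at 160 Gb/s, in mW
```

Under these assumptions, 160 Gb/s corresponds to 40 GBaud and roughly 160 mW of power at 1 pJ/b.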
Non-Volatile Memory (NVM) has emerged as an alternative to next-generation main memories. Although many tree indices have been proposed for NVM, they generally use B+-tree-like structures. To further improve the performance of NVM-aware indices, we consider integrating learned indexes into NVM. The challenges of such an integration are twofold: (1) existing NVM indices rely on small nodes to accelerate insertions with crash consistency, but learned indices use huge nodes to obtain a flat structure; (2) the node structure of learned indices is not NVM friendly, meaning that accessing a learned node will cause multiple NVM block misses. Thus, in this paper, we propose a new persistent learned index called PLIN. The novelty of PLIN lies in four aspects: an NVM-aware data placement strategy, locally unordered and globally ordered leaf nodes, a model copy mechanism, and a hierarchical insertion strategy. In addition, PLIN is proposed for the NVM-only architecture, which can support instant recovery. We also present optimistic concurrency control and fine-grained locking mechanisms to make PLIN scalable to concurrent requests. We conduct experiments on real persistent memory with various workloads and compare PLIN with APEX, PACtree, ROART, TLBtree, and Fast&Fair. The results show that PLIN achieves 2.08x higher insertion performance and 4.42x higher query performance than its competitors on average. Meanwhile, PLIN only needs ~30 μs to recover from a system crash.
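For readers unfamiliar with learned indexes, the general idea is that a model predicts a key's position in a sorted array and a bounded local search corrects the prediction. The sketch below is a minimal in-memory illustration of that idea only. It is not PLIN's node layout, persistence scheme, or insertion strategy, and the class name `LinearLearnedIndex` is hypothetical.

```python
import bisect


class LinearLearnedIndex:
    """Toy learned index: one linear model plus a bounded local search.

    Illustrative only; this is not the PLIN design from the abstract.
    """

    def __init__(self, keys):
        self.keys = sorted(keys)
        n = len(self.keys)
        if n == 0:
            raise ValueError("keys must be non-empty")
        lo, hi = self.keys[0], self.keys[-1]
        # Fit a line through the first and last (key, position) pairs.
        self.slope = (n - 1) / (hi - lo) if hi != lo else 0.0
        self.intercept = -self.slope * lo
        # Record the worst-case prediction error so lookups can bound the search.
        self.max_err = max(abs(self._predict(k) - i)
                           for i, k in enumerate(self.keys))

    def _predict(self, key):
        pos = int(self.slope * key + self.intercept)
        return min(max(pos, 0), len(self.keys) - 1)

    def lookup(self, key):
        """Return the position of key in the sorted array, or None if absent."""
        p = self._predict(key)
        lo = max(p - self.max_err, 0)
        hi = min(p + self.max_err + 1, len(self.keys))
        i = bisect.bisect_left(self.keys, key, lo, hi)
        return i if i < hi and self.keys[i] == key else None


if __name__ == "__main__":
    index = LinearLearnedIndex([3, 8, 15, 16, 42, 100, 101, 250])
    print(index.lookup(42))   # position of an existing key
    print(index.lookup(50))   # None for a missing key
```

The recorded maximum error is what keeps the search window small. Skewed key distributions widen that window, which is one reason practical learned indexes use many models rather than a single line.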
As part of the International Conference on Very Large Data Bases (VLDB) 2021 / Proceedings of the VLDB Endowment Volume 14, a new Research Track category named Scalable Data Science (SDS) was launched [2, 6]. The goal of SDS is to attract cutting-edge and impactful real-world work in the scalable data science arena to enhance the impact and visibility of the VLDB community on data science practice, spur new technical connections, and inspire new follow-on research. The inaugural year proved to be successful, with numerous interesting papers from a wide cross section of both industry and academia, spanning several data science topics, and originating from several countries around the world. In this report, we reflect on the inaugural year of SDS with some statistics on both submissions and accepted papers, SDS invited talks, and our observations, lessons, and tips as inaugural Associate Editors for SDS. We hope this article is helpful to future authors, reviewers, and organizers of SDS, as well as other interested members of the wider database / data management community and beyond.
Recently, brain networks have been widely adopted to study brain dynamics, brain development, and brain diseases. Graph representation learning techniques on brain functional networks can facilitate the discovery of novel biomarkers for clinical phenotypes and neurodegenerative diseases. However, current graph learning techniques have several issues in brain network mining. First, most current graph learning models are designed for unsigned graphs, which hinders the analysis of many signed network data (e.g., brain functional networks). Meanwhile, the insufficiency of brain network data limits model performance on clinical phenotype prediction. Moreover, few of the current graph learning models are interpretable, so they may not be capable of providing biological insights for model outcomes. Here, we propose an interpretable hierarchical signed graph representation learning (HSGPL) model to extract graph-level representations from brain functional networks, which can be used for different prediction tasks. To further improve the model performance, we also propose a new strategy to augment functional brain network data for contrastive learning. We evaluate this framework on different classification and regression tasks using data from the Human Connectome Project (HCP) and the Open Access Series of Imaging Studies (OASIS). Our results from extensive experiments demonstrate the superiority of the proposed model compared with several state-of-the-art techniques. In addition, we use graph saliency maps, derived from these prediction tasks, to demonstrate detection and interpretation of phenotypic biomarkers.
This work demonstrates the atomic layer deposition (ALD) of Sb2Te3/GeTe superlattice (SL) film on planar and vertical sidewall areas containing TiN metal and SiO2 insulator. The peculiar chemical affinity of the ALD precursor to the substrate surface and the two-dimensional nature of the Sb2Te3 enabled the growth of an in-situ crystallized SL film with a preferred orientation. The SL film showed a reduced reset current of ∼1/7 of that of the randomly oriented Ge2Sb2Te5 alloy. The reset switching was induced by the transition from the SL to the (111)-oriented face-centered-cubic (FCC) Ge2Sb2Te5 alloy and subsequent melt-quenching-free amorphization. The in-plane compressive stress, induced by the SL-to-FCC structural transition, enhanced the electromigration of Ge along the [111] direction of the FCC structure, which enabled such a significant improvement. The set operation switched the amorphous phase to the (111)-oriented FCC structure.
The next decade will usher in a new era of technological innovation where compute, communications, and intelligence will converge. The number of connected devices is estimated to reach 500 billion by 2030, which is 59× larger than the expected world population [1]. Instead of humans dominating the use of wireless networks, objects or things will become the dominant users. Networks will incorporate intelligent compute to enable new types of applications such as cyberphysical worlds that replicate reality. These applications will require distributed intelligent compute platforms, as processing terabytes of data at a single edge device will be near impossible. Future 6G systems are currently being defined to meet these needs, as shown in Figure 1, and the essential components of all 6G systems will be intelligence and security [2]. These new wireless systems will need to leverage a distributed compute network that is part of the wireless network so that the edge device can employ a high-performance compute platform that is in close proximity to itself. Such a network device must incorporate flexibility and programmability at low latency while also supporting network and connectivity functions. Heterogeneous integration in 2.5D and 3D can efficiently enhance existing compute platforms such as processors, application-specific integrated circuits, and field programmable gate arrays (FPGAs) to address emerging applications such as intelligent compute-communicate systems at lower cost with faster time to market. This article presents an overview of 2.5D and 3D heterogeneous technologies and how they can be leveraged to create new intelligent-compute-communicate platforms to enable emerging applications.
This work presents a white-box modeling of the electromagnetic (EM) leakage from an integrated circuit (IC) to develop EM side-channel analysis (SCA)-aware design techniques. A new digital library cell layout design technique is proposed to minimize the EM leakage and is evaluated using a high-frequency structure simulator (HFSS)-based framework. Backed by our physics-based understanding of EM radiation, the proposed double-row power grid-based digital cell layout design shows >5× reduction in the EM SCA leakage compared to the traditional digital logic gate layout design. Furthermore, exploiting the magneto-quasistatic (MQS) regime of operation of the EM leakage from the CMOS circuits, the HFSS-based framework is utilized to develop a pre-silicon (Si) EM SCA evaluation technique to assess the vulnerability of cryptographic implementations against such attacks during the design phase itself.
We present a dual-band transmitarray comprising ultra-thin polarization-rotating (PR) spatial phase shifters. The PR elements provide +90° and -90° polarization rotation independently in two different frequency bands, used as dual-band 1-bit spatial phase shifters. Independent polarization rotation in the two operating frequency bands is realized by interleaving high-band and low-band features that include dipoles, strips and slots printed on three metallic layers with ultra-thin inter-layer spacings. The concept is generally applicable to any frequency band where transmitarrays are practical. To illustrate the concept, we simulated and tuned a polarization-rotating unit cell to achieve 1-bit phase shifts in two operating frequency bands at X- and Ku-bands, with center frequencies at 9.5 GHz and 14.5 GHz, respectively. Subsequently, a dual-band transmitarray was designed using the proposed unit cell to provide beam collimation at broadside directions at both operating frequency bands. The proposed ultra-thin spatial phase shifter results in a transmitarray with an overall thickness of 0.07λ<sub>LB</sub>, where λ<sub>LB</sub> is the wavelength at the center frequency of the lower band. Measurement results of a fabricated prototype of the transmitarray show maximum gains of 25.2 dBi and 26.2 dBi as well as 3-dB gain bandwidths of 26% and 6.2% for the lower and higher bands, respectively.
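The stated thickness of 0.07λ at the lower-band center frequency can be converted to a physical dimension. The sketch below assumes λ is the free-space wavelength, which the abstract does not state explicitly. The function names are hypothetical and the resulting value in millimetres is a derived estimate, not a reported measurement.

```python
# Convert an electrical thickness (fraction of a wavelength) to millimetres.
# Assumption: the wavelength is the free-space wavelength at the given frequency.

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def wavelength_mm(freq_ghz: float) -> float:
    """Free-space wavelength in millimetres at freq_ghz."""
    return SPEED_OF_LIGHT_M_PER_S / (freq_ghz * 1e9) * 1e3


def thickness_mm(fraction_of_wavelength: float, freq_ghz: float) -> float:
    """Physical thickness in millimetres for a given fraction of a wavelength."""
    return fraction_of_wavelength * wavelength_mm(freq_ghz)


if __name__ == "__main__":
    # Lower-band center frequency from the abstract: 9.5 GHz.
    print(round(wavelength_mm(9.5), 2))       # free-space wavelength in mm
    print(round(thickness_mm(0.07, 9.5), 2))  # overall thickness in mm
```

Under the free-space assumption, the overall thickness works out to about 2.2 mm.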
In this letter, we endeavor to curb the adverse effects of mutual coupling between two slot antennas (operating at C-band frequencies) situated at a subwavelength separation. The effects of mutual coupling become extremely apparent at such close proximity and lead to the deterioration of the radiation characteristics of each antenna. To mitigate these detrimental effects, we exploit the concept of mantle cloaking for the design of specialized cloaks, consisting of elliptical dielectric regions integrated with capacitive metallic strips. It is demonstrated through simulation results that by enclosing each radiating edge of the individual slots with these uniquely designed metasurfaces, the slot antennas are decoupled in the near field. Moreover, the metasurface cloaks also facilitate restoration of the far-field radiation properties of the slots. In this regard, our cloak design ensures that the slot antennas do not sense the presence of each other, enabling each slot antenna to operate individually, as if it were isolated.
The performance of nanoscale semiconductor technologies has become susceptible to high temperatures and aging phenomena. While guard-bands have conventionally been used to combat degradation-induced timing violations, approximations have recently been leveraged to compensate for degradations in lieu of adding timing guard-bands, without a loss in performance. However, only simple approximation techniques such as truncation have been considered in prior work. In this paper, a wide range of approximate arithmetic circuits including adders and multipliers using various sophisticated approximation techniques are investigated to cope with aging- and temperature-induced degradations. To this end, approximate circuits are first characterized for their delay increase under degradations. With this, we then determine the approximation level required to compensate for guard-bands under different degradations. Degradation-aware logic synthesis results show that the simple use of truncated arithmetic circuits leads to a higher quality loss compared to using other approximate circuits. However, a truncated multiplier has the lowest error distance towards a reliable operation in 10 years. The approximate multipliers with configurable error recovery are most suitable when the level of degradation is higher, e.g., at a temperature of 70 °C. The characterization of degradation at the circuit level is then used for design exploration at the architecture level without the need for further gate-level simulations. For three different image processing applications, experimental results show that guard-bands can be mitigated while maintaining an output result with a high visual quality.
In this article, the influence of the temperature instability of resistive memory switching on potential neuromorphic computing applications is extensively studied using an Intel TaO$_x$-based analog-type memristor as a synaptic weight modulator in a neural network. Evaluation results show that the effect of ambient temperature during training and inference can degrade the neural network’s accuracy due to inefficient weight updates and inevitable resistance or conductance drifting. Our results provide additional insights into device-level physical models and simple circuit-level design guidance for potential hardware-based neuromorphic computing applications.
This paper explores the complex relationship between intellectual property (IP) and the transdisciplinary collaborative design (co-design) of new digital technologies for agriculture (AgTech). More specifically, it explores how prioritizing the capturing of IP as a central researcher responsibility can cause disruptions to research relationships and project outcomes. We argue that boundary-making processes associated with IP create a particular context through which responsibility can, and must, be located and cultivated by researchers working within transdisciplinary collaborations. We draw from interview data and situated IP practices from a transdisciplinary co-design project in Aotearoa New Zealand to illustrate how IP is a fluid boundary-requiring-and-producing object that impels researchers into its management, and produces tensions that need to be noticed and skillfully navigated within research relations. We propose located response-ability as a conceptual tool and practice to reposition IP within the relations that make up a transdisciplinary co-design project, as opposed to prioritizing IP by default without recognizing its possible impacts on collaborative relations and other project aims and accountabilities. This can support researchers practicing responsible innovation in making everyday decisions on how to protect potential IP without disrupting the collaborative relations that make the creation of potential IP possible, and the existence of protected IP relevant and beneficial to project collaborators and wider societal actors. This may help to ensure that societal benefits can be generated, and positive science–society relationships prioritized and preserved, in the design of new AgTech.
The increasing amount of data and the growing complexity of problems has resulted in an ever-growing reliance on cloud computing. However, many applications, most notably in healthcare, finance or defense, demand security and privacy which today’s solutions cannot fully address. Fully homomorphic encryption (FHE) elevates the bar of today’s solutions by adding confidentiality of data during processing. It allows computation on fully encrypted data without the need for decryption, thus fully preserving privacy. To enable processing encrypted data at usable levels of classic security, e.g., 128-bit, the encryption procedure introduces noticeable data size expansion - the ciphertext is much bigger than the native aggregate of native data types. In this paper, we present MemFHE which is the first accelerator of both client and server for the latest Ring-GSW (Gentry, Sahai, and Waters [17]) based homomorphic encryption schemes using Processing In Memory (PIM). PIM alleviates the data movement issues with large FHE encrypted data, while providing in-situ execution and extensive parallelism needed for FHE’s polynomial operations. While the client-PIM can homomorphically encrypt and decrypt data, the server-PIM can process homomorphically encrypted data without decryption. MemFHE’s server-PIM is pipelined and is designed to provide flexible bootstrapping, allowing two encryption techniques and various FHE security-levels based on the application requirements. We evaluate MemFHE for various security-levels and compare it with state-of-the-art CPU implementations for Ring-GSW based FHE. MemFHE is up to 20k× (265×) faster than CPU (GPU) for FHE arithmetic operations and provides on average 2007× higher throughput than [36] while implementing neural networks with FHE.
The demonstrated spectral efficiency (SE) gain of In-Band Full Duplex (IBFD) makes it an attractive radio technology. Owing to the inherent complexities of IBFD, a Hybrid IBFD Cellular Network (HICN) is preferred in practice because it limits IBFD capability to the Base Station and retains legacy Half Duplex (HD) User Equipments (UEs). Since sharing frequencies within UE groups maximizes sum-SE in a HICN, we formulate a grouped-UE maximization problem for achieving maximum frequency sharing. Unlike the heuristic search methods developed in the literature, in this paper we build a mathematical framework for UE grouping. Three grouping algorithms are analytically derived in closed form with varying performance and time-complexity trade-offs. The optimal algorithm provides a benchmark on these performance measures. Using structural advantages of a HICN, we derive a reduced-search optimal algorithm that reduces time complexity while guaranteeing optimal grouping, whereas the near-optimal algorithm achieves close-to-optimal grouping with substantially reduced time complexity for real-time cellular systems. We further introduce the concept of ineligible UEs, which uniformly reduces the time complexity of all three algorithms without impacting their grouping performance. Extensive simulations reveal that the proposed near-optimal algorithm achieves a median sum-SE of 93.5% of the theoretical maximum (a doubling over legacy HD systems), thus emerging as a prospective solution.
Modern electronic design automation (EDA) flows depend on both implementation and signoff tools to perform timing-constrained power optimization (TCPO) through Engineering Change Orders (ECOs), which involve gate sizing and threshold-voltage (Vth) assignment of standard cells. However, the signoff ECO optimization is highly time-consuming, and the power improvement is hard to predict in advance. Ever since the release of the industrial benchmarks by the ISPD-2012 gate-sizing contest, extensive research has been conducted to improve the optimization process. Nonetheless, previous works were mostly based on heuristics or analytical methods whose timing models were oversimplified and lacked formal validation against commercial signoff tools. In this paper, we propose ECO-GNN, a transferable graph-learning-based framework, which harnesses graph neural networks (GNNs) to perform commercial-quality signoff power optimization through discrete Vth assignment. One of the highlights of our framework is that it generates tool-accurate optimization results instantly on unseen netlists that are not utilized in the training process. Furthermore, we propose a subgraph approximation technique to improve the training and inference time of the proposed GNN model. We show that design instances with non-overlapping subgraphs can be optimized in parallel so as to improve the inference time of the learning-based model. Finally, we implement a GNN-based explanation method to interpret the optimization results achieved by our framework. Experimental results on 14 industrial designs, including a RISC-V-based multi-core system and the renowned ISPD-2012 benchmarks, demonstrate that our framework achieves up to 14X runtime improvement with similar signoff power optimization quality compared with Synopsys PrimeTime, an industry-leading signoff tool.
4,448 members
Alexander Nadel
  • Intel, Haifa
Jeongnim Kim
  • Data Platform Group
Rahul Khanna
  • Software and Solutions Group (SSG), Hillsboro
Information
Address
2200 Mission College Blvd, 95054, Santa Clara, California, United States