Vijaykrishnan Narayanan

Pennsylvania State University | Penn State · Department of Computer Science and Engineering


712
Publications
71,730
Reads
19,888
Citations
Citations since 2017
120 Research Items
5528 Citations


Publications (712)
Article
Full-text available
Realizing compact and scalable Ising machines that are compatible with CMOS-process technology is crucial to the effectiveness and practicality of using such hardware platforms for accelerating computationally intractable problems. Besides the need for realizing compact Ising spins, the implementation of the coupling network, which describes the sp...
Preprint
Full-text available
The recent progress in quantum computing and space exploration led to a surge in interest in cryogenic electronics. Superconducting devices such as Josephson junction, Josephson field effect transistor, cryotron, and superconducting quantum interference device (SQUID) are traditionally used to build cryogenic logic gates. However, due to the superc...
Article
To fully exploit the ferroelectric field effect transistor (FeFET) as compact embedded nonvolatile memory for various computing and storage applications, it is desirable to use a single FeFET (1T) as a unit cell and arrange the cells into an array. However, many write mechanisms for a 1T FeFET array reported in the literature are yet to be validat...
Preprint
Field Programmable Gate Array (FPGA) is widely used in acceleration of deep learning applications because of its reconfigurability, flexibility, and fast time-to-market. However, conventional FPGA suffers from the tradeoff between chip area and reconfiguration latency, making efficient FPGA accelerations that require switching between multiple conf...
Article
As one type of associative memory, content-addressable memory (CAM) has become a critical component in several applications, including caches, routers, and pattern matching. Compared with the conventional CAM that could only deliver a “matched or not-matched” result, emerging multilevel CAM (ML-CAM) is capable of delivering “the degree of match” wi...
Article
Full-text available
We consider the problem of computing the $k$-means centers for a large high-dimensional dataset in the context of edge-based machine learning, where data sources offload machine learning computation to nearby edge servers. $k$-Means computation is fundamental to many data analytics, and the capability of computing provably accurate $k$-mea...
Preprint
Full-text available
Hardware security has been a key concern in modern information technologies. In particular, as the number of Internet-of-Things (IoT) devices grows rapidly, protecting device security with low-cost security primitives becomes essential; among these, the Physical Unclonable Function (PUF) is a widely used solution. In this paper, we propose the first Fe...
Preprint
Full-text available
Intellectual property (IP) piracy has become a non-negligible problem as the integrated circuit (IC) production supply chain becomes increasingly globalized and separated, which enables attacks by potentially untrusted parties. Logic locking is a widely adopted method to lock the circuit module with a key and prevent hackers from cracking it. T...
Preprint
Full-text available
Realizing compact and scalable Ising machines that are compatible with CMOS-process technology is crucial to the effectiveness and practicality of using such hardware platforms for accelerating computationally intractable problems. Besides the need for realizing compact Ising spins, the implementation of the coupling network, which describes the sp...
Preprint
Realizing compact and scalable Ising machines that are compatible with CMOS-process technology is crucial to the effectiveness and practicality of using such hardware platforms for accelerating computationally intractable problems. Besides the need for realizing compact Ising spins, the implementation of the coupling network, which describes the sp...
Preprint
Full-text available
Compute-in-memory (CiM) is a promising approach to improving the computing speed and energy efficiency in data-intensive applications. Beyond existing CiM techniques of bitwise logic-in-memory operations and dot product operations, this paper extends the CiM paradigm with FAST, a new shift-based in-memory computation technique to handle high-concurre...
Article
Full-text available
Existing circuit camouflaging techniques to prevent reverse engineering increase circuit-complexity with significant area, energy, and delay penalty. In this paper, we propose an efficient hardware encryption technique with minimal complexity and overheads based on ferroelectric field-effect transistor (FeFET) active interconnects. By utilizing the...
Preprint
Full-text available
There is an increasing demand for intelligent processing on emerging ultra-low-power internet of things (IoT) devices, and recent works have shown substantial efficiency boosts by executing inference tasks directly on the IoT device (node) rather than merely transmitting sensor data. However, the computation and power demands of Deep Neural Network...
Article
Full-text available
Locating and grasping objects is a critical task in people’s daily lives. For people with visual impairments, this task can be a daily struggle. The support of augmented reality frameworks in smartphones can overcome the limitations of current object detection applications designed for people with visual impairments. We present AIGuide, a self-cont...
Article
Intelligent edge sensors that augment legacy “unintelligent” manufacturing systems provide cost-effective functional upgrades. However, the limited compute at these edge devices requires trade-offs in efficient edge-cloud partitioning and raises data privacy issues. This work explores policies for partitioning random forest approaches, which are w...
Article
This paper proposes a fully-concurrent access SRAM topology to handle high-concurrency operations on multiple rows in an SRAM array. Such high-concurrency operations are widely seen in both conventional and emerging applications where high parallelism is preferred, e.g., the table update in a database and the parallel feature update in graph comput...
Article
There is an ongoing trend to increasingly offload inference tasks, such as CNNs, to edge devices in many IoT scenarios. As energy harvesting is an attractive IoT power source, recent ReRAM-based CNN accelerators have been designed for operation on harvested energy. When addressing the instability problems of harvested energy, prior optimization tec...
Article
Conventional processors suffer from high access latency and power dissipation due to the demand for memory bandwidth for data-intensive workloads, such as machine learning and analytics. In-memory computing support for various memory technologies has provided formidable improvement in performance and energy for such workloads, alleviating the repeat...
Article
Ternary content addressable memory (TCAM) is one type of associative memory and has been widely used in caches, routers, and many other mapping-aware applications. While the conventional SRAM-based TCAM is high speed and bulky, there have been denser but slower and less reliable nonvolatile TCAMs using nonvolatile memory (NVM) devices. Meanwhile, s...
Preprint
Full-text available
Camouflaging gate techniques are typically used in hardware security to prevent reverse engineering. Layout level camouflaging by adding dummy contacts ensures some level of protection against extracting the correct netlist. Threshold voltage manipulation for multi-functional logic with identical layouts has also been introduced for functional obfu...
Article
Full-text available
We perform a simulation-based analysis on the potential of emerging ferroelectric tunnel junctions (FTJ) as a memory device for crossbar arrays. Though FTJs are promising due to their low power switching characteristics compared to other emerging technologies, the greatest challenge for FTJs is the trade-off between integration density and read per...
Preprint
Full-text available
The quest to solve hard combinatorial optimization problems efficiently -- still a longstanding challenge for traditional digital computers -- has inspired the exploration of many alternate computing models and platforms. As a case in point, oscillator networks offer a potentially promising energy efficient and scalable option. However, prior oscil...
Preprint
Full-text available
Machine/deep-learning (ML/DL) based techniques are emerging as a driving force behind many cutting-edge technologies, achieving high accuracy on computer vision workloads such as image classification and object detection. However, training these models involving large parameters is both time-consuming and energy-hogging. In this regard, several pri...
Preprint
Full-text available
CNF-based SAT and MaxSAT solvers are central to logic synthesis and verification systems. The increasing popularity of these constraint problems in electronic design automation encourages studies on different SAT problems and their properties for further computational efficiency. There has been both theoretical and practical success of modern Confl...
Preprint
The cognitive system for human action and behavior has evolved into a deep learning regime, and especially the advent of Graph Convolution Networks has transformed the field in recent years. However, previous works have mainly focused on over-parameterized and complex models based on dense graph convolution networks, resulting in low efficiency in...
Article
Full-text available
Compare operation is widely used in many applications, from fundamental sorting to primitive operations in the database and AI systems. We present SRAM-based 3D-CAM circuit designs using Monolithic 3D integration process (M3D) for realizing beyond-Boolean in-memory compare operation without any area overheads. We also fabricated a processing-in-mem...
Preprint
Full-text available
We consider the problem of computing the k-means centers for a large high-dimensional dataset in the context of edge-based machine learning, where data sources offload machine learning computation to nearby edge servers. k-Means computation is fundamental to many data analytics, and the capability of computing provably accurate k-means centers by l...
Patent
A sense amplifier utilizes a phase transition material (PTM) in conjunction with CMOS circuits to provide a precise sensing threshold. The sense amplifier can be used in memory applications to sense states of stored bits with high accuracy and robustness. In one sense amplifier, a first diode-connected transistor has gate and drain nodes coupled to...
Article
Full-text available
Recently, Memory Augmented Neural Networks (MANNs), a class of Deep Neural Networks (DNNs), have become prominent owing to their ability to capture long-term dependencies effectively for several Natural Language Processing (NLP) tasks. These networks augment conventional DNNs by incorporating memory and attention mechanisms external to the netwo...
Article
So-called “tagless” caches have become common as a means to deal with the vast L4 last-level caches (LLCs) enabled by increasing device density, emerging memory technologies, and advanced integration capabilities (e.g., 3-D). Tagless schemes often result in intercache entanglement between tagless cache (L4) and the cache (L3) stewarding its metadat...
Conference Paper
Full-text available
Locating and grasping objects is a critical task in people's daily lives. For people with visual impairments, this task can be a daily struggle. The support of augmented reality frameworks in smartphones has the potential to overcome the limitations of current object detection applications designed for people with visual impairments. We present AIG...
Conference Paper
Full-text available
Because they reduce model size and computation costs for dedicated AI accelerator designs, neural network quantization methods have recently attracted considerable attention. Unfortunately, merely minimizing quantization loss using constant discretization causes accuracy deterioration. In this paper, we propose an iterative accuracy-driven learning framework...
Article
This brief presents the concept of one-shot refresh (OSR) for dynamic memories. What distinguishes OSR from conventional row-by-row refresh operations is that OSR is able to refresh all rows in the entire array with a single refresh, without the need to read out the stored data in each row. By doing so, significant energy savin...
Article
In this article, we analyze the impact of process variations of the ferroelectric film on the performance of the negative capacitance field-effect transistor (NCFET). Variations of the ferroelectric layer area (resulting from the variation of the transistor dimension sizes and the edge-effect), the ferroelectric layer thickness, the polarization, a...
Chapter
Every shift in the way our devices are connected or powered brings with it a potential for revolution in the usage and capabilities of the systems built around them. Just as the transition from wired to wireless telephones led to unprecedented changes in our communications and the shift from wall-power to battery-power transformed our expectations...
Conference Paper
Recent research shows that the multi-view system for object recognition outperforms the single-view point system. When viewpoints are added, additional communication cost and cost to deploy the viewpoints are also added. However, prior work has shown that not all of the views are useful, and poor viewpoints can be excluded. This paper explores the...
Article
Negative-capacitance FETs (NCFETs) are a promising candidate for low-power circuits with intrinsic features, e.g., the steep switching slope. Prior works have shown potential for enabling low-power digital logic and memory design with NCFETs. Yet, it is still not quite clear how to harness these new features of NCFETs for analog functionalities. Th...
Article
Cyber-physical systems in the Internet-of-Things (IoT) era increasingly need responses that are not only timely, but also intelligent. To that end, decentralized Deep Neural Network (DNN) systems have been studied for near-sensor processing to enable localized inference and global network partition in a given, limited power budget. In the real worl...
Article
Full-text available
In recent years, several designs that use in-memory processing to accelerate machine-learning inference problems have been proposed. Such designs are also a perfect fit for discrete, dynamic and distributed systems that can solve large-dimensional optimization problems using iterative algorithms. For in-memory computations, Ferroelectric Field Effe...
Conference Paper
This paper presents a reconfigurable ferroelectric FET (R-FEFET), which has a unique capability to tune its operation between volatile (logic) and non-volatile (memory) modes during run-time by dynamically modulating its hysteresis. The R-FEFET comprises two gates with ferroelectric (FE) in both the gate stacks interacting with a common transist...
Conference Paper
As the computing power of end-point devices grows, there has been interest in developing distributed deep neural networks specifically for hierarchical inference deployments on multi-sensor systems. However, as the existing approaches rely on latent parameters trained by machine learning, it is difficult to preemptively select front-end deep featur...
Patent
A sense amplifier utilizes a phase transition material (PTM) in conjunction with CMOS circuits to provide a precise sensing threshold. The sense amplifier can be used in memory applications to sense states of stored bits with high accuracy and robustness. In one sense amplifier, a first diode-connected transistor has gate and drain nodes coupled to...
Article
Editor's note: This article explores the design of high-density, low-power, and high-speed embedded nonvolatile memory arrays exploiting the unique device characteristics of the emerging ferroelectric FETs. -Vivek De, Intel Corporation.
Article
We present novel 3D-SRAM cells using a monolithic 3D integration technology for realizing both robustness of the cell and in-memory Boolean logic computing capability. The proposed two-layer cell designs make use of additional transistors over the SRAM layer to enable assist techniques as well as provide logic functions (such as AND/NAND, OR/NOR,...