Abu Sebastian’s research while affiliated with IBM Research - Thomas J. Watson Research Center and other places

Publications (323)


On the Role of Noise in Factorizers for Disentangling Distributed Representations
  • Preprint

November 2024 · 2 Reads

Geethan Karunaratne · [...] · Abu Sebastian · [...]

To efficiently factorize high-dimensional distributed representations into their constituent atomic vectors, one can exploit the compute-in-superposition capabilities of vector-symbolic architectures (VSA). Such factorizers, however, suffer from the phenomenon of limit cycles. Applying noise during the iterative decoding is one mechanism to address this issue. In this paper, we explore ways to further relax the noise requirement by applying noise only when the VSA's reconstruction codebooks are initialized. While the need for noise during the iterations makes analog in-memory computing systems a natural implementation medium, the adequacy of initialization noise keeps digital hardware equally viable, broadening the implementation possibilities of factorizers. Our study finds that while the best performance shifts from initialization noise to iterative noise as the number of factors increases from 2 to 4, both mechanisms extend the operational capacity by at least 50 times compared to the baseline resonator-network factorizers. Our code is available at: https://github.com/IBM/in-memory-factorizer
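
As a rough illustration of the two noise placements contrasted above, the following sketch (not the authors' implementation; the dimensions, codebook sizes, and Gaussian noise model are assumptions) factorizes a two-factor bipolar product with a resonator-style iteration, injecting noise either once at codebook initialization or at every decoding step.

```python
# Illustrative sketch only -- not the IBM/in-memory-factorizer code.
# Dimensions, codebook sizes, and the Gaussian noise model are assumptions.
import numpy as np

rng = np.random.default_rng(0)
D, M = 1024, 32                      # vector dimension, codevectors per factor

def bsign(v):
    """Bipolarize a vector (ties broken towards +1)."""
    return np.where(v >= 0, 1, -1)

# Two codebooks of random bipolar codevectors and a composite to factorize.
A = rng.choice([-1, 1], size=(M, D))
B = rng.choice([-1, 1], size=(M, D))
ia, ib = 3, 17
s = A[ia] * B[ib]                    # Hadamard binding of one vector per factor

def factorize(s, iters=100, init_noise=0.0, iter_noise=0.0):
    """Resonator-style decoding; returns the recovered codebook indices."""
    # Noise applied once, to the reconstruction codebooks (initialization noise).
    A_hat = bsign(A + init_noise * rng.standard_normal(A.shape))
    B_hat = bsign(B + init_noise * rng.standard_normal(B.shape))
    a_est = bsign(A_hat.sum(axis=0))           # start from the full superposition
    b_est = bsign(B_hat.sum(axis=0))
    for _ in range(iters):
        # Unbind the other factor's estimate, optionally add per-iteration noise,
        # then project back onto the codebook (similarity-weighted superposition).
        a_view = s * b_est + iter_noise * rng.standard_normal(D)
        a_est = bsign(A_hat.T @ (A_hat @ a_view))
        b_view = s * a_est + iter_noise * rng.standard_normal(D)
        b_est = bsign(B_hat.T @ (B_hat @ b_view))
    return int(np.argmax(A @ a_est)), int(np.argmax(B @ b_est))

print(factorize(s, init_noise=0.2))  # noise only at codebook initialization
print(factorize(s, iter_noise=0.5))  # noise at every decoding iteration
```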


Understanding the Growth and Properties of Sputter-Deposited Phase-Change Superlattice Films

November 2024 · 21 Reads

Simone Prili · [...] · Vara Prasad Jonnalagadda · [...] · Ghazi Sarwat Syed

Highly textured chalcogenide films have recently gained significant interest for phase-change memory applications. Several reports have highlighted that programming efficiency improves in devices featuring superlattice stacks, such as Ge2Sb2Te5/Sb2Te3. However, to be technologically relevant, these films must be deposited on foundry-scale wafers using processes compatible with back-end-of-the-line (BEOL) integration and complementary metal-oxide-semiconductor (CMOS) technology, such as sputter deposition. In this work, we present our observations on the influence of temperature, pressure, and seeding-layer parameters on the sputter growth of superlattice films. By measuring various material properties, we construct a pseudo-phase diagram to illustrate the growth of both individual and superlattice films with different periodicities on technologically relevant substrates, namely SiO2 and carbon. These results provide important insights into the structure, intermixing, and electro-optical properties of superlattice films.


The Inherent Adversarial Robustness of Analog In-Memory Computing
  • Preprint
  • File available

November 2024 · 4 Reads

A key challenge for Deep Neural Network (DNN) algorithms is their vulnerability to adversarial attacks. Inherently non-deterministic compute substrates, such as those based on Analog In-Memory Computing (AIMC), have been speculated to provide significant adversarial robustness when performing DNN inference. In this paper, we experimentally validate this conjecture for the first time on an AIMC chip based on Phase Change Memory (PCM) devices. We demonstrate higher adversarial robustness against different types of adversarial attacks when implementing an image classification network. Additional robustness is also observed when performing hardware-in-the-loop attacks, in which the attacker is assumed to have full access to the hardware. A careful study of the various noise sources indicates that a combination of stochastic noise sources (both recurrent and non-recurrent) is responsible for the adversarial robustness, and that their type and magnitude disproportionately affect this property. Finally, we demonstrate via simulations that additional robustness is still observed when a much larger transformer network is used to implement a Natural Language Processing (NLP) task.
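
The following toy sketch (not the paper's experimental setup; the linear model, FGSM attack, and multiplicative weight-noise model standing in for PCM conductance fluctuations are all assumptions) illustrates the basic evaluation mechanism: adversarial examples crafted against the nominal weights are scored against a forward pass whose weights fluctuate stochastically, as they would on an AIMC chip.

```python
# Toy sketch only -- not the paper's setup. Model, attack, and noise model
# are illustrative assumptions; the effect magnitude is not representative.
import numpy as np

rng = np.random.default_rng(1)

# Synthetic binary task and a "trained" linear classifier.
X = rng.standard_normal((500, 32))
w = rng.standard_normal(32)
y = (X @ w > 0).astype(float)

def predict(x, noise_std=0.0):
    """Forward pass with stochastic multiplicative weight noise (AIMC-like)."""
    w_noisy = w * (1.0 + noise_std * rng.standard_normal(w.shape))
    return (x @ w_noisy > 0).astype(float)

def fgsm(x, y, eps):
    """FGSM perturbation for the logistic loss of the nominal (noise-free) model."""
    p = 1.0 / (1.0 + np.exp(-(x @ w)))
    grad_x = (p - y)[:, None] * w[None, :]   # d(loss)/dx, per sample
    return x + eps * np.sign(grad_x)

X_adv = fgsm(X, y, eps=0.15)
for noise in (0.0, 0.1, 0.3):
    clean = np.mean([predict(X, noise) == y for _ in range(20)])
    adv = np.mean([predict(X_adv, noise) == y for _ in range(20)])
    print(f"weight-noise std {noise:.1f}: clean {clean:.2f}, adversarial {adv:.2f}")
```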

Kernel Approximation using Analog In-Memory Computing

November 2024 · 14 Reads

Kernel functions are vital ingredients of several machine learning algorithms, but often incur significant memory and computational costs. We introduce an approach to kernel approximation in machine learning algorithms suitable for mixed-signal Analog In-Memory Computing (AIMC) architectures. Analog In-Memory Kernel Approximation addresses the performance bottlenecks of conventional kernel-based methods by executing most operations in approximate kernel methods directly in memory. The IBM HERMES Project Chip, a state-of-the-art phase-change memory based AIMC chip, is utilized for the hardware demonstration of kernel approximation. Experimental results show that our method maintains high accuracy, with less than a 1% drop in kernel-based ridge classification benchmarks and within 1% accuracy on the Long Range Arena benchmark for kernelized attention in Transformer neural networks. Compared to traditional digital accelerators, our approach is estimated to deliver superior energy efficiency and lower power consumption. These findings highlight the potential of heterogeneous AIMC architectures to enhance the efficiency and scalability of machine learning applications.
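
As a rough sketch of the general idea (not the HERMES chip pipeline; the random-Fourier-feature construction and the additive noise stand-in for analog MVM error are assumptions), the dominant cost in an approximate kernel method is a fixed random projection, i.e. exactly the kind of matrix-vector multiply an AIMC crossbar performs in one shot:

```python
# Illustrative sketch: random Fourier feature approximation of an RBF kernel.
# The W @ x projection is the AIMC-amenable step; the noise model is generic.
import numpy as np

rng = np.random.default_rng(2)
d, D, gamma = 16, 512, 0.5            # input dim, feature dim, RBF width (assumed)

W = rng.normal(scale=np.sqrt(2 * gamma), size=(D, d))   # fixed random projection
b = rng.uniform(0, 2 * np.pi, size=D)

def features(x, mvm_noise_std=0.0):
    """Random Fourier features; W @ x is the (noisy) analog MVM."""
    proj = W @ x + mvm_noise_std * rng.standard_normal(D)
    return np.sqrt(2.0 / D) * np.cos(proj + b)

x1, x2 = rng.standard_normal(d), rng.standard_normal(d)
exact = np.exp(-gamma * np.sum((x1 - x2) ** 2))          # true RBF kernel value
approx = features(x1, 0.05) @ features(x2, 0.05)         # noisy in-memory estimate
print(f"exact {exact:.4f}  approx {approx:.4f}")
```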


RETRO-LI: Small-Scale Retrieval Augmented Generation Supporting Noisy Similarity Searches and Domain Shift Generalization

October 2024 · 3 Reads

Retrieval augmented generation (RAG) systems such as RETRO have been shown to improve language modeling capabilities and reduce toxicity and hallucinations by retrieving from a database of non-parametric memory containing trillions of entries. We introduce RETRO-LI and show that retrieval can also help with a small-scale database, but it demands more accurate and better neighbors when searching in a smaller, hence sparser, non-parametric memory. This can be met by using a proper semantic similarity search. We further propose, for the first time, adding a regularization to the non-parametric memory: it significantly reduces perplexity when the neighbor search operations are noisy during inference, and it improves generalization when a domain shift occurs. We also show that RETRO-LI's non-parametric memory can potentially be implemented on analog in-memory computing hardware, exhibiting O(1) search time while introducing noise into the retrieved neighbors, with minimal (<1%) performance loss. Our code is available at: https://github.com/IBM/Retrieval-Enhanced-Transformer-Little
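
A minimal sketch (assumed; not the RETRO-LI code) of the noisy neighbor search that an analog in-memory realization of the non-parametric memory implies: the query-to-database similarities are computed in a single MVM over the whole database, and analog non-idealities appear as noise on the scores. Database size, embedding dimension, and noise level are illustrative.

```python
# Illustrative sketch of noisy top-k similarity search over a chunk database.
import numpy as np

rng = np.random.default_rng(3)
N, d, k = 10_000, 64, 4               # database entries, embedding dim, top-k

db = rng.standard_normal((N, d))
db /= np.linalg.norm(db, axis=1, keepdims=True)   # unit-norm chunk embeddings

def retrieve(query, noise_std=0.0):
    """Top-k neighbors by (noisy) cosine similarity, one MVM over the database."""
    q = query / np.linalg.norm(query)
    scores = db @ q + noise_std * rng.standard_normal(N)   # analog-noise stand-in
    return np.argsort(scores)[-k:][::-1]

query = db[123] + 0.1 * rng.standard_normal(d)    # query close to entry 123
print("noise-free:", retrieve(query, 0.0))
print("noisy     :", retrieve(query, 0.05))
```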


Figure 1. Overview of speech recognition approaches. (a) Biological speech recognition. As an example, when […]
Figure 2. Tuneable nonlinearity and short-term memory characterization in a DNPU at room temperature. a, Atomic-force-microscopy image of an 8-electrode DNPU and schematic measurement circuit.
Figure 3. DNPU analogue feature extraction for speech recognition. a, Schematic of DNPU (nonlinear function noted as f(·)) fed with an analogue time-dependent input x(t) (blue electrode), voltage measured at the orange electrode, and constant control voltages (black electrodes). Every set of control voltages results in a unique transformed signal (green and red curves shown as examples) and forms an output channel with a 10× lower sample rate compared to the raw input signal (see Methods). b, t-distributed stochastic neighbour embedding (t-SNE) visualization for the female subset of the TI-46-Word spoken digit dataset before and after preprocessing by a single, untrained DNPU (one configuration out of 32 sets of randomly chosen control voltages). The output data show that DNPU preprocessing helps cluster utterances of the same digit, simplifying later classification. c, Comparison of the classification accuracy for linear and CNN classifier models without (green) and with (blue and red) DNPU preprocessing for 16, 32, and 64 DNPU channels. The all-hardware (DNPU with the AIMC classifier) result (red) is presented as the mean ± one standard deviation over ten inference measurements.
Figure 4. Schematic of a hybrid convolutional neural network (CNN) architecture for in-materia speech recognition. a, CNN architecture. A 64-channel DNPU convolution converts audio signals into a 64-D input to the AIMC with a down-sampling rate of 10 (details in Extended Data Figure 3). The first AIMC convolution layer has a kernel size of 8, and the rest of 3. Batch normalization, activation functions, and pooling operations are performed off-chip. b, A photograph of the IBM HERMES Project Chip and the architecture of the chip, containing 256 × 256 synaptic unit cells, each comprising 4 phase-change-memory (PCM) devices organized in a differential configuration, ADC/DAC arrays, and on-chip local digital processing units (LDPUs). c, Schematic representation of resource utilization for the 3-layer CNN classifier model implemented on two tiles of the AIMC chip.
In-Materia Speech Recognition

October 2024 · 56 Reads

With the rise of decentralized computing, as in the Internet of Things, autonomous driving, and personalized healthcare, it is increasingly important to process time-dependent signals efficiently at the edge: right at the place where the temporal data are collected, avoiding time-consuming, insecure, and costly communication with a centralized computing facility (or cloud). However, modern-day processors often cannot meet the constrained power and time budgets of edge systems because of intrinsic limitations imposed by their architecture (von Neumann bottleneck) or domain conversions (analogue-to-digital and time-to-frequency). Here, we propose an edge temporal-signal processor based on two in-materia computing systems for both feature extraction and classification, reaching a software-level accuracy of 96.2% on the TI-46-Word speech-recognition task. First, a nonlinear, room-temperature dopant-network-processing-unit (DNPU) layer realizes analogue, time-domain feature extraction from the raw audio signals, similar to the human cochlea. Second, an analogue in-memory computing (AIMC) chip, consisting of memristive crossbar arrays, implements a compact neural network trained on the extracted features for classification. With the DNPU feature extraction consuming hundreds of nanowatts and the AIMC-based classification having the potential for less than 10 fJ per multiply-accumulate operation, our findings offer a promising avenue for advancing the compactness, efficiency, and performance of heterogeneous smart edge processors through in-materia computing hardware.
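
The following sketch is purely illustrative of the shape of this two-stage pipeline: a fixed nonlinear, downsampling feature extractor on the raw waveform followed by a classifier whose matrix multiplications would map to the AIMC chip. The tanh mixing function stands in for the DNPU nonlinearity, and all dimensions, the pooling, and the noise level are assumptions.

```python
# Illustrative sketch of a DNPU-like feature extractor + AIMC-like read-out.
import numpy as np

rng = np.random.default_rng(4)
n_channels, downsample = 16, 10

# One random "control-voltage configuration" per output channel (assumption).
mix = rng.standard_normal((n_channels, downsample))

def dnpu_features(audio):
    """Nonlinear, downsampling feature extraction (DNPU stand-in)."""
    T = len(audio) // downsample * downsample
    frames = audio[:T].reshape(-1, downsample)          # (T/10, 10)
    return np.tanh(frames @ mix.T)                      # (T/10, n_channels)

def aimc_classifier(feats, W, noise_std=0.02):
    """Linear read-out as one noisy MVM (AIMC stand-in), then argmax."""
    pooled = feats.mean(axis=0)                         # global average pooling
    logits = W @ pooled + noise_std * rng.standard_normal(W.shape[0])
    return int(np.argmax(logits))

audio = rng.standard_normal(16_000)                     # 1 s of synthetic audio
W = rng.standard_normal((10, n_channels))               # untrained 10-class head
print(aimc_classifier(dnpu_features(audio), W))
```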


Demonstration of 4-quadrant analog in-memory matrix multiplication in a single modulation

October 2024 · 16 Reads

Analog in-memory computing (AIMC) leverages the inherent physical characteristics of resistive memory devices to execute computational operations, notably matrix-vector multiplications (MVMs). However, executing MVMs using a single-phase reading scheme to reduce latency necessitates the simultaneous application of both positive and negative voltages across resistive memory devices. This degrades the accuracy of the computation due to the dependence of the device conductance on the voltage polarity. Here, we demonstrate the realization of a 4-quadrant MVM in a single modulation by developing analog and digital calibration procedures to mitigate the conductance polarity dependence, fully implemented on a multi-core AIMC chip based on phase-change memory. With this approach, we experimentally demonstrate accurate neural network inference and similarity search tasks using one or multiple cores of the chip, at 4 times higher MVM throughput and energy efficiency than the conventional four-phase reading scheme.
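
For orientation, this sketch (a toy model, not the chip's calibration procedure; the polarity-dependent conductance factor and the gain correction are assumptions) shows what a 4-quadrant MVM means: signed weights stored as differential conductance pairs, signed inputs applied as positive and negative voltages in a single modulation, and a simple correction standing in for the calibration that removes the polarity dependence.

```python
# Toy model of a 4-quadrant crossbar MVM with a polarity-dependent conductance.
import numpy as np

rng = np.random.default_rng(5)
m, n = 8, 16
W = rng.standard_normal((m, n))
x = rng.standard_normal(n)                    # signed input vector

# Signed weights as differential pairs of non-negative conductances.
G_pos, G_neg = np.maximum(W, 0), np.maximum(-W, 0)

alpha = 0.9   # toy asymmetry: devices conduct slightly less under negative voltage

def mvm_single_modulation(x, calibrated=False):
    """One-shot MVM with signed inputs; calibration removes the polarity gain."""
    xp, xn = np.maximum(x, 0), np.maximum(-x, 0)
    scale_n = 1.0 if calibrated else alpha
    return (G_pos - G_neg) @ xp - scale_n * (G_pos - G_neg) @ xn

print("ideal       :", np.round(W @ x, 3))
print("uncalibrated:", np.round(mvm_single_modulation(x), 3))
print("calibrated  :", np.round(mvm_single_modulation(x, calibrated=True), 3))
```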


Roadmap to neuromorphic computing with emerging technologies

October 2024 · 492 Reads · 4 Citations


Retro-li: Small-Scale Retrieval Augmented Generation Supporting Noisy Similarity Searches and Domain Shift Generalization

September 2024 · 3 Reads

Retrieval augmented generation (RAG) systems such as Retro have been shown to improve language modeling capabilities and reduce toxicity and hallucinations by retrieving from a database of non-parametric memory containing trillions of entries. We introduce Retro-li and show that retrieval can also help with a small-scale database, but it demands more accurate and better neighbors when searching in a smaller, hence sparser, non-parametric memory. This can be met by using a proper semantic similarity search. We further propose, for the first time, adding a regularization to the non-parametric memory: it significantly reduces perplexity when the neighbor search operations are noisy during inference, and it improves generalization when a domain shift occurs. We also show that Retro-li's non-parametric memory can potentially be implemented on analog in-memory computing hardware, exhibiting O(1) search time while introducing noise into the retrieved neighbors, with minimal (<1%) performance loss. Our code is available at: https://github.com/IBM/Retrieval-Enhanced-Transformer-Little.



Citations (62)


... Recently, neuromorphic computing has emerged as a promising domain for addressing the limitations of traditional computing architectures. This approach encourages the development of brain-inspired algorithms, methods, and applications, focusing on both hardware and software advancements [15,16]. Neuromorphic computing technologies are particularly valuable for their potential to enhance computational efficiency and scalability, with characteristics that make them attractive as future computing paradigms. ...

Reference:

PySpice-Simulated In Situ Learning with Memristor Emulation for Single-Layer Spiking Neural Networks
Roadmap to neuromorphic computing with emerging technologies

... However, with the presence of the convergence detection circuit detailed in Appendix 5.2, the need to resonate is not essential. Similarly, codebooks that are not bipolar in nature [30] can be perturbed using an appropriate random function. ...

Factorizers for distributed sparse block codes

... Besides, other applications may benefit from a structured product representation, e.g., representing birds as a product of attributes in the CUB dataset [57]. Indeed, high-dimensional distributed representations have already been proven to be helpful when representing classes as a superposition of attribute vectors in the zero-shot setting [48]. Representing the combination of attributes in a product space may further improve the decoding efficiency. ...

Zero-Shot Classification Using Hyperdimensional Computing
  • Citing Conference Paper
  • March 2024

... While HWA training ensures that models can accommodate these specific constraints, any changes to the hardware post-deployment can create a mismatch between the environment in which the model was trained and the actual hardware conditions during inference. This mismatch can lead to a noticeable degradation in performance [14]. Beyond hardware adaptation, models deployed on AIMC hardware may also need to be adapted to new data or tasks based on user needs. ...

Improving the Accuracy of Analog-Based In-Memory Computing Accelerators Post-Training
  • Citing Conference Paper
  • May 2024

... The traditional von Neumann computing architectures, which separate processing and memory units, do not scale well for such data-driven workloads. These architectures incur larger latency and increased energy consumption when fetching the data between the CPU and memory, thereby limiting the computational efficiency [1,2]. ...

Neural architecture search for in-memory computing-based deep learning accelerators
  • Citing Article
  • May 2024

... IMC can be realized using charge-based memory devices such as static random access memory (SRAM) and dynamic random-access memory (DRAM) as well as non-volatile resistive memory devices such as phase-change memory (PCM) and resistive random-access memory (ReRAM). The field of IMC has matured in recent years with impressive demonstrations of fully integrated chips performing deep learning inference where the key compute primitive being implemented is the matrix-vector multiply operation [6][7][8]. IMC with PCM has also been employed for in-memory logic operations [9][10][11] with applications in domains such as database query [12]. ...

Memristor-based hardware accelerators for artificial intelligence
  • Citing Article
  • April 2024

... With advancements in computers, algorithms, and software, the memristive neural network (MNN) model has gained widespread adoption across diverse domains [5][6][7]. The existence of memristors was confirmed by HP Labs in 2008, and since then, MNNs have been a focal point of research [8][9][10]. MNNs offer memory, adaptability, and high parallel processing capabilities. ...

Hardware implementation of memristor-based artificial neural networks

... This indicates that factors other than the simple proportion of GST and Sb2Te3 might influence the optical properties of these superlattices. We further evaluated the growth of CSL on carbon-based films, which are becoming relevant for PCM devices [47]. To that end, CSL films 57 nm thick consisting of 12 repetitions of 3 nm Sb2Te3/1.8 ...

In-Memory Compute Chips with Carbon-based Projected Phase-Change Memory Devices
  • Citing Conference Paper
  • December 2023

... Fig. 1. Model size of SOTA large language models (Sparrow [11], Chinchilla [12], HyperCLOVA [13], Galactica [14], GLM [15], LaMDA [16], FLAN [17], GPT-3.5 (ChatGPT) [18], GPT-4 [18], WebGPT [19], GPT-3 [18], OPT-IML [20], InstructGPT [21], OPT-175B [22], BlenderBot 3 [23], BLOOMZ [24], Jurassic [25], CPM-2 [26], Yuan [27], ERNIE [28], Gopher [29], MT-NLG [30], Med-PaLM [31], PaLM [32], Minerva [33], U-PaLM [34], Flan-PaLM [35], GShard [36], PanGu [37], MoE-Fairseq [38], GLaM [39]). ... efficient mapping of transformer models on FPGAs and ASICs and through optimization techniques such as parallelization, pipelining, and avoiding redundant/ineffectual computations. Scope and outline of this paper: In this paper, we survey several optimization methods for efficient inference of transformer architectures and their family of architectures, such as KD techniques for transformers. ...

Design of Analog-AI Hardware Accelerators for Transformer-based Language Models (Invited)
  • Citing Conference Paper
  • December 2023

... VN architectures, which are based on the separation between processing and memory, suffer from substantial energy consumption and computational latency due to the data transfer overhead between the memory and the processor unit (Ma et al., 2020; Petrenko and Petrenko, 2018). On top of that, VN architectures are not the best candidates for IoT and edge-computing intelligent devices because they do not allow online and unsupervised learning (Syed et al., 2024). These two characteristics, though, are important for systems that are intended to learn continuously and adapt themselves in real time, like autonomous vehicles. ...

Non von Neumann computing concepts
  • Citing Chapter
  • January 2024