Chirag Sudarshan
  • Master of Science
  • Rheinland-Pfälzische Technische Universität Kaiserslautern-Landau

About

30 Publications
3,069 Reads
323 Citations
Current institution
Rheinland-Pfälzische Technische Universität Kaiserslautern-Landau

Publications (30)
Article
Low-bit-width data formats offer a promising solution for enhancing the energy efficiency of Deep Neural Network (DNN) training accelerators. In this work, we introduce a novel 5.3-bit data format that groups fixed-point values sharing a common exponent and scaling factor within a block of data. We propose a two-level logarithmic mantissa scaling m...
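The shared-exponent idea behind such block formats can be illustrated with a short, hedged sketch (plain Python/NumPy; the block size, mantissa width, and function names below are illustrative assumptions, not the 5.3-bit format or the two-level logarithmic mantissa scaling of the article): a block of values stores low-bit fixed-point mantissas plus one exponent for the whole block.

```python
# Illustrative sketch of a shared-exponent block format (not the paper's
# 5.3-bit format): each block of values keeps low-bit fixed-point
# mantissas and a single power-of-two scale for the whole block.
import numpy as np

def block_quantize(x, block_size=16, mantissa_bits=4):
    """Quantize a 1-D array in blocks that share one power-of-two scale."""
    x = np.asarray(x, dtype=np.float64)
    pad = (-len(x)) % block_size
    blocks = np.pad(x, (0, pad)).reshape(-1, block_size)

    # One shared exponent per block, chosen from the largest magnitude.
    max_abs = np.max(np.abs(blocks), axis=1, keepdims=True)
    max_abs[max_abs == 0] = 1.0
    shared_exp = np.ceil(np.log2(max_abs))

    # Low-bit fixed-point mantissas relative to the shared scale.
    scale = 2.0 ** shared_exp
    qmax = 2 ** (mantissa_bits - 1) - 1
    mantissa = np.clip(np.round(blocks / scale * qmax), -qmax, qmax)

    # Dequantize to inspect the quantization error.
    dequant = (mantissa / qmax * scale).reshape(-1)[: len(x)]
    return mantissa, shared_exp, dequant

if __name__ == "__main__":
    data = np.random.randn(64)
    _, _, approx = block_quantize(data)
    print("max abs error:", np.max(np.abs(data - approx)))
```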
Preprint
Full-text available
Kolmogorov-Arnold Networks (KANs) are an emerging class of AI models designed for AI+Science applications, offering up to 100x fewer parameters than conventional Multilayer Perceptrons (MLPs). KANs rely on computationally expensive non-linear functions, unlike MLPs, which are dominated by matrix multiplication. This limits their compatibility with energy-eff...
Preprint
Full-text available
Transformer networks, driven by self-attention, are central to Large Language Models. In generative Transformers, self-attention uses cache memory to store token projections, avoiding recomputation at each time step. However, GPU-stored projections must be loaded into SRAM for each new generation step, causing latency and energy bottlenecks. We pre...
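A minimal sketch of the key/value caching described here (illustrative NumPy code; single head, no batching; names such as CachedSelfAttention are assumptions, not taken from the preprint): each decoding step computes projections only for the newest token and attends over the cached ones.

```python
# Sketch of the key/value cache in generative Transformers: projections of
# past tokens are stored once and reused at every decoding step instead of
# being recomputed.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / np.sum(e, axis=axis, keepdims=True)

class CachedSelfAttention:
    def __init__(self, d_model, seed=0):
        rng = np.random.default_rng(seed)
        self.Wq = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
        self.Wk = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
        self.Wv = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
        self.k_cache = []  # one key vector per generated token
        self.v_cache = []  # one value vector per generated token

    def step(self, x_t):
        """Attend from the newest token over all cached tokens."""
        q = x_t @ self.Wq
        self.k_cache.append(x_t @ self.Wk)  # only the new projection is computed
        self.v_cache.append(x_t @ self.Wv)
        K = np.stack(self.k_cache)          # (t, d_model), loaded from the cache
        V = np.stack(self.v_cache)
        attn = softmax(K @ q / np.sqrt(len(q)))
        return attn @ V

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    layer = CachedSelfAttention(d_model=8)
    for _ in range(4):                      # four decoding steps
        out = layer.step(rng.standard_normal(8))
    print(out.shape)                        # (8,)
```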
Article
Full-text available
There is a high energy cost associated with training Deep Neural Networks (DNNs). Off-chip memory access contributes a major portion to the overall energy consumption. Reduction in the number of off-chip memory transactions can be achieved by quantizing the data words to a low data bit-width (e.g., 8-bit). However, low-bit-width data formats suffer f...
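A hedged sketch of the quantization idea in the abstract (symmetric per-tensor int8 with a float scale; purely illustrative, not the format proposed in the article): moving 8-bit words instead of 32-bit floats cuts each off-chip transfer to roughly a quarter of its size.

```python
# Illustrative per-tensor int8 quantization: the int8 buffer is what would
# be moved off-chip, shrinking each transfer to ~1/4 of the float32 size.
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor quantization to int8 with a float scale."""
    m = float(np.max(np.abs(x)))
    scale = m / 127.0 if m > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q, scale):
    return q.astype(np.float32) * scale

if __name__ == "__main__":
    act = np.random.randn(1024, 1024).astype(np.float32)
    q, s = quantize_int8(act)
    print("float32 bytes:", act.nbytes)   # 4 MiB
    print("int8 bytes   :", q.nbytes)     # 1 MiB moved off-chip instead
    print("max error    :", np.max(np.abs(act - dequantize_int8(q, s))))
```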
Conference Paper
Modern and future AI-based automotive applications, such as autonomous driving, require the efficient real-time processing of huge amounts of data from different sensors, like camera, radar, and LiDAR. In the ZuSE-KI-AVF project, multiple university and industry partners collaborate to develop a novel massively parallel processor architecture, based...
Conference Paper
The throughput and energy efficiency of compute-centric architectures for memory-intensive Deep Neural Network (DNN) applications are limited by memory-bound issues like high data-access energy, long latencies, and limited bandwidth. Processing-in-Memory (PIM) is a very promising approach to address these challenges and bridge the memory-computati...
Preprint
Full-text available
The large number of recent JEDEC DRAM standard releases and their increasing feature set make it difficult for designers to rapidly upgrade the memory controller IPs to each new standard. Hardware verification in particular is challenging due to the higher protocol complexity of standards like DDR5, LPDDR5 or HBM3 in comparison with their predece...
Chapter
Recently, we have been witnessing a surge in DRAM-based Processing in Memory (PIM) publications from academia and industry. The architectures and design techniques proposed in these publications vary widely, ranging from the integration of computation units in the DRAM IO region (i.e., without modifying DRAM core circuits) to modifying the highly optimized...
Article
Processing-in-Memory (PIM) is an emerging approach to bridge the memory-computation gap. One of the major challenges of PIM architectures in the scope of Deep Neural Network (DNN) inference is the implementation of area-intensive Multiply-Accumulate (MAC) units in memory technologies, especially for DRAM-based PIMs. The DRAM architecture restricts...
Article
Emerging memory-intensive applications require a paradigm shift from processor-centric to memory-centric computing. The performance of state-of-the-art computing systems and accelerators designed for such applications is not limited by the processing speed but rather by the limited DRAM bandwidth and long DRAM latencies. Although the interface fre...
Conference Paper
Full-text available
This paper presents an efficient crossbar design and implementation intended for analog compute-in-memory (ACiM) acceleration of artificial neural networks based on ferroelectric FET (FeFET) technology. The novel mixed signal blocks presented in this work reduce the device-to-device variation and are optimized for low area, low power and high throu...
Article
Full-text available
Recurrent Neural Networks, in particular One-dimensional and Multidimensional Long Short-Term Memory (1D-LSTM and MD-LSTM), have achieved state-of-the-art classification accuracy in many applications such as machine translation, image caption generation, handwritten text recognition, medical imaging and many more. However, high classification accura...
Article
Full-text available
Artificial Neural Networks (ANNs), like CNNs/DNNs and LSTMs, are not biologically plausible. Despite their initial success, they cannot attain the cognitive capabilities enabled by the dynamic hierarchical associative memory systems of biological brains. The biologically plausible spiking brain models, e.g., cortex, basal ganglia, and amygdala, ha...
Preprint
Full-text available
Artificial Neural Networks (ANNs), like CNNs/DNNs and LSTMs, are not biologically plausible, and in spite of their initial success, they cannot attain the cognitive capabilities enabled by the dynamic hierarchical associative memory systems of biological brains. The biologically plausible spiking brain models, e.g., cortex, basal ganglia and amygd...
Conference Paper
In recent years, an increasing number of different JEDEC memory standards, like DDR4/5, LPDDR4/5, GDDR6, Wide I/O2, HBM2, and NVDIMM-P, have been specified, which differ significantly from earlier standards like DDR3 and LPDDR3. Since each new standard comes with significant changes in the DRAM protocol compared to its predecessors, the developers...
Chapter
Energy consumption is one of the major challenges for advanced Systems on Chip (SoC). This is addressed by adopting heterogeneous and approximate computing techniques. One of the recent evolutions in this context is the transprecision computing paradigm. The idea of transprecision computing is to consume an adequate amount of energy for each operat...
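The transprecision idea can be sketched in software (a concept demo only, under assumed precisions, not the techniques described in the chapter): run the same operation at several floating-point precisions and observe the accuracy cost of each choice, which on suitable hardware translates into an energy saving.

```python
# Concept demo of transprecision: spend only as much numeric precision
# (and hence energy, on suitable hardware) as an operation needs. The same
# reduction is run in float64, float32 and float16 to compare accuracy.
import numpy as np

def dot_at_precision(a, b, dtype):
    """Accumulate a dot product with operands cast to the given precision."""
    return float(np.dot(a.astype(dtype), b.astype(dtype)))

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    a = rng.standard_normal(10_000)
    b = rng.standard_normal(10_000)
    reference = dot_at_precision(a, b, np.float64)
    for dtype in (np.float64, np.float32, np.float16):
        result = dot_at_precision(a, b, dtype)
        print(f"{np.dtype(dtype).name:8s} error = {abs(result - reference):.3e}")
```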
Conference Paper
Autonomous driving is disrupting conventional automotive development. Underlying reasons include control unit consolidation, the use of components originally developed for the consumer market, and the large amount of data that must be processed. For instance, Audi's zFAS or NVIDIA's Xavier platform integrate GPUs, custom accelerators, and CPUs with...
Conference Paper
DRAMs face several major challenges: On the one hand, DRAM bit cells are leaky and must be refreshed periodically to ensure data integrity. Therefore, DRAM devices suffer from a large overhead due to refreshes both in terms of performance (available bandwidth) and power. On the other hand, reliability issues caused by technology shrinking are becom...
