About
30 Publications · 3,069 Reads
323 Citations
Publications (30)
Low-bit-width data formats offer a promising solution for enhancing the energy efficiency of Deep Neural Network (DNN) training accelerators. In this work, we introduce a novel 5.3-bit data format that groups fixed-point values sharing a common exponent and scaling factor within a block of data. We propose a two-level logarithmic mantissa scaling m...
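As a rough sketch of the grouping idea (generic block floating point with low-bit fixed-point mantissas and one shared exponent per block, not the paper's exact 5.3-bit format; block size and bit-widths below are made up for illustration):

```python
import numpy as np

def block_quantize(x, block_size=16, mantissa_bits=4):
    """Quantize a 1-D array in blocks that share a single exponent.

    Generic block-floating-point sketch: each block stores low-bit
    fixed-point mantissas plus one shared exponent derived from the
    block's largest magnitude.
    """
    x = x.astype(np.float32)
    pad = (-len(x)) % block_size
    blocks = np.pad(x, (0, pad)).reshape(-1, block_size)

    # Shared exponent per block: scale the largest value into the
    # representable mantissa range.
    max_mag = np.abs(blocks).max(axis=1, keepdims=True)
    exp = np.ceil(np.log2(np.maximum(max_mag, 1e-38))) - (mantissa_bits - 1)
    scale = 2.0 ** exp

    lo, hi = -(2 ** (mantissa_bits - 1)), 2 ** (mantissa_bits - 1) - 1
    mantissa = np.clip(np.round(blocks / scale), lo, hi).astype(np.int8)
    return mantissa, scale

def block_dequantize(mantissa, scale):
    return (mantissa * scale).ravel()

x = np.random.randn(64).astype(np.float32)
m, s = block_quantize(x)
print("max error:", np.abs(x - block_dequantize(m, s)[:64]).max())
```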
Kolmogorov-Arnold Networks (KANs) are emerging AI models designed for AI+Science applications, offering up to 100x fewer parameters than conventional Multilayer Perceptrons (MLPs). KANs rely on computationally expensive non-linear functions, unlike MLPs, which are dominated by matrix multiplication. This limits KAN's compatibility with energy-eff...
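To make the compute contrast concrete, a toy sketch of my own (not the paper's kernels): the MLP layer collapses into one matrix multiply, while the KAN-style layer evaluates a learnable 1-D function on every edge, parameterized here with a small radial basis purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out = 8, 4
x = rng.standard_normal(n_in)

# MLP layer: dominated by a single matrix multiply.
W = rng.standard_normal((n_out, n_in))
mlp_out = np.tanh(W @ x)

# KAN-style layer: each edge (i, j) applies its own learnable 1-D
# function phi_ij (here a tiny radial-basis expansion), then sums.
# This per-edge non-linear work maps poorly onto matmul-centric
# accelerators.
n_basis = 5
coeff = rng.standard_normal((n_out, n_in, n_basis))
centers = np.linspace(-2.0, 2.0, n_basis)

def phi(v, c):
    # One edge function: weighted sum of Gaussian bumps.
    return np.sum(c * np.exp(-(v - centers) ** 2))

kan_out = np.array([sum(phi(x[i], coeff[j, i]) for i in range(n_in))
                    for j in range(n_out)])
print(mlp_out, kan_out)
```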
Transformer networks, driven by self-attention, are central to Large Language Models. In generative Transformers, self-attention uses cache memory to store token projections, avoiding recomputation at each time step. However, GPU-stored projections must be loaded into SRAM for each new generation step, causing latency and energy bottlenecks. We pre...
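A minimal sketch of the caching pattern described above, assuming a single attention head and toy dimensions (all names here are illustrative):

```python
import numpy as np

d, n_steps = 64, 16
rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

k_cache, v_cache = [], []   # token projections kept across steps
for step in range(n_steps):
    x = rng.standard_normal(d)        # current token embedding
    q = Wq @ x
    k_cache.append(Wk @ x)            # cache K/V once per token
    v_cache.append(Wv @ x)            # instead of recomputing them
    K, V = np.stack(k_cache), np.stack(v_cache)

    # Every step re-reads the *entire* cache; this growing K/V
    # traffic is the latency/energy bottleneck the abstract targets.
    out = softmax(K @ q / np.sqrt(d)) @ V

print("cached projections after generation:", len(k_cache))
```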
There is a high energy cost associated with training Deep Neural Networks (DNNs). Off-chip memory access contributes a major portion of the overall energy consumption. The number of off-chip memory transactions can be reduced by quantizing data words to a low bit-width (e.g., 8-bit). However, low-bit-width data formats suffer f...
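A small illustration of the trade-off, assuming plain symmetric per-tensor INT8 quantization (not necessarily the paper's format): traffic drops 4x, but a single outlier stretches the scale and degrades the precision of everything else.

```python
import numpy as np

def quantize_int8(x):
    # Symmetric linear quantization with one scale for the whole tensor.
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

x = np.random.randn(10_000).astype(np.float32)
x[0] = 50.0                       # one outlier stretches the range
q, s = quantize_int8(x)
err = np.abs(x - q.astype(np.float32) * s)
print(f"bytes moved: fp32={x.nbytes}, int8={q.nbytes} (4x fewer)")
print(f"mean error on the remaining values: {err[1:].mean():.4f}")
```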
Modern and future AI-based automotive applications, such as autonomous driving, require the efficient real-time processing of huge amounts of data from different sensors, like camera, radar, and LiDAR. In the ZuSE-KI-AVF project, multiple university and industry partners collaborate to develop a novel massively parallel processor architecture, based...
The throughput and energy efficiency of compute-centric architectures for memory-intensive Deep Neural Network (DNN) applications are limited by memory-bound issues like high data-access energy, long latencies, and limited bandwidth. Processing-in-Memory (PIM) is a very promising approach to address these challenges and bridge the memory-computati...
The large number of recent JEDEC DRAM standard releases and their increasing feature sets make it difficult for designers to rapidly upgrade memory controller IPs to each new standard. Hardware verification in particular is challenging due to the higher protocol complexity of standards like DDR5, LPDDR5, or HBM3 in comparison with their predece...
Recently, we have been witnessing a surge in DRAM-based Processing-in-Memory (PIM) publications from academia and industry. The architectures and design techniques proposed in these publications vary widely, ranging from integrating computation units in the DRAM IO region (i.e., without modifying DRAM core circuits) to modifying the highly optimized...
Processing-in-Memory (PIM) is an emerging approach to bridge the memory-computation gap. One of the major challenges for PIM architectures targeting Deep Neural Network (DNN) inference is the implementation of area-intensive Multiply-Accumulate (MAC) units in memory technologies, especially for DRAM-based PIMs. The DRAM architecture restricts...
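One well-known area-saving direction, shown purely as an illustration (not necessarily this paper's technique), is a bit-serial MAC: the parallel multiplier is replaced by masking, adds, and shifts, one weight bit-plane per cycle.

```python
import numpy as np

def bitserial_mac(acts, weights, bits=8):
    # Process one weight bit-plane per iteration: AND (mask), add,
    # shift. Slower, but the per-bit circuit is far smaller than a
    # parallel multiplier (weights assumed unsigned here).
    acc = 0
    for b in range(bits):
        plane = (weights >> b) & 1
        acc += int(np.dot(acts, plane)) << b
    return acc

acts = np.array([3, 1, 4, 1, 5], dtype=np.int64)
weights = np.array([2, 7, 1, 8, 2], dtype=np.int64)
assert bitserial_mac(acts, weights) == int(np.dot(acts, weights))
print(bitserial_mac(acts, weights))   # 35
```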
Emerging memory-intensive applications require a paradigm shift from processor-centric to memory-centric computing. The performance of state-of-the-art computing systems and accelerators designed for such applications is not limited by the processing speed but rather by the limited DRAM bandwidth and long DRAM latencies. Although the interface fre...
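A roofline-style back-of-the-envelope with hypothetical accelerator numbers shows why such systems end up bandwidth-bound rather than compute-bound:

```python
# Hypothetical numbers, for illustration only.
peak_ops = 10e12        # 10 TOPS of compute peak
dram_bw = 25.6e9        # 25.6 GB/s of DRAM bandwidth
op_intensity = 2.0      # ops per byte for a streaming DNN kernel

attainable = min(peak_ops, dram_bw * op_intensity)   # roofline
print(f"attainable: {attainable / 1e9:.1f} GOPS of a "
      f"{peak_ops / 1e12:.0f} TOPS peak "
      f"({attainable / peak_ops:.2%} utilization)")   # ~0.5%
```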
This paper presents an efficient crossbar design and implementation intended for analog compute-in-memory (ACiM) acceleration of artificial neural networks based on ferroelectric FET (FeFET) technology. The novel mixed-signal blocks presented in this work reduce the device-to-device variation and are optimized for low area, low power, and high throu...
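A first-order model of the crossbar principle, with made-up conductance and voltage values (real FeFET arrays add IR drop, ADC quantization, and the variation effects targeted above):

```python
import numpy as np

# Weights stored as conductances G (siemens), inputs applied as
# word-line voltages V. Kirchhoff's current law sums per-cell
# currents on each bit line, so the column currents compute a
# matrix-vector product in one shot: I = G @ V.
rng = np.random.default_rng(0)
G = rng.uniform(1e-6, 1e-5, size=(4, 8))  # 4 bit lines x 8 word lines
V = rng.uniform(0.0, 0.2, size=8)         # read voltages

I_ideal = G @ V

# Device-to-device variation modeled as multiplicative noise on G.
G_noisy = G * rng.normal(1.0, 0.05, size=G.shape)
print("ideal bit-line currents:", I_ideal)
print("with 5% variation:     ", G_noisy @ V)
```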
Recurrent Neural Networks, in particular One-dimensional and Multidimensional Long Short-Term Memory (1D-LSTM and MD-LSTM), have achieved state-of-the-art classification accuracy in many applications such as machine translation, image caption generation, handwritten text recognition, medical imaging, and many more. However, high classification accura...
Artificial Neural Networks (ANNs), like CNNs/DNNs and LSTMs, are not biologically plausible. Despite their initial success, they cannot attain the cognitive capabilities enabled by the dynamic hierarchical associative memory systems of biological brains. The biologically plausible spiking brain models, e.g., cortex, basal ganglia, and amygdala, ha...
Artificial Neural Networks (ANNs) like CNNs/DNNs and LSTMs are not biologically plausible, and in spite of their initial success, they cannot attain the cognitive capabilities enabled by the dynamic hierarchical associative memory systems of biological brains. The biologically plausible spiking brain models, e.g., cortex, basal ganglia, and amygd...
In recent years, an increasing number of different JEDEC memory standards, like DDR4/5, LPDDR4/5, GDDR6, Wide I/O2, HBM2, and NVDIMM-P, have been specified, which differ significantly from previous ones like DDR3 and LPDDR3. Since each new standard comes with significant changes to the DRAM protocol compared to its predecessors, the developers...
Energy consumption is one of the major challenges for advanced Systems on Chip (SoCs). It is addressed by adopting heterogeneous and approximate computing techniques. One recent evolution in this context is the transprecision computing paradigm. The idea of transprecision computing is to consume an adequate amount of energy for each operat...
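A miniature sketch of the idea, using a hypothetical precision policy: run each operation in the cheapest data type whose error still meets that operation's accuracy demand.

```python
import numpy as np

PRECISIONS = {"low": np.float16, "mid": np.float32, "high": np.float64}

def transprecision_dot(a, b, demand):
    # Cast to the precision the operation's accuracy demand allows;
    # lower precision stands in for lower energy per operation.
    dtype = PRECISIONS[demand]
    return np.dot(a.astype(dtype), b.astype(dtype))

rng = np.random.default_rng(0)
a, b = rng.standard_normal(1024), rng.standard_normal(1024)
exact = np.dot(a, b)
for demand in PRECISIONS:
    r = transprecision_dot(a, b, demand)
    print(f"{demand:>4}: {float(r):+.6f}  rel.err={abs(r - exact) / abs(exact):.2e}")
```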
Autonomous driving is disrupting conventional automotive development. Underlying reasons include control unit consolidation, the use of components originally developed for the consumer market, and the large amount of data that must be processed. For instance, Audi's zFAS and NVIDIA's Xavier platforms integrate GPUs, custom accelerators, and CPUs with...
DRAMs face several major challenges: On the one hand, DRAM bit cells are leaky and must be refreshed periodically to ensure data integrity. Therefore, DRAM devices suffer from a large overhead due to refreshes both in terms of performance (available bandwidth) and power. On the other hand, reliability issues caused by technology shrinking are becom...
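A quick worked estimate with typical DDR4 timing values (tREFI ≈ 7.8 µs average refresh interval, tRFC ≈ 350 ns for an 8 Gb device; illustrative numbers, not taken from the paper) shows the scale of the refresh overhead:

```python
tREFI_ns = 7800.0   # a REFRESH command is due every 7.8 us on average
tRFC_ns = 350.0     # the device is blocked ~350 ns per REFRESH

print(f"bandwidth lost to refresh: {tRFC_ns / tREFI_ns:.1%}")  # ~4.5%
# tRFC grows with device density, so this overhead worsens as DRAMs
# scale, motivating smarter refresh management.
```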