
Mohammed E. Fouda
University of California, Irvine | UCI · CECS
Doctor of Engineering
About
217 Publications · 52,209 Reads
3,413 Citations
Additional affiliations
March 2012 - March 2013
Publications (217)
Recent breakthroughs in neuromorphic computing show that local forms of gradient descent learning are compatible with Spiking Neural Networks (SNNs) and synaptic plasticity. Although SNNs can be scalably implemented using neuromorphic VLSI, an architecture that can learn using gradient-descent in situ is still missing. In this paper, we propose a l...
Various hypotheses of information representation in the brain, referred to as neural codes, have been proposed to explain information transmission between neurons. Neural coding plays an essential role in enabling brain-inspired spiking neural networks (SNNs) to perform different tasks. To search for the best coding scheme, we performed an exte...
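Two of the coding schemes commonly compared in such studies are rate coding and latency (time-to-first-spike) coding. The sketch below is a generic illustration, not the paper's benchmark code; the step counts and the [0, 1] input range are assumptions.

```python
import numpy as np

def rate_code(x, n_steps, rng):
    """Rate coding: spike each step with probability x (x in [0, 1])."""
    return (rng.random(n_steps) < x).astype(int)

def latency_code(x, n_steps):
    """Latency coding: a single spike whose timing encodes x;
    stronger inputs fire earlier."""
    t = int(round((1.0 - x) * (n_steps - 1)))
    train = np.zeros(n_steps, dtype=int)
    train[t] = 1
    return train

rng = np.random.default_rng(0)
strong = rate_code(0.9, 100, rng)   # many spikes
weak = rate_code(0.1, 100, rng)     # few spikes
```

Under rate coding the stimulus strength shows up in the spike count; under latency coding it shows up in the first-spike time. Energy, latency, and accuracy trade-offs between such schemes are one axis of the comparison described above.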
Neuromorphic vision sensors have been extremely beneficial in developing energy-efficient intelligent systems for robotics and privacy-preserving security applications. There is a dire need for devices to mimic the retina’s photoreceptors that encode the light illumination into a sequence of spikes to develop such sensors. Herein, we develop a hybr...
Quantum computers have enabled solving problems beyond the current computers' capabilities. However, this requires handling noise arising from unwanted interactions in these systems. Several protocols have been proposed to address efficient and accurate quantum noise profiling and mitigation. In this work, we propose a novel protocol that efficient...
To tackle real-world challenges, deep and complex neural networks are generally used with a massive number of parameters, which require large memory size, extensive computational operations, and high energy consumption in neuromorphic hardware systems. In this work, we propose an unsupervised online adaptive weight pruning method that dynamically r...
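A minimal magnitude-based pruning step illustrates the idea of removing low-importance weights; this is a standard baseline, not the paper's online adaptive criterion, and the threshold-selection rule here is an assumption.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of the weights
    (a common pruning baseline)."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    # k-th smallest magnitude becomes the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(weights) <= threshold] = 0.0
    return pruned

w = np.arange(1.0, 11.0).reshape(2, 5)
p = magnitude_prune(w, 0.5)   # half of the weights are zeroed
```

An online adaptive scheme, as described above, would additionally adjust the pruning criterion during training rather than applying one fixed threshold.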
Recent research efforts focus on reducing the computational and memory overheads of Large Language Models (LLMs) to make them feasible on resource-constrained devices. Despite advancements in compression techniques, non-linear operators like Softmax and Layernorm remain bottlenecks due to their sensitivity to quantization. We propose SoftmAP, a sof...
Mixed-precision quantized Neural Networks (NNs) are gaining traction for their efficient realization on hardware, leading to higher throughput and lower energy. In-Memory Computing (IMC) accelerator architectures are offered as alternatives to traditional architectures relying on a data-centric computational paradigm, diminishing the me...
Designing generalized in-memory computing (IMC) hardware that efficiently supports a variety of workloads requires extensive design space exploration, which is infeasible to perform manually. Optimizing hardware individually for each workload or solely for the largest workload often fails to yield the most efficient generalized solutions. To addres...
Large language models have shown promise in various domains, including healthcare. In this study, we conduct a comprehensive evaluation of LLMs in the context of mental health tasks using social media data. We explore the zero-shot (ZS) and few-shot (FS) capabilities of various LLMs, including GPT-4, Llama 3, Gemini, and others, on tasks such as bi...
Object detection is crucial in various cutting-edge applications, such as autonomous vehicles and advanced robotics systems, primarily relying on data from conventional frame-based RGB sensors. However, these sensors often struggle with issues like motion blur and poor performance in challenging lighting conditions. In response to these challenges,...
The precise identification of electrical model parameters of Li-Ion batteries is essential for efficient usage and better prediction of the battery performance. In this work, the model identification performance of two metaheuristic optimization algorithms is compared. The algorithms in comparison are the Marine Predator Algorithm (MPA) and the Par...
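As an illustration of the metaheuristic fitting loop, here is a minimal particle swarm optimizer (PSO) driving a toy objective that stands in for the battery-model RMSE. The MPA update rules and the actual electrical battery model are not reproduced; all hyperparameters and the two-parameter target are assumptions.

```python
import numpy as np

def pso(objective, bounds, n_particles=30, n_iters=200, seed=0):
    """Minimal particle swarm optimizer (global-best topology)."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    x = rng.uniform(lo, hi, (n_particles, lo.size))
    v = np.zeros_like(x)
    pbest, pbest_f = x.copy(), np.array([objective(p) for p in x])
    g = pbest[pbest_f.argmin()].copy()
    for _ in range(n_iters):
        r1, r2 = rng.random((2, n_particles, lo.size))
        # inertia + cognitive + social terms
        v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)
        f = np.array([objective(p) for p in x])
        improved = f < pbest_f
        pbest[improved], pbest_f[improved] = x[improved], f[improved]
        g = pbest[pbest_f.argmin()].copy()
    return g

# Toy stand-in for the model-fitting error: recover two "parameters"
target = np.array([0.05, 0.02])
best = pso(lambda p: float(np.sum((p - target) ** 2)),
           (np.zeros(2), np.ones(2)))
```

In the actual identification task, the objective would be the error between measured and simulated battery voltage rather than this quadratic.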
Mixed-precision Deep Neural Networks (DNNs) provide an efficient solution for hardware deployment, especially under resource constraints, while maintaining model accuracy. Identifying the ideal bit precision for each layer, however, remains a challenge given the vast array of models, datasets, and quantization schemes, leading to an expansive searc...
DNA pattern matching is essential for many widely used bioinformatics applications. Disease diagnosis is one of these applications since analyzing changes in DNA sequences can increase our understanding of possible genetic diseases. The remarkable growth in the size of DNA datasets has resulted in challenges in discovering DNA patterns efficiently...
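For reference, exact DNA pattern matching reduces to locating all occurrences of a short pattern in a long sequence; the naive scan below shows the operation itself, not the accelerated in-memory approach the work proposes.

```python
def find_pattern(seq, pat):
    """Return all start indices where `pat` occurs in `seq` (naive scan)."""
    return [i for i in range(len(seq) - len(pat) + 1)
            if seq[i:i + len(pat)] == pat]

hits = find_pattern("ACGTACGT", "ACG")
```

The challenge named above is that this linear scan does not scale to modern DNA dataset sizes, motivating hardware acceleration.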
Recent works demonstrated that imperceptible perturbations to input data, known as adversarial examples, can mislead neural networks’ output. Moreover, the same adversarial sample can be transferable and used to fool different neural models. Such vulnerabilities impede the use of neural networks in mission-critical tasks. To the best of our knowled...
The high speed, scalability, and parallelism offered by ReRAM crossbar arrays foster the development of ReRAM-based next-generation AI accelerators. At the same time, the sensitivity of ReRAM to temperature variations decreases the RON/ROFF ratio and negatively affects the achieved accuracy and reliability of the hardware. Various works on temperature-...
Content Addressable Memories (CAMs) are considered a key enabler for in-memory computing (IMC). IMC shows an order of magnitude improvement in energy efficiency and throughput compared to traditional computing techniques. Recently, analog CAMs (aCAMs) were proposed as a means to improve storage density and energy efficiency. In this work, we propos...
ReRAM crossbar arrays (RCAs) have the potential to provide extremely high efficiency for accelerating deep neural networks (DNNs). However, one crucial challenge for RCA-based DNN accelerators is functional inaccuracy due to nonidealities present in RCA hardware. While nonideality-aware training could be used to mitigate the effect of nonidealities...
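The core operation such accelerators map to hardware is a matrix-vector multiply over programmed conductances. The sketch below is a purely numerical model: the conductance range and the min-max weight mapping are assumptions, and a stuck-at-OFF mask stands in for one kind of nonideality.

```python
import numpy as np

def crossbar_mvm(weights, v_in, g_min=1e-6, g_max=1e-4, stuck_off=None):
    """Numerical model of a ReRAM crossbar: map weights into conductances
    in [g_min, g_max], then output currents follow I = G @ V (Kirchhoff
    current summation). `stuck_off` marks cells stuck at g_min."""
    w = np.asarray(weights, dtype=float)
    w01 = (w - w.min()) / (w.max() - w.min())   # normalize to [0, 1]
    g = g_min + w01 * (g_max - g_min)
    if stuck_off is not None:
        g = np.where(stuck_off, g_min, g)
    return g @ np.asarray(v_in, dtype=float)

w = np.array([[1.0, -1.0], [0.5, 0.0]])
v = np.array([0.2, 0.1])
clean = crossbar_mvm(w, v)
faulty = crossbar_mvm(w, v, stuck_off=np.array([[True, False],
                                                [False, False]]))
```

Comparing `clean` and `faulty` makes the functional-inaccuracy problem concrete: a single faulty cell perturbs the entire output column sum.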
Reverse engineering (RE) in Integrated Circuits (IC) is a process in which one will attempt to extract the internals of an IC, extract the circuit structure, and determine the gate-level information of an IC. In general, the RE process can be done for validation as well as Intellectual Property (IP) stealing intentions. In addition, RE also facilit...
This paper presents the algorithm and very-large-scale integration (VLSI) architecture of a high-throughput and highly efficient independent component analysis (ICA) processor for self-interference cancellation (SIC) in in-band full-duplex (IBFD) systems. This is the first VLSI architecture reported in the literature based on the state-of-the-art e...
We are delighted to invite you to participate in the 5th edition of the Novel Intelligent and Leading Emerging Sciences Conference (NILES2023), which will be held in Egypt from October 21-23, 2023. Registration and Call for Papers are now open.
Novel Intelligent and Leading Emerging Sciences (NILES) is an annual international conference that is h...
Decision trees are powerful tools for data classification. Accelerating the decision tree search is crucial for on-the-edge applications with limited power and latency budget. In this paper, we propose a content-addressable memory compiler for decision tree inference acceleration. We propose a novel "adaptive-precision" scheme that results in a com...
Deep neural networks have been proven to be highly effective tools in various domains, yet their computational and memory costs restrict them from being widely deployed on portable devices. The recent rapid increase of edge computing devices has led to an active search for techniques to address the above-mentioned limitations of machine learning fr...
In order to deal with increasingly complex computing problems, in-memory computation systems have been proposed to replace traditional Von Neumann architectures. In-memory computing can save the time and energy of data movement between the memory and processor, avoiding the memory-wall bottleneck of the traditional Von Neumann architecture. The as...
Deep neural networks (DNNs), as a subset of machine learning (ML) techniques, entail that real-world data can be learned, and decisions can be made in real time. However, their wide adoption is hindered by a number of software and hardware limitations. The existing general-purpose hardware platforms used to accelerate DNNs are facing new challenges...
Directly training spiking neural networks (SNNs) has remained challenging due to complex neural dynamics and intrinsic non-differentiability in firing functions. The well-known backpropagation through time (BPTT) algorithm proposed to train SNNs suffers from large memory footprint and prohibits backward and update unlocking, making it impossible to...
Static random-access memory (SRAM) is a cornerstone of modern microprocessor architectures; however, it suffers from high power consumption, large area, and high complexity. Moreover, the stability of the data in the SRAM against noise and its performance under radiation exposure are major concerns. To overcome these limitations in the quest for higher infor...
Lithium-ion batteries are crucial building blocks in many applications. Therefore, modeling their behavior has become necessary in numerous fields, including heavyweight ones such as electric vehicles and plug-in hybrid electric vehicles, as well as lightweight ones like sensors and actuators. Generic models are in great demand for modeling the cur...
Quantum computers have enabled solving problems beyond the current machines’ capabilities. However, this requires handling noise arising from unwanted interactions in these systems. Several protocols have been proposed to address efficient and accurate quantum noise profiling and mitigation. In this work, we propose a novel protocol that efficientl...
Supercapacitors are mostly recognized for their high power density capabilities and fast response time when compared to secondary batteries. However, computing their power in response to a given excitation using the standard formulae of capacitors is misleading and erroneous because supercapacitors are actually non-ideal capacitive devices that...
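A common non-ideal model for a supercapacitor is the constant phase element (CPE) with impedance Z(s) = 1/(C_α s^α); under a constant current I its voltage follows v(t) = I t^α / (C_α Γ(α+1)), which reduces to the ideal v = It/C at α = 1. A small numeric check (the parameter values are illustrative, not from the paper):

```python
import math

def cpe_voltage(i, t, c_alpha, alpha):
    """Voltage of a constant-phase-element capacitor under constant
    current i: v(t) = i * t**alpha / (c_alpha * Gamma(alpha + 1))."""
    return i * t ** alpha / (c_alpha * math.gamma(alpha + 1))

# alpha = 1 recovers the ideal capacitor relation v = i*t/C
ideal = cpe_voltage(1.0, 2.0, 1.0, 1.0)
# alpha < 1 gives the sub-linear charging typical of real supercapacitors
fractional = cpe_voltage(1.0, 2.0, 1.0, 0.5)
```

The gap between `ideal` and `fractional` is exactly why applying the standard capacitor power formulae to a supercapacitor is misleading.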
The restricted training pattern in the standard BP requires end-to-end error propagation, causing large memory costs and prohibiting model parallelization. Existing local training methods aim to resolve the training obstacles by completely cutting off the backward path between modules and isolating their gradients. These methods prevent information...
The high speed, scalability, and parallelism offered by ReRAM crossbar arrays foster the development of ReRAM-based next-generation AI accelerators. At the same time, the sensitivity of ReRAM to temperature variations decreases the RON/ROFF ratio and negatively affects the achieved accuracy and reliability of the hardware. Various works on temperature-a...
An independent component analysis (ICA) has been used in many applications, including self-interference cancellation (SIC) for in-band full-duplex (IBFD) wireless systems and anomaly detection in industrial Internet of Things (IoT). This article presents a high-throughput and highly efficient configurable preprocessing accelerator for the ICA algor...
Local learning schemes have shown promising performance in spiking neural networks (SNNs) training and are considered a step toward more biologically plausible learning. Despite many efforts to design high-performance neuromorphic systems, a fast and efficient on-chip training algorithm is still missing, which limits the deployment of neuromorphic...
Wireless body area network (WBAN) provides a means for seamless individual health monitoring without imposing restrictive limitations on normal daily routines. To date, Radio Frequency (RF) transceivers have been the technology of choice; however, drawbacks such as vulnerability to body shadowing effects, higher power consumption due to omnidirecti...
Mixed-precision Deep Neural Networks achieve the energy efficiency and throughput needed for hardware deployment, particularly when the resources are limited, without sacrificing accuracy. However, the optimal per-layer bit precision that preserves accuracy is not easily found, especially with the abundance of models, datasets, and quantization tec...
With the end of Moore's law, new paradigms are investigated for more scalable computing systems. One of the promising directions is to examine the data representation toward higher data density per hardware element. Multiple valued logic (MVL) emerged as a promising system due to its advantages over binary data representation. MVL offers higher inf...
In-memory computing is an emerging computing paradigm that overcomes the limitations of existing Von Neumann computing architectures, such as the memory-wall bottleneck. In such a paradigm, the computations are performed directly on the data stored in the memory, which highly reduces memory-processor communications during computation. Hence, signif...
DNA pattern matching is essential for many widely used bioinformatics applications. Disease diagnosis is one of these applications, since analyzing changes in DNA sequences can increase our understanding of possible genetic diseases. The remarkable growth in the size of DNA datasets has resulted in challenges in discovering DNA patterns efficiently...
Empowered by the backpropagation (BP) algorithm, deep neural networks have dominated the race in solving various cognitive tasks. The restricted training pattern in the standard BP requires end-to-end error propagation, causing large memory cost and prohibiting model parallelization. Existing local training methods aim to resolve the training obsta...
Multiply-Accumulate (MAC) is one of the most commonly used operations in modern computing systems due to its use in matrix multiplication, signal processing, and new applications such as machine learning and deep neural networks. The ternary number system offers higher information processing within the same number of digits when compared to binary s...
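The density advantage of ternary digits can be made concrete with balanced ternary, where each digit is -1, 0, or +1, so a multiply reduces to signed additions of shifted operands. This is a generic software illustration, not the hardware MAC design described above.

```python
def to_balanced_ternary(n):
    """Encode an integer in balanced ternary, digits in {-1, 0, 1},
    least-significant digit first."""
    if n == 0:
        return [0]
    digits = []
    while n != 0:
        r = n % 3
        if r == 2:          # remap digit 2 to -1 with a carry
            r = -1
        digits.append(r)
        n = (n - r) // 3
    return digits

def from_balanced_ternary(digits):
    return sum(d * 3 ** i for i, d in enumerate(digits))

def mac_balanced(acc, a_digits, b):
    """MAC acc += a*b with a in balanced ternary: each nonzero digit only
    adds or subtracts a shifted copy of b."""
    for i, d in enumerate(a_digits):
        if d:
            acc += d * (b * 3 ** i)
    return acc
```

Three balanced-ternary digits cover -13..13 (27 values) versus 8 values for three bits, which is the information-density argument sketched in the abstract.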
Decision trees are considered one of the most powerful tools for data classification. Accelerating the decision tree search is crucial for on-the-edge applications that have limited power and latency budget. In this paper, we propose a Content Addressable Memory (CAM) Compiler for Decision Tree (DT) inference acceleration. We propose a novel "adapt...
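The tree-to-CAM mapping can be illustrated in plain Python: each root-to-leaf path becomes one CAM row of per-feature intervals, and a lookup matches the row whose intervals contain the input. The dict-based tree encoding is a hypothetical stand-in for the compiler's actual input format.

```python
def tree_to_cam(node, bounds, rows):
    """Flatten a decision tree into CAM rows: (per-feature [low, high)
    intervals, class label), one row per root-to-leaf path."""
    if "label" in node:                       # leaf: emit one CAM row
        rows.append((dict(bounds), node["label"]))
        return
    f, t = node["feature"], node["threshold"]
    lo, hi = bounds.get(f, (float("-inf"), float("inf")))
    tree_to_cam(node["left"], {**bounds, f: (lo, min(hi, t))}, rows)
    tree_to_cam(node["right"], {**bounds, f: (max(lo, t), hi)}, rows)

def cam_lookup(rows, x):
    """Return the label of the first row whose intervals all contain x."""
    for bounds, label in rows:
        if all(lo <= x[f] < hi for f, (lo, hi) in bounds.items()):
            return label
    return None

tree = {"feature": "x", "threshold": 0.5,
        "left": {"label": "A"},
        "right": {"feature": "y", "threshold": 1.0,
                  "left": {"label": "B"}, "right": {"label": "C"}}}
rows = []
tree_to_cam(tree, {}, rows)
```

In an analog CAM each row's intervals are searched in parallel in one cycle; the adaptive-precision idea then concerns how many bits each stored threshold actually needs.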
Content Addressable Memories (CAMs) are considered a key enabler for in-memory computing (IMC). IMC shows an order-of-magnitude improvement in energy efficiency and throughput compared to traditional computing techniques. Recently, analog CAMs (aCAMs) were proposed as a means to improve storage density and energy efficiency. In this work, we propose t...
In-memory computing is an emerging computing paradigm that overcomes the limitations of existing Von Neumann computing architectures, such as the memory-wall bottleneck. In such a paradigm, the computations are performed directly on the data stored in the memory, which eliminates the need for memory-processor communications. Hence, orders of magnitude...
The automatic fitting of spiking neuron models to experimental data is a challenging problem. The integrate-and-fire model and Hodgkin–Huxley (HH) models represent the two complexity extremes of spiking neural models. Between these two extremes lie two- and three-differential-equation-based models. In this work, we investigate the problem of parame...
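For reference, the simplest end of that complexity spectrum, the leaky integrate-and-fire neuron, can be simulated with forward Euler in a few lines; the parameter values below are illustrative assumptions, not fitted values from the paper.

```python
def lif_spike_times(i_in, dt=1e-4, tau=0.02, r=1e7,
                    v_th=0.02, v_reset=0.0, t_end=0.2):
    """Leaky integrate-and-fire: tau * dV/dt = -V + R*I, with a spike and
    reset to v_reset whenever V crosses v_th. Returns spike times."""
    v, t, spikes = v_reset, 0.0, []
    while t < t_end:
        v += dt / tau * (-v + r * i_in)   # forward-Euler membrane update
        if v >= v_th:
            spikes.append(t)
            v = v_reset
        t += dt
    return spikes
```

Fitting such a model to data means adjusting parameters like `tau`, `r`, and `v_th` until simulated spike times match recorded ones, which is exactly where the richer two- and three-equation models trade tractability for expressiveness.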
Fractional-order spiking neuron models can enrich model flexibility and dynamics due to the extra degrees of freedom. This paper aims to study the effects of applying four different numerical methods to two fractional-order spiking neuron models: the Fractional-order Leaky integrate-and-fire (FO-LIF) model and the Fractional-order Hodgkin–Huxley (F...
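One ingredient shared by Grünwald–Letnikov-type discretizations of such fractional-order models is the binomial weight sequence w_k = (-1)^k C(α, k), generated by a simple recurrence; this is a generic building block, not the paper's full solver.

```python
import numpy as np

def gl_weights(alpha, n):
    """Grünwald–Letnikov weights w_k = (-1)^k * C(alpha, k), via the
    recurrence w_k = w_{k-1} * (k - 1 - alpha) / k."""
    w = np.empty(n)
    w[0] = 1.0
    for k in range(1, n):
        w[k] = w[k - 1] * (k - 1 - alpha) / k
    return w
```

At α = 1 the weights collapse to [1, -1, 0, ...], recovering the ordinary first-order difference, which is a handy sanity check when implementing fractional neuron updates.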
Independent component analysis (ICA) has been used in many applications, including self-interference cancellation in in-band full-duplex wireless communication systems. This paper presents a high-throughput and highly efficient configurable preprocessor for the ICA algorithm. The proposed ICA preprocessor has three major components for centering, f...
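The centering and whitening stages named here are standard ICA preprocessing: subtract each signal's mean, then decorrelate via an eigendecomposition of the covariance. The NumPy sketch below shows the math, not the accelerator's fixed-point hardware arithmetic; the toy mixing matrix is arbitrary.

```python
import numpy as np

def center_whiten(x):
    """Standard ICA preprocessing: remove each row's mean, then whiten so
    the output covariance is the identity (eigendecomposition method)."""
    xc = x - x.mean(axis=1, keepdims=True)
    cov = xc @ xc.T / xc.shape[1]
    d, e = np.linalg.eigh(cov)
    w = e @ np.diag(1.0 / np.sqrt(d)) @ e.T   # whitening matrix
    return w @ xc, w

# Two mixed signals as a toy input (mixing matrix chosen arbitrarily)
rng = np.random.default_rng(1)
x = np.array([[1.0, 0.5], [0.3, 1.0]]) @ rng.random((2, 1000))
z, w = center_whiten(x)
cov_z = z @ z.T / z.shape[1]   # numerically the identity after whitening
```

After this stage the separation step of ICA only has to search over rotations, which is what makes a dedicated preprocessing accelerator worthwhile.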
Recently, ReRAM-based hardware accelerators have shown unprecedented performance compared to digital accelerators. Technology scaling causes an inevitable increase in interconnect wire resistance, which leads to IR drops that could limit the performance of ReRAM-based accelerators. These IR drops deteriorate signal integrity and quality, especially...
Although Resistive RAMs can support highly efficient matrix-vector multiplication, which is very useful for machine learning and other applications, the non-ideal behavior of hardware such as stuck-at fault and IR drop is an important concern in making ReRAM crossbar array-based deep learning accelerators. Previous work has addressed the nonidealit...
Directly training spiking neural networks (SNNs) has remained challenging due to complex neural dynamics and intrinsic non-differentiability in firing functions. The well-known backpropagation through time (BPTT) algorithm proposed to train SNNs suffers from large memory footprint and prohibits backward and update unlocking, making it impossible to...
Reverse engineering (RE) in Integrated Circuits (IC) is a process in which one attempts to extract the internals of an IC, extract the circuit structure, and determine the gate-level information of an IC. In general, the RE process can be done for validation as well as intellectual property (IP) stealing intentions. In addition, RE also facilitates...
In-memory associative processor architectures are offered as a strong candidate to overcome the memory-wall bottleneck and to enable vector/parallel arithmetic operations. In this paper, we extend the functionality of the associative processor to multi-valued arithmetic. To allow for in-memory compute implementation of arithmetic or logic functions, we...
Deep Neural Networks (DNNs), as a subset of Machine Learning (ML) techniques, entail that real-world data can be learned and that decisions can be made in real-time. However, their wide adoption is hindered by a number of software and hardware limitations. The existing general-purpose hardware platforms used to accelerate DNNs are facing new challe...