About
35
Publications
5,877
Reads
828
Citations
Publications (35)
We study the problem of experiment design to learn causal structures from interventional data. We consider an active learning setting in which the experimenter decides to intervene on one of the variables in the system in each step and uses the results of the intervention to recover further causal relationships among the variables. The goal is to f...
Given a social network modeled as a weighted graph $G$, the influence maximization problem seeks $k$ vertices to become initially influenced, to maximize the expected number of influenced nodes under a particular diffusion model. The influence maximization problem has been proven to be NP-hard, and most proposed solutions to the problem are ap...
We consider the problem of recovering channel code parameters over a candidate set by merely analyzing the received encoded signals. We propose a deep learning-based solution that I) is capable of identifying the channel code parameters for several coding schemes (such as LDPC, Convolutional, Turbo, and Polar codes), II) is robust against channel im...
Many works have studied implementing the Viterbi decoding algorithm on GPUs instead of FPGAs, because GPUs provide considerable flexibility in addition to high performance. The recently introduced Tensor Cores in modern GPU architectures provide substantial additional computing capability. This paper proposes a novel para...
This paper describes a parallel implementation of the Viterbi decoding algorithm. The Viterbi decoder is widely used in many state-of-the-art wireless systems. The proposed solution improves both throughput and memory usage through optimizations such as a unified kernel implementation and parallel traceback. Experimental evaluations show that the propos...
We consider the problem of recovering channel code parameters over a candidate set by merely analyzing the received encoded signals. We propose a deep learning-based solution that I) is capable of identifying the channel code parameters for any coding scheme (such as LDPC, Convolutional, Turbo, and Polar codes), II) is robust against channel impair...
Given a social network modeled as a weighted graph $G$, the influence maximization problem seeks $k$ vertices to become initially influenced, to maximize the expected number of influenced nodes under a particular diffusion model. The influence maximization problem has been proven to be NP-hard, and most proposed solutions to the problem are approxi...
We study the problem of experiment design to learn causal structures from interventional data. We consider an active learning setting in which the experimenter decides to intervene on one of the variables in the system in each step and uses the results of the intervention to recover further causal relationships among the variables. The goal is to f...
This paper presents a novel ECG classification algorithm for inclusion as part of real-time cardiac monitoring systems in ultra-low-power wearable devices. The proposed solution is based on spiking neural networks, which are the third generation of neural networks. Specifically, we employ spike-timing dependent plasticity (STDP), and reward-modulated...
The main goal in many fields in the empirical sciences is to discover causal relationships among a set of variables from observational data. The PC algorithm is one of the promising solutions to learn the underlying causal structure by performing a number of conditional independence tests. In this paper, we propose a novel GPU-based parallel algorithm, cal...
The graph matching problem refers to recovering the node-to-node correspondence between two correlated graphs. A previous work theoretically showed that recovery is feasible in sparse Erdős–Rényi graphs if and only if the probability of having an edge between a pair of nodes in one of the graphs and also between the corresponding nodes in the oth...
This paper presents a novel ECG classification algorithm for real-time cardiac monitoring on ultra-low-power wearable devices. The proposed solution is based on spiking neural networks, which are the third generation of neural networks. Specifically, we employ spike-timing dependent plasticity (STDP), and reward-modulated STDP (R-STDP), in which the...
Objective:
A novel electrocardiogram (ECG) classification algorithm is proposed for continuous cardiac monitoring on wearable devices with limited processing capacity.
Methods:
The proposed solution employs a novel architecture consisting of wavelet transform and multiple long short-term memory (LSTM) recurrent neural networks (see Fig. 1).
Res...
The main goal in many fields in the empirical sciences is to discover causal relationships among a set of variables from observational data. The PC algorithm is one of the promising solutions to learn the underlying causal structure by performing a number of conditional independence tests. In this paper, we propose a novel GPU-based parallel algorithm, cal...
A novel ECG classification algorithm is proposed for continuous cardiac monitoring on wearable devices with limited processing capacity. The proposed solution employs a novel architecture consisting of wavelet transform and multiple LSTM recurrent neural networks (Fig. 1). Experimental evaluations show superior ECG classification performance compar...
This paper proposes a novel FPGA-based matrix-inversion technique that is specifically tailored and optimized for real-time electromagnetic transients simulation of power electronic converters with high switching frequency. This is the first reported solution that is capable of solving the real-time equations related to using ideal switch model and...
Synchronous dataflow (SDF) graphs are often the computational model of choice for specification, analysis, and automated synthesis of parallel streaming kernels targeting embedded multiprocessor system-on-a-chip (MPSoC) platforms. We discuss several limitations of SDF graphs in the context of conventional parallel software synthesis methodologi...
Many mobile applications running on smartphones and wearable devices would potentially benefit from the accuracy and scalability of deep CNN-based machine learning algorithms. However, performance and energy consumption limitations make the execution of such computationally intensive algorithms on mobile devices prohibitive. We present a GPU-accele...
Models of computation abstract away a number of implementation details in favor of well-defined semantics. While this has unquestionable benefits, we argue that analysis of models solely based on operational semantics (implementation-oblivious analysis) is unfit to drive implementation design space exploration. Specifically, we study the tradeoff b...
Models of computation abstract away a number of implementation details in favor of well-defined semantics. While this has unquestionable benefits, we argue that analysis of models solely based on operational semantics (implementation-oblivious analysis) is unfit to drive implementation design space exploration. Specifically, we study the tradeoff be...
We study the problem of mapping concurrent tasks of an application to cores of a chip multiprocessor that utilizes a circuit-switched interconnect and globally asynchronous locally synchronous (GALS) clocking domains. We develop a configurable algorithm that naturally handles a number of practical requirements, such as architectural features of the target...
We study the trade-off between throughput and memory footprint of embedded software that is synthesized from acyclic static dataflow (task graph) specifications targeting distributed memory multiprocessors. We identify iteration overlapping as a knob in the synthesis process by which one can trade application throughput for its memory requirement....
Streaming applications, which are abundant in many disciplines such as multimedia, networking, and signal processing, require efficient processing of a seemingly infinite sequence of input data. In the context of streaming software synthesis from data flow graphs, we study the inherent trade-off between memory requirement and compilation runtime, u...
Variants of dataflow specification models are widely used to synthesize streaming applications for distributed-memory parallel processors. We argue that the current practice of specifying streaming applications using rigid dataflow models implicitly prohibits a number of platform-oriented optimizations and hence limits portability and scalability with...
Variants of dataflow specification models are widely used to synthesize streaming applications for distributed-memory parallel processors. We argue that the current practice of specifying streaming applications using rigid dataflow models implicitly prohibits a number of platform-oriented optimizations and hence limits portability and scalability with...
Variants of dataflow specification models are widely used to synthesize streaming applications for distributed-memory parallel processors. We argue that the current practice of specifying streaming applications using rigid dataflow models implicitly prohibits a number of platform-oriented optimizations and hence has limited portability and scalabilit...
We present a computer engineering capstone design project course focused on accelerating intensive computations via integration of application-specific co-processors with digital processor systems. We propose utilization of puzzle solvers as attractive, scalable and simple-to-understand applications to engage students with practicing a number of fu...
Many embedded applications demand processing of a seemingly endless stream of input data in real time. Productive development of such applications is typically carried out by synthesizing software from high-level specifications, such as data-flow graphs. In this context, we study the problem of inter-actor buffer allocation, which is a critical ste...
Many embedded applications demand processing of a seemingly endless stream of input data in real time. Productive development of such applications is typically carried out by synthesizing software from high-level specifications, such as data-flow graphs. In this context, we study the problem of inter-actor buffer allocation, which is a critical ste...
Heterogeneous soft multiprocessor systems are likely to find a larger share in the application-specific computing market due to increasing cost and defect rates in foreseeable manufacturing technologies. We study the problem of mapping streaming applications onto heterogeneous soft dual-processor systems, in which processors' limited memory resourc...
We present a methodology for pipelined software synthesis of streaming applications. First, we develop a versatile task assignment algorithm capable of optimizing realistically-arbitrary cost functions for two cores. The algorithm is exact (i.e., theoretically optimal) contrary to existing heuristics. Second, our approximation technique provides an...
We present a framework for the development of streaming applications as concurrent software modules running on multiprocessor systems-on-chip (MPSoCs). We propose an iterative design space exploration mechanism to customize the MPSoC architecture for given applications. Central to the exploration engine is our system-level performance estimation methodolo...
Pipelined execution of streaming applications enables processing of high-throughput data under performance constraints. We present an integrated approach to synthesizing pipelined software for dual-core architectures. We target streaming applications modeled as task graphs that are amenable to static analysis. We develop a versatile task assignment a...
We present a methodology for synthesizing streaming applications, modeled as task graphs, for pipelined execution on multi-core architectures. We develop a task graph extraction and characterization framework that accurately determines the structure, computation, and communication characteristics of the application task graph from its specification in C...