Masoud Daneshtalab

  • Professor (Full) at Mälardalen University

About

Publications: 303
Reads: 62,427
Citations: 4,612
Introduction
Multi-objective optimization, Deep Learning and Neural Architecture Search, Heterogeneous computing, HW/SW codesign, Time-sensitive networks and Interconnection networks
Current institution
Mälardalen University
Current position
  • Professor (Full)
Additional affiliations
September 2016 - October 2019
Mälardalen University
Position
  • Professor (Associate)
September 2004 - August 2008
University of Tehran
Position
  • Researcher
January 2014 - December 2016
KTH Royal Institute of Technology
Position
  • EU Marie Curie Fellow
Education
September 2008 - September 2011
University of Turku
Field of study
  • Information and Communications Technology
September 2004 - September 2006
University of Tehran
Field of study
  • Computer Architecture
September 1998 - September 2002
Shahid Bahonar University of Kerman
Field of study
  • Computer Hardware Engineering

Publications (303)
Conference Paper
Full-text available
A well suited monitoring and management system is becoming a necessity as the number of cores on single chip systems is increasing. Some works have proposed monitoring systems in order to enable off-chip system debugging, while some others have introduced a monitoring approach towards system self-awareness. The latter tries to facilitate self-manag...
Conference Paper
Full-text available
Streaming applications are a keystone in several emerging multimedia services like DVB-IPTV, VoD and on-line gaming. Due to the high computing requirements and real-time constraints inherent to this kind of applications multi-processor system-on-chip (MPSoCs) have been proposed as a solution. In addition, the FPGA technology has become popular amon...
Conference Paper
Full-text available
Today, Coarse Grained Reconfigurable Architectures (CGRAs) host multiple applications, with arbitrary communication and computation patterns. Compile-time mapping decisions are neither optimal nor desirable to efficiently support the diverse and unpredictable application requirements. As a solution to this problem, recently proposed architectures o...
Article
Full-text available
Power budgeting for NoC needs to be performed to meet limited power budget while assuring the best possible overall system performance. For simplicity and ease of implementation, existing NoC power budgeting schemes, irrespective of the fact that the packet arrival rates of different NoC routers may vary significantly, treat all the individual rout...
Conference Paper
Full-text available
As low-power electronics and miniaturization conspire to populate the world with emerging devices, one appealing approach is to power these multi-core/many-core-based devices with energy harvested from various environments. Of the most important issues concerning these devices is how to effectively allocate power budget among the cores competing fo...
Article
Full-text available
As the demand for autonomous driving (AD) systems has increased, the enhancement of their safety has become critically important. A fundamental capability of AD systems is object detection and trajectory forecasting of vehicles and pedestrians around the ego-vehicle, which is essential for preventing potential collisions. This study introduces the...
Preprint
Full-text available
Growing exploitation of Machine Learning (ML) in safety-critical applications necessitates rigorous safety analysis. Hardware reliability assessment is a major concern with respect to measuring the level of safety. Quantifying the reliability of emerging ML models, including Deep Neural Networks (DNNs), is highly complex due to their enormous size...
Article
Full-text available
Autonomous driving systems are a rapidly evolving technology. Trajectory prediction is a critical component of autonomous driving systems that enables safe navigation by anticipating the movement of surrounding objects. Lidar point-cloud data provide a 3D view of solid objects surrounding the ego-vehicle. Hence, trajectory prediction using Lidar po...
Preprint
Full-text available
Deep Neural Networks (DNNs) are extensively employed in safety-critical applications where ensuring hardware reliability is a primary concern. To enhance the reliability of DNNs against hardware faults, activation restriction techniques significantly mitigate the fault effects at the DNN structure level, irrespective of accelerator architectures. S...
Preprint
Full-text available
Convolutional Neural Networks (CNNs) have become integral in safety-critical applications, thus raising concerns about their fault tolerance. Conventional hardware-dependent fault tolerance methods, such as Triple Modular Redundancy (TMR), are computationally expensive, imposing a remarkable overhead on CNNs. Whereas fault tolerance techniques can...
Article
Full-text available
Deep Neural Network (DNN) hardware accelerators are essential in a spectrum of safety-critical edge-AI applications with stringent reliability, energy efficiency, and latency requirements. Multiplication is the most resource-hungry operation in the neural network’s processing elements. This paper proposes a scalable adaptive fault-tolerant approxim...
Article
Full-text available
Artificial Intelligence (AI) and, in particular, Machine Learning (ML) have emerged to be utilized in various applications due to their capability to learn how to solve complex problems. Over the last decade, rapid advances in ML have presented Deep Neural Networks (DNNs) consisting of a large number of neurons and layers. DNN Hardware Accelerators...
Preprint
Full-text available
Sparse matrix-vector multiplication (SpMV) is an essential linear algebra operation that dominates the computing cost in many scientific applications. Due to providing massive parallelism and high memory bandwidth, GPUs are commonly used to accelerate SpMV kernels. Prior studies mainly focused on reducing the latency consumption of SpMV kernels on...
Article
The deployment of Deep Neural Networks (DNNs) on edge devices is hindered by the substantial gap between performance requirements and available computational power. While recent research has made significant strides in developing pruning methods to build a sparse network for reducing the computing overhead of DNNs, there remains considerable accura...
Chapter
In Machine Learning systems, several factors impact the performance of a trained model. The most important ones include model architecture, the amount of training time, the dataset size and diversity. We present a method for analyzing datasets from a use-case scenario perspective, detecting and quantifying out-of-distribution (OOD) data on dataset...
Chapter
Full-text available
Real-world applications that are safety-critical and resource-constrained necessitate using compact and robust Deep Neural Networks (DNNs) against adversarial data perturbation. MobileNet-tiny has been introduced as a compact DNN to deploy on edge devices to reduce the size of networks. To make DNNs more robust against adversarial data, adversarial...
Chapter
Deep Neural Networks (DNNs) have been deployed in safety-critical real-world applications, including automated decision-making systems. There are often concerns about two aspects of these systems: the fairness of the predictions and their robustness against adversarial attacks. In recent years, extensive studies have been devoted to addressing thes...
Preprint
Full-text available
Detecting road lanes is challenging due to intricate markings vulnerable to unfavorable conditions. Lane markings have strong shape priors, but their visibility is easily compromised. Factors like lighting, weather, vehicles, pedestrians, and aging colors challenge the detection. A large amount of data is required to train a lane detection approach...
Article
Full-text available
This paper presents a novel machine learning framework for detecting PxAF, a pathological characteristic of electrocardiogram (ECG) that can lead to fatal conditions such as heart attack. To enhance the learning process, the framework involves a generative adversarial network (GAN) along with a neural architecture search (NAS) in the data preparati...
Preprint
Full-text available
The superior performance of Deep Neural Networks (DNNs) has led to their application in various aspects of human life. Safety-critical applications are no exception and impose rigorous reliability requirements on DNNs. Quantized Neural Networks (QNNs) have emerged to tackle the complexity of DNN accelerators, however, they are more prone to reliabi...
Preprint
Full-text available
Nowadays, the extensive exploitation of Deep Neural Networks (DNNs) in safety-critical applications raises new reliability concerns. In practice, methods for fault injection by emulation in hardware are efficient and widely used to study the resilience of DNN architectures for mitigating reliability issues already at the early design stages. Howeve...
Preprint
Full-text available
Deep Learning, and in particular, Deep Neural Network (DNN) is nowadays widely used in many scenarios, including safety-critical applications such as autonomous driving. In this context, besides energy efficiency and performance, reliability plays a crucial role since a system failure can jeopardize human life. As with any other device, the reliabi...
Preprint
Full-text available
Artificial Intelligence (AI) and, in particular, Machine Learning (ML) have emerged to be utilized in various applications due to their capability to learn how to solve complex problems. Over the last decade, rapid advances in ML have presented Deep Neural Networks (DNNs) consisting of a large number of neurons and layers. DNN Hardware Accelerators...
Conference Paper
Deep Learning, and in particular, Deep Neural Network (DNN) is nowadays widely used in many scenarios, including safety-critical applications such as autonomous driving. In this context, besides energy efficiency and performance, reliability plays a crucial role since a system failure can jeopardize human life. As with any other device, the reliabi...
Conference Paper
Full-text available
While the role of Deep Neural Networks (DNNs) in a wide range of safety-critical applications is expanding, emerging DNNs experience massive growth in terms of computation power. It raises the necessity of improving the reliability of DNN accelerators yet reducing the computational burden on the hardware platforms, i.e. reducing the energy consumpt...
Preprint
Full-text available
Deep Neural Networks (DNNs) and their accelerators are being deployed ever more frequently in safety-critical applications leading to increasing reliability concerns. A traditional and accurate method for assessing DNNs' reliability has been resorting to fault injection, which, however, suffers from prohibitive time complexity. While analytical and...
Preprint
Full-text available
Sparse matrix-vector multiplication (SpMV) is an essential linear algebra operation that dominates the computing cost in many scientific applications. Due to providing massive parallelism and high memory bandwidth, GPUs are commonly used to accelerate SpMV kernels. Prior studies mainly focused on reducing the latency of SpMV kernels on GPU. However...
Preprint
Full-text available
This paper presents a novel machine learning framework for detecting Paroxysmal Atrial Fibrillation (PxAF), a pathological characteristic of Electrocardiogram (ECG) that can lead to fatal conditions such as heart attack. To enhance the learning process, the framework involves a Generative Adversarial Network (GAN) along with a Neural Architecture...
Preprint
Full-text available
GTFLAT, as a game theory-based add-on, addresses an important research question: How can a federated learning algorithm achieve better performance and training efficiency by setting more effective adaptive weights for averaging in the model aggregation phase? The main objectives for the ideal method of answering the question are: (1) empowering fed...
Article
Full-text available
Recent advances in very-large-scale integration (VLSI) technologies have offered the capability of integrating thousands of processing elements onto a single silicon microchip. Multiprocessor systems-on-chips (MPSoCs) are the latest creation of this technology evolution. Network-on-Chip (NoC) is a scalable and promising interconnection solution use...
Chapter
Full-text available
Lane detection is one of the most fundamental tasks for autonomous driving. It plays a crucial role in the lateral control and the precise localization of autonomous vehicles. Monocular 3D lane detection methods provide state-of-the-art results for estimating the position of lanes in 3D world coordinates using only the information obtained from the...
Preprint
Full-text available
The deployment of Convolutional Neural Networks (CNNs) on edge devices is hindered by the substantial gap between performance requirements and available processing power. While recent research has made large strides in developing network pruning methods for reducing the computing overhead of CNNs, there remains considerable accuracy loss, especiall...
Conference Paper
Full-text available
Integrating wired Ethernet networks, such as Time-Sensitive Networks (TSN), to 5G cellular network requires a flow management technique to efficiently map TSN traffic to 5G Quality-of-Service (QoS) flows. The 3GPP Release 16 provides a set of predefined QoS characteristics, such as priority level, packet delay budget, and maximum data burst volume,...
Conference Paper
Full-text available
The Time-Sensitive Network (TSN) amendments and protocols add capabilities on top of standard 802.1 Ethernet for guaranteeing the timeliness of both (isochronous) scheduled traffic (ST) and shaped (audio-video) communication (AVB) in distributed applications. ST streams are guaranteed via an offline computed schedule controlling the time-aware gate...
Conference Paper
Full-text available
Time Sensitive Networking (TSN) is a set of IEEE standards based on switched Ethernet that aim at meeting high-bandwidth and low-latency requirements in wired communication. TSN implementations typically do not support integration of wireless networks, which limits their applicability to many industrial applications that need both wired and wireles...
Conference Paper
Full-text available
Ternary Neural Networks (TNNs) compress network weights and activation functions into 2-bit representation resulting in remarkable network compression and energy efficiency. However, there remains a significant gap in accuracy between TNNs and full-precision counterparts. Recent advances in Neural Architectures Search (NAS) promise opportunities in...
Article
Full-text available
Convolutional neural networks (CNNs) provide the best accuracy for disparity estimation. However, CNNs are computationally expensive, making them unfavorable for resource-limited devices with real-time constraints. Recent advances in neural architectures search (NAS) promise opportunities in automated optimization for disparity estimation. However,...
Article
Full-text available
Solving Integer Linear Programming (ILP) models generally lies in the category of NP-hard problems and finding the optimal answer for large models is a computational challenge. Genetic algorithms are a family of metaheuristic algorithms capable of adjusting and redesigning parameters and operations according to the characteristics of ILP models. On...
Article
Full-text available
The functionality advancements and novel customer features that are currently found in modern automotive systems require high-bandwidth and low-latency in-vehicle communications, which become even more compelling for autonomous vehicles. In a recent effort to meet these requirements, the IEEE Time-Sensitive Networking (TSN) task group has developed...
Article
Long Short-Term Memory (LSTM) achieved great success in healthcare applications. However, its extensive computation cost and massive model size have become the major obstacles for the deployment of such a powerful algorithm in resource-limited embedded systems such as wearable devices. Quantization is a promising way to reduce the memory footprint...
Conference Paper
Long Short-Term Memory (LSTM) is one of the most popular and effective Recurrent Neural Network (RNN) models used for sequence learning in applications such as ECG signal classification. Complex LSTMs could hardly be deployed on resource-limited bio-medical wearable devices due to the huge amount of computations and memory requirements. Binary LSTM...
Article
Full-text available
Interest is growing in the use of autonomous swarms of drones in various mission-physical applications such as surveillance, intelligent monitoring, and rescue operations. Swarm systems should fulfill safety and efficiency constraints in order to guarantee dependable operations. To maximize motion safety, we should design the swarm system in such a...
Article
This paper presents a comprehensive software-based technique that is capable of detecting soft errors in embedded systems. Soft errors can be categorized into Control Flow Errors (CFEs) and data errors. The CFEs change the flow of the program erroneously and data errors also change the results. In this paper, a new comprehensive method is presented...
Article
Full-text available
Deep Learning (DL) has recently become a topic of study in different applications including healthcare, in which timely detection of anomalies on Electrocardiogram (ECG) can play a vital role in patient monitoring. This paper presents a comprehensive review study on the recent DL methods applied to the ECG signal for the classification purposes. Th...
Book
This book presents and discusses innovative ideas in the design, modelling, implementation, and optimization of hardware platforms for neural networks. The rapid growth of server, desktop, and embedded applications based on deep learning has brought about a renaissance in interest in neural networks, with applications including image and speech pr...
Article
Data copy is a widely-used memory operation in many programs and operating system services. In conventional computers, data copy is often carried out by two separate read and write transactions that pass data back and forth between the memory hierarchy and processor registers. Some prior mechanisms propose to avoid this unnecessary data movement by...
Preprint
Full-text available
Data copy is a widely-used memory operation in many programs and operating system services. In conventional computers, data copy is often carried out by two separate read and write transactions that pass data back and forth between the DRAM chip and the processor chip. Some prior mechanisms propose to avoid this unnecessary data movement by using t...
Preprint
Full-text available
Recurrent Neural Networks (RNN) are widely used for learning sequences in applications such as EEG classification. Complex RNNs could be hardly deployed on wearable devices due to their computation and memory-intensive processing patterns. Generally, reduction in precision leads much more efficiency and binarized RNNs are introduced as energy-effic...
Preprint
Full-text available
Long Short-Term Memory (LSTM) is widely used in various sequential applications. Complex LSTMs could be hardly deployed on wearable and resourced-limited devices due to the huge amount of computations and memory requirements. Binary LSTMs are introduced to cope with this problem, however, they lead to significant accuracy loss in some application s...
Article
Deep Neural Networks (DNNs) are compute-intensive learning models with growing applicability in a wide range of domains. Due to their computational complexity, DNNs benefit from implementations that utilize custom hardware accelerators to meet performance and response time as well as classification accuracy constraints. In this paper, we propose De...
Article
The enormous and ever-increasing complexity of state-of-the-art neural networks has impeded the deployment of deep learning on resource-limited embedded and mobile devices. To reduce the complexity of neural networks, this paper presents ΔNN, a power-efficient architecture that leverages a combination of the approximate value locality of neuron wei...
Conference Paper
Full-text available
Fog computing offers a wide range of service levels including low bandwidth usage, low response time, support of heterogeneous applications, and high energy efficiency. Therefore, real-time embedded applications could potentially benefit from Fog infrastructure. However, providing high system utilization is an important challenge of Fog computing esp...