Luca Benini

University of Bologna | UNIBO · "Guglielmo Marconi" Department of Electrical, Electronic and Information Engineering DEI

PhD Stanford University

About

1,161
Publications
243,449
Reads
41,684
Citations
Introduction
My research interests are in energy-efficient system design and Multi-Core SoC design. I am also active in the area of energy-efficient smart sensors and sensor networks for biomedical and ambient intelligence applications. I have published more than 600 papers in peer-reviewed international journals and conferences, four books, and several book chapters. I am a Fellow of the IEEE and a member of the Academia Europaea.


Publications (1,161)
Conference Paper
Full-text available
The novel COVID-19 disease has been declared a pandemic event. Early detection of infection symptoms and contact tracing are playing a vital role in containing COVID-19 spread. As demonstrated by recent literature, multi-sensor and connected wearable devices might enable symptom detection and help tracing contacts, while also acquiring useful epide...
Article
Full-text available
Self-sustainability and intelligence will be critical features of the next generation of Internet of Things (IoT) devices. Power management circuits need to be energy-efficient in multiple power modes, from nW sleep to mW active, and to support energy harvesting from multiple environmental sources. This work presents an adaptive and firmware-config...
Conference Paper
As the Internet-of-Things (IoT) applications become more and more pervasive, IoT end nodes are requiring more and more computational power within a few mW of power envelope, coupled with high-speed and energy-efficient inter-chip communication to deal with the growing input/output and memory bandwidth for emerging near-sensor analytics applications...
Conference Paper
Full-text available
High precision Global Navigation Satellite System (GNSS) is a crucial feature for geo-localization to enhance future applications such as self-driving vehicles. Real-Time Kinematic (RTK) is a promising technology to achieve centimeter precision in GNSS. However, it requires radio communication, which usually is power-hungry and costly, e.g. when us...
Article
Full-text available
Nano-sized unmanned aerial vehicles (UAVs), e.g. quadcopters, have received significant attention in recent years. Although their capabilities have grown, they continue to have very limited flight times, tens of minutes at most. The main constraints are the battery’s energy density and the engine power required for flight. In this work, we present...
Article
Full-text available
In this paper, we present StreamDrive, a dynamic dataflow framework for programming clustered embedded multicore architectures. StreamDrive simplifies development of dynamic dataflow applications starting from sequential reference C code and allows seamless handling of heterogeneous and application-specific processing elements by applications. We a...
Conference Paper
Developing embedded systems tailored for resource-constrained platforms enables the design of robust frameworks for controlling artificial arms in prosthetic applications. This work presents preliminary results of the implementation of a novel platform for an EMG-based gesture recognition application based on Hyperdimensional Computing (HDC), a novel...
Article
Energy efficiency is of paramount importance for the sustainability of high performance computing (HPC) systems. Energy consumption limits the peak performance of supercomputers and accounts for a large share of Total Cost of Ownership (TCO). Consequently, system owners and final users have started exploring mechanisms to trade off performance for...
Article
Full-text available
High-end embedded systems, like their general-purpose counterparts, are turning to many-core cluster-based shared-memory architectures that provide a shared memory abstraction subject to non-uniform memory access costs. In order to keep the cores and memory hierarchy simple, many-core embedded systems tend to employ simple, scratchpad-like memories...
Article
Full-text available
Optically trapped nanoparticles are used in various fields ranging from biophysics to precision sensing. An optically trapped nanoparticle can be regarded as a harmonic oscillator driven by the thermal fluctuations of its environment. Unlocking the potential of optically levitated systems for precision measurements in the classical and the quantum...
Preprint
Deep Learning is increasingly being adopted by industry for computer vision applications running on embedded devices. While Convolutional Neural Networks' accuracy has achieved a mature and remarkable state, inference latency and throughput are a major concern especially when targeting low-cost and low-power embedded platforms. CNNs' inference late...
Preprint
Anomaly detection in supercomputers is a very difficult problem due to the big scale of the systems and the high number of components. The current state of the art for automated anomaly detection employs Machine Learning methods or statistical regression models in a supervised fashion, meaning that the detection tool is trained to distinguish among...
Conference Paper
As of today, large-scale wireless sensor networks are adopted for smart building applications as they are easy and flexible to deploy. Low-power wireless nodes can achieve multi-year lifetimes with an AA battery using Bluetooth Low Energy (BLE) and Zigbee. However, replacing these batteries at scale is a non-trivial, labor-intensive task. Energy h...
Conference Paper
Smart building applications require a large-scale deployment of sensors distributed across the environment. Recent innovations in smart environments are driven by wireless networked sensors as they are easy to deploy. However, replacing these batteries at scale is a non-trivial, labor-intensive task. Energy harvesting has emerged as a potential sol...
Article
Full-text available
Energy efficiency is crucial in the design of battery-powered end devices, such as smart sensors for the Internet of Things applications. Wireless communication between these distributed smart devices consumes significant energy, and even more when data need to reach several kilometers in distance. Low-power and long-range communication technologie...
Article
A key enabler for the ever-increasing adoption of FPGA accelerators is the availability of frameworks allowing for the seamless coupling to general-purpose host processors. Embedded FPGA+CPU systems still heavily rely on copy-based host-to-accelerator communication, which complicates application development. In this paper, we present a hardware/sof...
Article
Recognizing the very size of the brain’s circuits, hyperdimensional (HD) computing can model neural activity patterns with points in a HD space, that is, with HD vectors. Key examined properties of HD computing include: a versatile set of arithmetic operations on HD vectors, generality, scalability, analyzability, one-shot learning, and energy effi...
Conference Paper
Full-text available
Smart building applications require a large-scale deployment of sensors distributed across the environment. Recent innovations in smart environments are driven by wireless networked sensors as they are easy to deploy. However, replacing these batteries at scale is a non-trivial, labor-intensive task. Energy harvesting has emerged as a potential sol...
Preprint
Power and thermal management are critical components of high performance computing (HPC) systems, due to their high power density and large total power consumption. The assessment of thermal dissipation by means of compact models directly from the thermal response of the final device enables more robust and precise thermal control strategies as wel...
Preprint
Full-text available
After the tremendous success of convolutional neural networks in image classification, object detection, speech recognition, etc., there is now rising demand for deployment of these compute-intensive ML models on tightly power constrained embedded and mobile systems at low cost as well as for pushing the throughput in data centers. This has trigger...
Conference Paper
Full-text available
This paper focuses on ultra-low power embedded classification of neural activities. The machine learning (ML) algorithm has been trained using evoked local field potentials (LFPs) recorded with an implanted 16×16 multi-electrode array (MEA) from the rat barrel cortex while stimulating the whisker. Experimental results demonstrate that ML can be suc...
Conference Paper
Full-text available
One of the most ambitious goals of neuroscience and its neuroprosthetic applications is to interface intelligent electronic devices with the biological brain to cure neurological diseases. This emerging research field builds on our growing understanding of brain circuits and on recent technological advances in miniaturization of implantable multi-elec...
Preprint
Shared virtual memory (SVM) is key in heterogeneous systems on chip (SoCs), which combine a general-purpose host processor with a many-core accelerator, both for programmability and to avoid data duplication. However, SVM can bring a significant run time overhead when translation lookaside buffer (TLB) entries are missing. Moreover, allowing DMA bu...
Article
A growing trend in Human Computer Interaction (HCI) is to integrate computational capabilities into wearable devices, to enable sophisticated and natural interaction modalities. Acting directly by decoding neural activity is a very natural way of interaction and one of the fundamental paradigms of Brain Computer Interfaces (BCIs) as well. In this w...
Preprint
Full-text available
The last few years have brought advances in computer vision at an amazing pace, grounded on new findings in deep neural network construction and training as well as the availability of large labeled datasets. Applying these networks to images demands a high computational effort and pushes the use of state-of-the-art networks on real-time video data...
Preprint
Full-text available
Classical molecular dynamics (MD) simulations are important tools in life and material sciences since they allow studying chemical and biological processes in detail. However, the inherent scalability problem of particle-particle interactions and the sequential dependency of subsequent time steps render MD computationally intensive and difficult to...
Article
Full-text available
Binary Neural Networks (BNNs) are promising to deliver accuracy comparable to conventional deep neural networks at a fraction of the cost in terms of memory and energy. In this paper, we introduce the XNOR Neural Engine (XNE), a fully digital configurable hardware accelerator IP for BNNs, integrated within a microcontroller unit (MCU) equipped with...
Preprint
Full-text available
Binary Neural Networks (BNNs) are promising to deliver accuracy comparable to conventional deep neural networks at a fraction of the cost in terms of memory and energy. In this paper, we introduce the XNOR Neural Engine (XNE), a fully digital configurable hardware accelerator IP for BNNs, integrated within a microcontroller unit (MCU) equipped with...
Conference Paper
Computing with high-dimensional (HD) vectors, also referred to as hypervectors, is a brain-inspired alternative to computing with scalars. Key properties of HD computing include a well-defined set of arithmetic operations on hypervectors, generality, scalability, robustness, fast learning, and ubiquitous parallel operations. HD computing is about m...
Article
Broadband current sensors are key components in numerous applications, including power conversion, motor control, and smart-metering. We present a compressive sensing (CS) current sensor system-on-chip (SoC) designed and fabricated in STM 0.16 μm Bipolar-CMOS-DMOS technology. The SoC is capable of measuring currents with amplitudes of...
Preprint
Full-text available
Power consumption is a looming threat to today's computing progress. In scientific computing, a significant amount of power is spent in communication and synchronization-related idle times. However, due to the time scale at which communication happens, transitioning into low-power states during communication idle times may introduce significant...
Preprint
Full-text available
Accurate, fast, and reliable multiclass classification of electroencephalography (EEG) signals is a challenging task towards the development of motor imagery brain-computer interface (MI-BCI) systems. We propose enhancements to different feature extractors, along with a support vector machine (SVM) classifier, to simultaneously improve classificati...
Preprint
Energy efficiency is of paramount importance for the sustainability of HPC systems. Energy consumption limits the peak performance of supercomputers and accounts for a large share of total cost of ownership. Consequently, system owners and final users have started exploring mechanisms to trade off performance for power consumption, for example thro...
Preprint
Full-text available
Energy efficiency, predictive maintenance and security are today key challenges in High Performance Computing (HPC). In order to be addressed, accurate monitoring of the power and performance, along with real-time analysis, are required. However, modern HPC systems still have limited power introspection capabilities, lacking fine-grain and accurate...
Article
Full-text available
In recent years, image processing has been a key application area for mobile and embedded computing platforms. In this context, many-core accelerators are a viable solution to efficiently execute highly parallel kernels. However, architectural constraints impose hard limits on the main memory bandwidth, and push for software techniques which optimi...
Conference Paper
Electrical energy management is fundamental to optimize the generation and usage of power within a Smart Grid; and the measurement of parameters of electrical systems is crucial for achieving efficient control on electric loads. Most of the existing smart metering devices use voltage probes which are invasive, because they need a direct connection...
Conference Paper
The paper presents the design of a temperature monitoring system in a very harsh environment, such as Shallow Geothermal Systems (SGS), where the information of underground temperature is necessary to assess the thermal potential of the soil, for maximizing the efficiency of the SGS. The challenge is to get information at different depths (sometime...
Conference Paper
In heterogeneous CPU+GPU SoCs where a single DRAM is shared between both devices, concurrent memory accesses from both devices can lead to slowdowns due to memory interference. This prevents the deployment of real-time tasks, which need to be guaranteed to complete before a set deadline. However, freedom from interference can be guaranteed through...
Article
Full-text available
Wireless sensor nodes are traditionally powered by individual batteries, and a significant effort has been devoted to maximizing the lifetime of these devices. However, as the batteries can only store a finite amount of energy, the network is still doomed to die, and changing the batteries is not always possible. A promising solution is to enable e...
Article
Full-text available
In this paper we give a fresh look to Coarse Grained Reconfigurable Arrays (CGRAs) as ultra-low power accelerators for near-sensor processing. We present a general-purpose Integrated Programmable-Array accelerator (IPA) exploiting a novel architecture, execution model, and compilation flow for application mapping that can handle kernels containing...
Conference Paper
This work introduces an ultra-low-power visual sensor node coupling event-based binary acquisition with Binarized Neural Networks (BNNs) to deal with the stringent power requirements of always-on vision systems for IoT applications. By exploiting in-sensor mixed-signal processing, an ultra-low-power imager generates a sparse visual signal of binary...
Conference Paper
Designing and optimizing applications for energy-efficient High Performance Computing systems up to the Exascale era is an extremely challenging problem. This paper presents the toolbox developed in the ANTAREX European project for autotuning and adaptivity in energy efficient HPC systems. In particular, the modules of the ANTAREX toolbox are descr...
Conference Paper
Deep Learning is moving to edge devices, ushering in a new age of distributed Artificial Intelligence (AI). The high demand of computational resources required by deep neural networks may be alleviated by approximate computing techniques, and most notably reduced-precision arithmetic with coarsely quantized numerical representations. In this contex...
Conference Paper
Detecting the number of people occupying an environment is an important use case for surveillance in public spaces such as airports, stations and squares, but also for smaller environments such as classrooms (e.g. to track occupancy of classrooms). Using visible imaging for this task is often suboptimal because 1) it potentially violates user priv...
Preprint
Full-text available
Flying in dynamic, urban, highly-populated environments represents an open problem in robotics. State-of-the-art (SoA) autonomous Unmanned Aerial Vehicles (UAVs) employ advanced computer vision techniques based on computationally expensive algorithms, such as Simultaneous Localization and Mapping (SLAM) or Convolutional Neural Networks (CNNs) to na...
Article
Supercomputer installed capacity worldwide has increased for many years, and further growth is expected in the future. The next goal for high performance computing (HPC) systems is reaching Exascale. The increase in computational power threatens to lead to unacceptable power demands if future machines are built using current technology. Therefore r...
Article
Computing with high-dimensional (HD) vectors, also referred to as hypervectors, is a brain-inspired alternative to computing with scalars. Key properties of HD computing include a well-defined set of arithmetic operations on hypervectors, generality, scalability, robustness, fast learning, and ubiquitous parallel operations. HD computing...
Article
We report an always-on event-driven asynchronous wake-up circuit with trainable pattern recognition capabilities to duty-cycle power-constrained Internet-of-Things (IoT) sensor nodes. The wake-up circuit is based on a level-crossing analog-to-digital converter (LC-ADC) employed as a feature-extraction block with automatic activity-sampling rate scal...
Article
We report VivoSoC, a system-on-chip realized in 130-nm CMOS for miniaturized medical instrumentation as used in mobile health devices or implantable telemetry systems for animal experiments. It features six neural stimulation channels and acquisition circuits for 9x electrode-based recordings, 4-channel/32-LED photoplethysmography (PPG), bioimpedan...