Davide De Caro

University of Naples Federico II | UNINA · Department of Electrical Engineering and Information Technology

About

103
Publications
20,465
Reads
2,582
Citations


Publications (103)
Article
Full-text available
In this paper we propose a novel approximate floating-point divider based on bidimensional linear approximation. In our approach, the mantissa quotient is seen as a function of the two input mantissas of the divider. The domain of this two-variable function is partitioned into $nx \times ny$ subregions, named tiles, where $nx, ny$ are chosen as...
Article
Full-text available
Floating-point division involves the computation of the ratio $(1 + M_x)/(1 + M_y)$, where $M_x$ and $M_y$ represent the mantissas of the input values. In this paper, we propose a new method for approximating this operation using a linear function of $M_x$, with coefficients that depend on $M_y$. The coefficients are calculated to minimize...
Article
Full-text available
Auscultation of heart sounds is important to perform cardiovascular assessment. External noises may limit heart sound perception. In addition, heart sound bandwidth is concentrated at very low frequencies, where the human ear has poor sensitivity. Therefore, the acoustic perception of the operator can be significantly improved by shifting the heart...
Article
Full-text available
Objective: The auscultatory technique is still considered the most accurate method for non-invasive blood pressure (NIBP) measurement, although its reliability depends on operator's skills. Various methods for automated Korotkoff sounds analysis have been proposed for reliable estimation of systolic (SBP) and diastolic (DBP) blood pressures. To th...
Preprint
Full-text available
Recent trends in deep learning (DL) imposed hardware accelerators as the most viable solution for several classes of high-performance computing (HPC) applications such as image classification, computer vision, and speech recognition. This survey summarizes and classifies the most recent advances in designing DL accelerators suitable to reach the pe...
Article
Full-text available
In this paper, we analyze the performances of an Enhanced Static Segment Multiplier (ESSM) when the inputs have both uniform and non-uniform distribution. The enhanced segmentation divides the multiplicands into a lower, a middle, and an upper segment. While the middle segment is placed at the center of the inputs in other implementations, we seek...
Article
Full-text available
In this paper we investigate a novel approximate multiply-and-accumulate (MAC) unit, that computes Y = A×B+C using static segmentation. The proposed architecture uses a unique carry-propagate adder and performs segmentation on the three operands A, B, and C, to reduce hardware cost. The circuit can be configured at design-time by two parameters. Th...
Article
Full-text available
In this paper a novel low-power approximate floating-point multiplier is presented. Since the mantissa computation is responsible for the largest part of the power consumption, we apply a novel approximation technique to mantissa multiplication, based on static segmentation. In our approach, the inputs of the mantissa multiplier are properly segmen...
Article
Full-text available
Approximate multipliers are used in error-tolerant applications, sacrificing the accuracy of results to minimize power or delay. In this paper we investigate approximate multipliers using static segmentation. In these circuits a set of $m$ contiguous bits (a segment of $m$ bits) is extracted from each of the two $n$-bit operands, the two seg...
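To illustrate the idea of static segmentation described in the abstract above, here is a minimal behavioral sketch in Python. The abstract is truncated, so the segment-selection rule is an assumption: each operand contributes either its $m$ least-significant bits (when all higher bits are zero) or its $m$ most-significant bits. The names `static_segment` and `ssm_multiply` are hypothetical and are not taken from the paper.

```python
def static_segment(x, n, m):
    """Pick an m-bit segment from an n-bit unsigned operand x.

    Assumption (not from the source): only two segment positions exist,
    the m least-significant bits or the m most-significant bits.
    Returns (segment, shift) so that x is approximated by segment << shift.
    """
    if x >> m == 0:
        # All upper n-m bits are zero: the lower segment holds x exactly.
        return x, 0
    # Otherwise keep the top m bits and drop the n-m low-order bits.
    return x >> (n - m), n - m


def ssm_multiply(a, b, n=16, m=8):
    """Approximate a*b with one m x m multiplication followed by a shift."""
    seg_a, shift_a = static_segment(a, n, m)
    seg_b, shift_b = static_segment(b, n, m)
    return (seg_a * seg_b) << (shift_a + shift_b)


if __name__ == "__main__":
    exact = 0x1234 * 0x0010
    approx = ssm_multiply(0x1234, 0x0010)
    print(exact, approx, (exact - approx) / exact)
```

In this sketch small operands are multiplied exactly, while large operands lose their low-order bits, so the approximate product never exceeds the exact one.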
Article
Full-text available
This paper investigates the use of approximate fixed-width and static segment multipliers in the design of Delayed Least Mean Square (DLMS) adaptive filters based on the sign–magnitude representation for the error signal. The fixed-width approximation discards part of the partial product matrix and introduces a compensation function for minimizing...
Article
In this paper, a novel implementation is proposed for the Delayed LMS (DLMS) filter, able to reduce the power dissipation while preserving regime performances. The approach relies on the observation that the error signal is small in magnitude and oscillates around zero when the circuit is close to the convergence point. Therefore, the most signific...
Article
The increase of clock frequency in digital circuits exacerbates the electromagnetic interference (EMI) between devices. Spread-spectrum techniques reduce the electromagnetic noise lowering harmonic peaks of the clock signal by means of frequency modulation. In System-on-Chips (SoCs) another requirement in many applications is the coexistence of bot...
Article
Full-text available
Approximate multipliers attract a large interest in the scientific literature that proposes several circuits built with approximate 4-2 compressors. Due to the large number of proposed solutions, the designer who wishes to use an approximate 4-2 compressor is faced with the problem of selecting the right topology. In this paper, we present a compre...
Chapter
Precision-scalable techniques constitute an efficient solution to power consumption issues thanks to the possibility to adapt arithmetic components precision to required system-level accuracy with the aim to dynamically optimize power consumption. In this paper we propose a precision-scalable approach for the implementation of a Least Mean Square (...
Article
Full-text available
Link to the article: https://rdcu.be/bAXBS Adaptive filters based on the least-mean-square (LMS) algorithm are used in several applications by virtue of their good steady-state performance, numerical stability, and acceptable computational complexity. The hardware implementation of LMS filters requires a massive number of multipliers that significantly...
Article
Video processing requires an increasing amount of buffered data. The paper proposes a multi-line buffer circuit that stores compressed data, thus saving logic and power. The lossy compression algorithm provides the output stream with a fixed, user-defined delay from the input stream. Further, the amount of memory of the compressed buffer can...
Article
Full-text available
Quantum noise intrinsically limits the quality of fluoroscopic images. The lower the X-ray dose, the higher the noise. Fluoroscopy video processing can enhance image quality and allows further lowering of the patient's dose. This study aims to assess the performance achieved by a Noise Variance Conditioned Average (NVCA) spatio-temporal filter for r...
Article
Full-text available
In this paper we propose a new algorithm for real-time filtering of video sequences corrupted by Poisson noise. The algorithm provides effective denoising (in some cases overcoming the filtering performances of state-of-the-art techniques), is ideally suited for hardware implementation, and can be implemented on a small field-programmable gate arra...
Chapter
Approximate Computing (AC) waives error free computation to improve circuits performances. Adaptive Least-Mean-Squares (LMS) filters can benefit from AC, being both power hungry and inherently approximate. In this paper an approximate LMS filter is proposed, which is able to change, at runtime, the precision level by acting on an external quality k...
Article
The use of digital pulse-width modulators (DPWMs) as controllers for dc-dc converters is becoming more and more popular, due to their lower sensitivity to process variations, programmability, and the ease of translating complex control algorithms into digital architectures. DPWMs require high resolution to avoid limit cycling. Moreover, especially in high-...
Article
Antenna-coupling group delay limits the cancellation bandwidth of conventional self-interference cancellers (SICs), making it difficult to ensure isolation in both transmit (TX) and receive (RX) bands. Isolation over both bands is achieved in the dual-path receiver architecture proposed in this paper. The main path consists of a highly linear curre...
Article
Approximate computing is an emerging trend in digital design that trades off the requirement of exact computation for improved speed and power performance. This paper proposes novel approximate compressors and an algorithm to exploit them for the design of efficient approximate multipliers. By using the proposed approach, we have synthesized approx...
Article
Full-text available
The design and hardware implementation of a digital control system tailored to a hybrid transformer-based duplexer is proposed. Working at Nyquist sampling frequency, it finds the optimal transmit-receive isolation in about 150 μs even when modulated signals with high PAPR (16-QAM) are transmitted. A simple tracking algorithm, operating in backgrou...
Conference Paper
Full-text available
Approximate computing improves digital circuit performance by relaxing the requirement of performing exact calculations. In this paper, we investigate the use of approximate adders in the final stage of a carry save multiplier-accumulator (MAC), designed for image filtering application. We propose a design flow based on synthesis tools, starting fr...
Article
Full-text available
NAND-based digitally controlled delay-lines (DCDLs) are employed in several applications owing to their excellent linearity, good resolution and easy standard cell design. A glitch-free DCDL behavior is often a strict requirement [e.g. spread-spectrum clock generators (SSCG) and digitally controlled oscillators]. Existing glitch-free NAND-based DCD...
Article
The generation of complex signal sources is important for test and validation of electronic systems. With reference to noise sources, commercial systems usually provide white noise sources, while the scientific literature only recently proposed circuits that generate programmable colored noise. This paper proposes a filtering circuit and an algorit...
Article
Piecewise polynomial interpolation is a well-established technique for hardware function evaluation. The paper describes a novel technique to minimize polynomial coefficients wordlength with the aim of obtaining either exact or faithful rounding at a reduced hardware cost. The standard approaches employed in literature subdivide the design of piece...
Conference Paper
This paper presents the analysis and hardware implementation of a low-power control system for a frequency-division duplexing 3G receiver with a tunable on-chip duplexer based on a hybrid-transformer. The maximum transmit-receive isolation exceeds 60dB and is limited by the precision of the on-chip balancing impedance. An optimization algorithm fin...
Article
A variable latency adder (VLA) reduces average addition time by using speculation: the exact arithmetic function is replaced by an approximated one that is faster and gives correct results most of the time. When speculation fails, an error detection and correction circuit gives the correct result in the following clock cycle. Previous papers inve...
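The speculation mechanism described in the abstract above can be modeled behaviorally as follows. This is only a sketch under stated assumptions: the speculative adder is assumed to compute each carry from at most `k` preceding bit positions, and error detection is modeled by comparing against the exact sum, whereas a real circuit would use a dedicated detection network. The function names and the parameters `n` and `k` are hypothetical and do not come from the paper.

```python
def speculative_add(a, b, n=16, k=4):
    """n-bit sum in which the carry into each bit looks back at most k bits.

    Assumption (not from the source): the carry entering each k-bit
    look-back window is taken to be zero.
    """
    result = 0
    for i in range(n):
        lo = max(0, i - k)              # lowest bit of the look-back window
        width = i - lo                  # number of bits in the window
        window_mask = (1 << width) - 1
        # Carry out of the window, i.e. the speculated carry into bit i.
        carry = (((a >> lo) & window_mask) + ((b >> lo) & window_mask)) >> width
        bit = ((a >> i) ^ (b >> i) ^ carry) & 1
        result |= bit << i
    return result


def variable_latency_add(a, b, n=16, k=4):
    """Return (sum, cycles): 1 cycle if speculation was right, else 2."""
    mask = (1 << n) - 1
    exact = (a + b) & mask
    speculated = speculative_add(a, b, n, k)
    if speculated == exact:
        return exact, 1
    # Misprediction: the corrected result is available one cycle later.
    return exact, 2


if __name__ == "__main__":
    print(variable_latency_add(3, 5))            # short carry chain
    print(variable_latency_add(0x00FF, 0x0001))  # long carry chain
```

Long carry chains are rare for random operands, which is why the average latency in such a model stays close to one cycle.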
Article
Full-text available
In every self-respecting modern country, it is well known that education and knowledge are decisive for consolidating the democratic public sphere, for real growth, and for increasing employment. Article 9 of the Italian Constitution states: "The Republic promotes the development of culture and scientific and technical research". ...
Chapter
Spread spectrum clocking slowly sweeps clock frequency of a digital system to reduce the Electromagnetic Interference (EMI). In a digital system-on-chip there can be subsystems where clock spreading is not allowed. This paper analyzes the performances achievable by spread spectrum clocking when a constraint is imposed to easily synchronize clock mo...
Article
The paper proposes a SISO register circuit, functionally equivalent to a Shift Register, that is the optimal design choice when the input data have a reduced transition probability. The proposed circuit obtains improved performances by only storing the transitions of the input data, thus saving logic and power.
Article
Spread-spectrum clocking is an established approach to mitigate electromagnetic interference (EMI) of digital circuits, by intentionally sweeping the clock frequency. In this way the energy of each clock harmonic is spread over a larger bandwidth, thereby reducing the peak of the interfering spectrum. This paper describes a highly flexible all-dig...
Conference Paper
Speculation can enhance adder performance by making carry predictions. It consists of replacing the arithmetic function with a faster, approximate one that gives correct results most of the time. An error detection stage flags misprediction events; in such cases, a two-cycle error correction stage is used, constituting a variable latency specu...
Article
Variable latency adders have been recently proposed in literature. A variable latency adder employs speculation: the exact arithmetic function is replaced with an approximated one that is faster and gives the correct result most of the time, but not always. The approximated adder is augmented with an error detection network that asserts an error si...
Patent
Full-text available
A digital delay interpolator may include an array of multiplexers, each multiplexer configured to be input with first and second input voltages, one of the first and second input voltages being delayed in respect to the other, and receive a respective selection signal. The digital delay interpolator may include output lines respectively coupled to...
Article
This paper proposes a novel approach to build integer multiplication circuits based on speculation, a technique which performs a faster-but occasionally wrong-operation resorting to a multi-cycle error correction circuit only in the rare case of error. The proposed speculative multiplier uses a novel speculative carry-save reduction tree using thre...
Article
An electronic system for the real-time denoising of fluoroscopic images is proposed in this paper. Fluoroscopic devices use X-rays to obtain real-time moving images of patients and support many surgical interventions and a variety of diagnostic procedures. In order to avoid risks for the patient, X-ray intensity has to be kept acceptably low during...
Article
The hardware computation of the logarithm function is required in several applications, ranging from signal and image processing to telecommunication systems. This brief shows that most previously proposed logarithmic converters, based on piecewise linear approximations, suffer from large errors when dealing with fixed-point input values with many...
Article
Squaring is an important arithmetic operation required in a multitude of applications. In this paper we present a truncated squarer that, with an n-bit input, produces its output on a number of bits that can be defined at design time in the [n,2n] range. For each configuration, some of the partial products are unformed, to reduce area and power, an...
Article
The Direct Digital Frequency Synthesizer (DDFS) is a critical component routinely implemented in communication or signal processing systems. The recent literature proposes various DDFS implementation techniques that, implemented by using state of the art Application Specific Integrated Circuits (ASIC) technologies, provide ever improving performanc...
Article
Recent all-digital spread-spectrum clock generators allow the synthesis of clock signals with a discontinuous frequency behavior. Following the electromagnetic interference (EMI) test guidelines, this paper investigates for the first time the peak-level reduction of the spectrum (modulation gain) achievable by using discontinuous frequency modulate...
Article
Fixed-width multipliers have two n-bit operands and produce an approximate n-bit result for their product. These multipliers discard part of the partial product matrix, to reduce hardware cost, and employ extra correction functions to reduce approximation error. While previous papers mainly focus on average error metrics (like mean-square error...
Article
This paper investigates the idea of constructing Time-to-Digital Converter (TDC) circuits based on dynamic precharged NORA delay elements. A self-charging technique is proposed in order to accommodate the dynamic delay elements in a ring-oscillator-like structure. The use of dynamic logic makes it possible to reduce the TDC resolution with respect to previous...
Article
Full-text available
Circuits and systems able to process high-quality video in real time are fundamental in today's imaging systems. The circuit proposed in the paper, aimed at the robust identification of the background in video streams, implements the improved formulation of the Gaussian Mixture Model (GMM) algorithm that is included in the OpenCV library. An innov...
Article
The recently proposed NAND-based digitally controlled delay lines (DCDLs) present a glitching problem which may limit their use in many applications. This paper presents a glitch-free NAND-based DCDL which overcomes this limitation, opening the use of NAND-based DCDLs to a wide range of applications. The proposed NAND-based DCDL maintains the...
Article
This paper describes the implementation of a novel high speed differential resistor ladder. In this paper it is shown that the novel ladder yields, theoretically, up to a sixteen fold reduction of the propagation delay with respect to the conventional differential ladder. In order to ease the design process, an accurate analytical model for the lad...
Article
In this paper, we present a transmission-line-based model developed to accurately describe the power and ground-line interconnections of modern digital ASICs. The proposed model employs transmission lines as the core component to properly describe both the capacitive and inductive behavior of the metal lines. In addition, the nonlinear frequency de...
Article
In this paper we discuss the efficient implementation of pseudochaotic piecewise linear maps with high digitization accuracies, taking the Rényi chaotic map as a reference. The proposed digital architectures are based on a novel algorithmic approach that uses carry save adders for the nonlinear arithmetic modular calculations arising when computin...
Article
We present a novel Time-to-Digital Converter (TDC) for physics experiments. The proposed TDC is based on a synchronous counter and an asynchronous fine interpolator. The fine part of the measurement is obtained using NORA inverters that provide improved resolution. A prototype IC was fabricated in 180 nm CMOS technology. Experimental measurements show...
Article
This paper investigates a novel direct digital frequency synthesizer architecture, based on piecewise linear approximation with segments of nonuniform length. The new approach allows reducing the total number of segments with respect to the well-known uniform segmentation. In this way the size of the coefficient ROM is also reduced with beneficia...
Article
The hardware computation of the logarithm function is required in a multitude of applications. This brief investigates logarithmic converters based on piecewise linear approximations. This brief presents a rigorous technique, based on mixed-integer linear programming, to obtain optimal coefficients' values, which minimize the maximum relative appro...
Article
This paper focuses on fixed-width multipliers with linear compensation function by investigating in detail the effect of coefficients quantization. New fixed-width multiplier topologies, with different accuracy versus hardware complexity trade-off, are obtained by varying the quantization scheme. Two topologies are in particular selected as the mos...
Article
A novel technique for designing piecewise-polynomial interpolators for hardware implementation of elementary functions is investigated in this paper. In the proposed approach, the interval where the function is approximated is subdivided in equal length segments and two adjacent segments are grouped in a segment pair. Suitable constraints are then...
Conference Paper
Many multimedia and DSP applications require fixed-width multipliers, in which input data and output results have the same bit width. In this paper we investigate fixed-width multipliers where one of the input operands is a constant, encoded using canonic signed digit (CSD) representation. This is a very important case in many practical applications...
Article
Truncated multipliers compute the n most-significant bits of the n × n bits product. This paper focuses on variable-correction truncated multipliers, where some partial-products are discarded, to reduce complexity, and a suitable compensation function is added to partly compensate the introduced error. The optimal compensation function, that minimi...
Article
Spread spectrum clocking is an effective solution to reduce the electromagnetic interference produced by digital chips, using a clock signal with a frequency that is intentionally swept (frequency modulated) within a certain frequency range, with a predefined modulation profile. We present the implementation of an all-digital spread spectrum clock...
Conference Paper
A truncated binary squarer is a squarer with an n-bit input that produces an n-bit output. The proposed design minimizes the mean square error of the squarer and results in a very simple and fast circuit implementation. The squarer, compared against state-of-the-art circuits, provides a reduction of the mean square error ranging from 20% to 5%. At...
Conference Paper
This paper describes the implementation of a novel high-speed differential resistor ladder suited for A/D converters. The novel ladder yields, theoretically, up to a sixteen-fold reduction of the propagation delay with respect to the conventional differential ladder. Simulation results, for a BiCMOS 0.25μm technology, show that the novel ladder res...
Article
A high-speed special function unit (SFU) is presented in this paper. The system supports the single-precision IEEE-754 floating-point standard and implements faithfully rounded reciprocal, square root, reciprocal square root, logarithm, and exponential functions. The functions are approximated by using a novel constrained piecewise quadratic inter...
Article
This paper describes a novel architecture for digital synthesizer/mixer (DSM). The operation performed by a DSM corresponds to a rotation of the input vector in the complex plane. The proposed architecture divides this rotation into three subrotations. The first one uses a few CORDIC stages, in which the rotation directions are in parallel computed...
Article
A novel architecture to realize the conversion of rectangular to polar coordinates is presented in this paper. The proposed technique for phase calculation uses a logarithmic number system and does not require any multiplications, but only a few small tables and a few multi-operand additions. The modulus is computed by a constant multiplier, a look...
Conference Paper
The paper presents a new technique to design signed and unsigned truncated multipliers. Simple formulae are developed in the paper to describe the truncated multiplier with minimum mean square error for every inputs' bit-width. With respect to previously proposed techniques, our analytical approach is more general and improves the accuracy of t...
Article
The use of the multipartite table methods (MTMs) to implement high-performance direct digital frequency synthesizers (DDFSs) is investigated in this paper. A closed-form expression for the spurious-free dynamic range (SFDR) is obtained when a single table of offset (TO) is used in the multipartite approximation. In this case, the optimal design th...
Conference Paper
This paper presents a novel technique for designing piecewise polynomial interpolators for hardware implementation of elementary functions. In the proposed approach, we impose special constraints between polynomial coefficients of adjacent segments. This makes it possible to significantly reduce look-up table size with respect to standard, unconstrained piece...
Article
Full-text available
A wideband frequency synthesizer architecture is presented. The proposed topology employs a direct digital frequency synthesizer (DDFS) to control the output frequency of an offset-PLL. In this way, the synthesizer features a very fine frequency resolution, 24 Hz, as in delta-sigma fractional-N PLLs, but without being affected by the quantization-i...
Conference Paper
A special function unit, able to compute square root, reciprocal square root, logarithm and exponential functions is presented in this paper. The system supports single precision IEEE-754 floating-point standard and uses a novel constrained piecewise quadratic interpolation technique to approximate the implemented functions. The proposed approach a...
Article
In this paper, a new GF($2^m$) multiplier for standard-basis representation is developed. The proposed multiplier implements the Mastrovito multiplication scheme and can be designed for every field GF($2^m$). A minimum-area implementation of the first block of Mastrovito multiplier and a high-speed delay-driven tree architecture for...
Conference Paper
The paper introduces a new technique to design signed and unsigned n x n bit fixed-width multipliers with minimum mean square error. In previous papers the error minimization of fixed-width multipliers was achieved through exhaustive searches, and is practically computable only for small n values. This is the first paper in which the error compensa...
Conference Paper
In the paper a new GF($2^m$) multiplier for standard basis representation is developed. The proposed multiplier can be designed for every field GF($2^m$). Multiplier complexity and delay are analytically evaluated for many polynomial classes. Timing and area occupation performances of the proposed multiplier are compared with those of p...
Article
The paper presents a detailed description of a direct digital frequency synthesizer (DDFS) based on a Multipartite Table Method (MTM) which is a salient lookup table compression technique. A novel algorithm to find the optimal MTM decomposition which minimizes the ROM size while achieving a target spurious free dynamic range (SFDR) is presented in...
Article
The paper describes the implementation of a 380 MHz, 13 bit, direct digital synthesizer/mixer IC in 0.25 μm CMOS technology. The circuit employs an innovative architecture which divides the π/4 rotation operation required in quadrature synthesizer/mixers into three rotations. The first two rotations are implemented by using a CORDIC datapath co...
Conference Paper
Conference code: 72346, Export Date: 26 April 2013, Source: Scopus, Sponsors: European Space Agency (ESA); Swedish National Space Board (SNSB); Swedish Space Corporation (SSC)
Article
This paper presents a detailed description of direct digital frequency synthesizers (DDFS) using an optimized piecewise linear approximation for phase to sine mapping, named dual-slope. The dual-slope technique allows reducing ROM size with respect to previously proposed piecewise-linear approximation approaches, with beneficial effects on system p...
Article
A new sense-amplifier-based flip-flop is presented. The output latch of the proposed circuit can be considered as a hybrid solution between the standard NAND-based set/reset latch and the NC²MOS approach. The proposed flip-flop provides ratioless design, reduced short-circuit power dissipation, and glitch-free operation. The simulation resu...