
Dirk Stroobandt - Ghent University
About
324 Publications
45,907 Reads
5,569 Citations
Publications (324)
Moore's Law has been guiding the development of the VLSI industry for the past half-century. This law underscores fabrication technology's crucial role in enhancing chip performance. However, with technology nearing physical limitations, we envisage that there is still significant potential for optimizing interconnection complexity in VLSI design....
In the rapidly evolving domain of Photonic Integrated Circuits, reconfigurability is making strides through tunable waveguide elements, facilitating ‘general-purpose’ programmable waveguide grids. Routing in modern programmable photonic networks is challenging due to the numerous possibilities that exist for assigning photonic circuits in the grid....
Routing is a crucial step in Field Programmable Gate Array (FPGA) physical design, as it determines the routes of signals in the circuit, which impacts the design implementation quality significantly. It can be very time-consuming to successfully route all the signals of large circuits that utilize many FPGA resources. Attempts have been made to sh...
In this work, a novel method for in-circuit debugging on FPGAs is introduced that allows the insertion of low-overhead debugging infrastructure by exploiting the technique of parameterized configurations. This allows the parameterization of the LUTs and the routing infrastructure to create a virtual network of debugging multiplexers. It aims to fac...
Ensuring fault tolerance in computing systems is an important problem in high-reliability applications. Given the interest in commercial SRAM-based FPGAs for radiation environments, it is beneficial to provide runtime reconfigurable recovery from a failure. In this paper a virtual coarse-grained reconfigurable architecture is proposed, with an embe...
Field Programmable Gate Arrays (FPGAs) are gaining popularity as higher-level tools evolve to deliver the benefits of re-programmable silicon to engineers and scientists at all levels of expertise. To use FPGAs efficiently and meet the growing demands of heterogeneous computing paradigms, new CAD tools and modern architectures are needed. Overlay arch...
Placement is a crucial step in the FPGA design tool flow, as it determines the overall performance of the circuits. Unfortunately, it is a time-consuming task. Analytical placers have been shown to be the most time-efficient while retaining good quality. One way of implementing analytical placement is to use an iterative technique that consists of...
As image and video resolutions continue to increase, compression plays a vital role in the successful transmission of video and image data over a limited-bandwidth channel. Computational complexity, as well as resource utilization and power consumption, keeps increasing when moving from the H.264 codec to the H.265 codec. Optimizations in each particular b...
With the aggressive scaling of VLSI technology, Networks-on-Chip (NoCs) are becoming more susceptible to faults. Therefore, designing reliable and efficient NoCs is of significant importance. The rerouting approach employed in most fault-tolerant methods degrades network performance considerably due to taking longer...
Current HLS tools for the automatic design of computing hardware perform excellently for the synthesis of computation kernels, but they often do not optimize memory bandwidth. As accessing memory is a bottleneck in many algorithms, the performance of the generated circuit could benefit substantially from memory access optimization. In this paper, w...
With the aggressive scaling of VLSI technology, Networks-on-Chip (NoCs) are becoming more susceptible to faults. Therefore, designing reliable and efficient routing methods is of significant importance. Most existing fault-tolerant techniques rely on rerouting solutions, which may degrade network performance drastically not only by ta...
Run-time reconfiguration in FPGAs is an important feature that offers design flexibility under low-cost silicon area and power budgets, at the cost of reconfiguration overhead. The reconfiguration time overhead produced by the conventional configuration ports (such as ICAP) is too high for the reconfiguration technology to be embraced as a standard...
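To first order, the reconfiguration overhead is the partial bitstream size divided by the configuration port's peak throughput. The sketch below assumes an idealized port that accepts one full-width word per clock cycle with no protocol overhead. The 4 MB bitstream is hypothetical, and 32 bits at 100 MHz is the peak figure commonly quoted for ICAP on Xilinx 7-series devices.

```python
def reconfiguration_time_s(bitstream_bytes: int, port_width_bits: int, clock_hz: float) -> float:
    """Lower bound on reconfiguration time: bitstream size divided by the
    port's peak throughput (assumes one word per cycle, ignores protocol overhead)."""
    bytes_per_second = (port_width_bits / 8) * clock_hz
    return bitstream_bytes / bytes_per_second

# Hypothetical 4 MB partial bitstream through a 32-bit port clocked at 100 MHz
t = reconfiguration_time_s(4_000_000, 32, 100e6)
print(f"{t * 1e3:.1f} ms")  # 10.0 ms
```

Even this optimistic bound is in the millisecond range, many orders of magnitude above typical clock periods, which is why the overhead matters for run-time reconfiguration.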
Generating a configuration for an FPGA starting from a high-level description of a design is a time-consuming task. The resulting configuration should be of high quality, so that the FPGA resources are used efficiently while the design runs at high clock frequencies with low power consumption. In this work we present MultiPart, a...
Coarse-Grained Reconfigurable Arrays (CGRAs) are easy to program and result in low development costs. This ease of use is especially valuable in reconfigurable computing applications. Their lower compilation cost and reduced reconfiguration overhead make them attractive platforms for accelerating high-performance computi...
This paper presents "Reco-Pi", a reconfigurable hardware design of an encryptor and a decryptor for the 16-, 32-, and 64-bit versions of π-Cipher, one of the candidate designs for the Competition for Authenticated Encryption: Security, Applicability, and Robustness (CAESAR). π-Cipher is a nonce-based authenticated encryption engine with associated data....
FPGA design compilation takes too long to allow short design turnaround times. Placement and routing are the most runtime-consuming steps of the compilation. To speed up the FPGA placement process, analytical placement techniques have become more popular in the last decade. Analytical techniques produce a placement in two steps, a place...
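As a rough illustration of the analytical approach, and not the placer from this paper, the sketch below minimizes a quadratic wirelength model in one dimension with fixed I/O positions, then snaps the result onto discrete slots. The quadratic cost, the Gauss-Seidel solver, and the greedy legalizer are all simplifying assumptions. Real analytical placers work in 2-D, spread cells to remove overlap, and use more careful legalization.

```python
from collections import defaultdict

def quadratic_place(movable, fixed, edges, iters=500):
    """Minimize sum(w * (x_i - x_j)**2) over 2-pin edges in 1-D.

    Each Gauss-Seidel sweep moves a movable cell to the weighted mean of its
    neighbors, which is the zero-gradient condition of the quadratic cost.
    Every movable cell must have at least one edge."""
    x = {cell: 0.0 for cell in movable}
    x.update(fixed)
    nbrs = defaultdict(list)
    for a, b, w in edges:
        nbrs[a].append((b, w))
        nbrs[b].append((a, w))
    for _ in range(iters):
        for cell in movable:
            total_w = sum(w for _, w in nbrs[cell])
            x[cell] = sum(w * x[n] for n, w in nbrs[cell]) / total_w
    return {cell: x[cell] for cell in movable}

def legalize(positions, slots):
    """Greedy legalization: in order of x, snap each cell to the nearest free slot."""
    free = list(slots)
    placed = {}
    for cell in sorted(positions, key=positions.get):
        best = min(free, key=lambda s: abs(s - positions[cell]))
        free.remove(best)
        placed[cell] = best
    return placed

pos = quadratic_place(["A", "B"], {"in": 0.0, "out": 3.0},
                      [("in", "A", 1.0), ("A", "B", 1.0), ("B", "out", 1.0)])
print(pos)                          # approximately {'A': 1.0, 'B': 2.0}
print(legalize(pos, [0, 1, 2, 3]))  # {'A': 1, 'B': 2}
```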
It takes a long time to generate a configuration for an FPGA starting from a description of a digital circuit in a hardware description language. This configuration should be of high quality, so that the FPGA resources are used efficiently, the clock frequency is maximized, and the power consumption is minimized. In this work we present two new pack...
This paper presents an improved hardware implementation of a 16-bit ARX (Add, Rotate, and Xor) engine for one of the CAESAR second-round competition candidates, π-Cipher, implemented on an FPGA. π-Cipher is a nonce-based authenticated encryption cipher with associated data. The security of π-Cipher relies on an ARX-based permutation funct...
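The three primitive operations of an ARX engine can be illustrated on 16-bit words. The sketch below is a generic ARX step for illustration only. It does not reproduce the rotation amounts or the way π-Cipher's permutation combines these operations, and the helper names are hypothetical.

```python
MASK16 = 0xFFFF  # keep every intermediate result within a 16-bit word

def rotl16(x: int, r: int) -> int:
    """Rotate a 16-bit word left by r bits."""
    r %= 16
    return ((x << r) | (x >> (16 - r))) & MASK16

def arx_step(a: int, b: int, r: int, c: int) -> int:
    """One generic Add-Rotate-Xor step: addition modulo 2**16,
    a fixed left rotation, then XOR with a third word."""
    return rotl16((a + b) & MASK16, r) ^ c

print(hex(arx_step(0x1234, 0x0001, 4, 0x0000)))  # 0x2351
```

All three operations map to cheap hardware (an adder, wiring for the fixed rotation, and XOR gates), which is part of ARX's appeal for FPGA implementation.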
Field Programmable Gate Arrays (FPGAs) have proven their potential in accelerating High Performance Computing (HPC) applications. Conventionally, such accelerators predominantly use FPGAs that contain fine-grained elements such as LookUp Tables (LUTs), Switch Blocks (SBs) and Connection Blocks (CBs) as basic programmable logic blocks. Howeve...
Dynamic Circuit Specialization (DCS) is used to optimize the implementation of a parameterized application on an FPGA. Instead of implementing the parameters as regular inputs, the DCS approach implements them as constants. When the parameter values change, the design is reoptimized for the new constant values by reconfiguring the FPGA. T...
Dynamic Circuit Specialization (DCS) is a technique for optimized FPGA implementation and is built on top of Partial Reconfiguration (PR). Dynamic Partial Reconfiguration (DPR) provides an opportunity to share the silicon area between different Partially Reconfigurable Modules (PRMs) and therefore results in smaller and faster designs that potentia...
Field Programmable Gate Arrays (FPGAs) belong to a class of semiconductor devices whose hardware can be changed according to the user's needs. The configuration data (bitstreams) of an FPGA define its functionality. Therefore, a user can design the hardware and change it by modifying the bitstreams for a given set of requirements. One way of d...
The Through-Silicon Via (TSV) technology has led to major breakthroughs in 3D stacking by providing higher speed and bandwidth, as well as lower power dissipation for the inter-layer communication. However, the current TSV fabrication suffers from a considerable area footprint and yield loss. Thus, it is necessary to restrict the number of TSVs in...
It is common for large hardware designs to have a number of registers or memories whose contents have to be changed very seldom, e.g., only at startup. The conventional way of accessing these memories is through a low-speed memory bus. This bus uses valuable hardware resources, introduces long, global connections and contributes to routing cong...
Parameterised configurations are FPGA configuration bitstreams in which the bits are defined as functions of user-defined parameters. From a parameterised configuration, it is possible to quickly and efficiently derive specialised, regular configuration bitstreams by evaluating these functions. The specialised bitstreams have different properties a...
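The evaluation step can be sketched in a few lines. Each configuration bit is either a constant or a Boolean function of named parameters, and specialization evaluates every function for the given parameter values. The representation (Python callables), the 4-bit configuration, and the parameter names `en` and `mode` are hypothetical stand-ins for whatever Boolean-function representation a real tool flow uses.

```python
from typing import Callable, Dict, List, Union

Params = Dict[str, bool]
# A configuration bit is either a constant or a Boolean function of the parameters.
ConfigBit = Union[bool, Callable[[Params], bool]]

def specialise(param_config: List[ConfigBit], params: Params) -> List[int]:
    """Derive a regular bitstream by evaluating every parameterised bit."""
    return [int(bit(params) if callable(bit) else bit) for bit in param_config]

# Hypothetical 4-bit parameterised configuration
cfg: List[ConfigBit] = [
    True,                                 # constant bit
    lambda p: p["en"],                    # bit equal to a parameter
    lambda p: p["en"] and not p["mode"],  # bit defined by a Boolean function
    False,                                # constant bit
]
print(specialise(cfg, {"en": True, "mode": False}))  # [1, 1, 1, 0]
print(specialise(cfg, {"en": True, "mode": True}))   # [1, 1, 0, 0]
```

Only the parameter-dependent bits need evaluating when a parameter changes. This is what makes deriving a specialised bitstream much cheaper than rerunning the full CAD flow.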
An FPGA implementation requires significant effort from the hardware designer, who optimizes FPGA designs by going through many time-consuming CAD flow iterations. These iterations provide two types of feedback: (1) the FPGA performance and (2) the identification of the parts having the highest impact on the FPGA performance. Both depend on the wir...
Dynamic Circuit Specialisation (DCS) is a technique that uses the reconfigurability of an FPGA to optimise a circuit during run-time, thus achieving higher performance and lower resource cost. However, run-time reconfiguration causes transitional effects that form an important problem for DCS. Because of these, the DCS circuit cannot be used while...
The overall performance of a Network-on-Chip (NoC) is strongly affected by the efficiency of the on-chip routing algorithm. Among the factors associated with the design of a high-performance routing method, adaptivity is an important one. Moreover, deadlock- and livelock-freedom are necessary for a functional routing method. Despite the advantages th...
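One common way to combine adaptivity with deadlock-freedom is to restrict turns. The sketch below implements the well-known Odd-Even turn model (Chiu), a standard baseline for adaptive minimal NoC routing. It is not the routing method proposed in this paper. It returns the admissible output directions at the current router and leaves the choice among them (e.g., by downstream congestion) to a selection function that is not shown. Coordinates are assumed to be (column, row), with x growing East and y growing North.

```python
from typing import List, Tuple

Coord = Tuple[int, int]  # (column x, row y); x grows East, y grows North

def odd_even_route(cur: Coord, src: Coord, dst: Coord) -> List[str]:
    """Admissible minimal output directions at `cur` under the Odd-Even
    turn model: no East->North/South turns at even columns and no
    North/South->West turns at odd columns."""
    (c0, c1), (s0, _), (d0, d1) = cur, src, dst
    e0, e1 = d0 - c0, d1 - c1
    if e0 == 0 and e1 == 0:
        return []  # arrived: deliver to the local port
    vertical = "N" if e1 > 0 else "S"
    dirs: List[str] = []
    if e0 == 0:
        dirs.append(vertical)
    elif e0 > 0:  # eastbound packet
        if e1 == 0:
            dirs.append("E")
        else:
            if c0 % 2 == 1 or c0 == s0:
                dirs.append(vertical)
            if d0 % 2 == 1 or e0 != 1:
                dirs.append("E")
    else:  # westbound packet
        dirs.append("W")
        if c0 % 2 == 0 and e1 != 0:
            dirs.append(vertical)
    return dirs

print(odd_even_route((1, 0), (0, 0), (3, 2)))  # ['N', 'E']
```

Every returned direction reduces the distance to the destination, so routes stay minimal. Whenever more than one direction is returned, the router has an adaptive choice.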
The most important step in the final testing of fabricated ASICs, or in the functional testing of ASIC and FPGA designs, is the generation of a complete test set that can detect the possible errors in the design. Automatic Test Pattern Generation (ATPG) is often done by fault simulation, which is very time-consuming. Speed-ups in this process can be...
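The cost of fault simulation is easy to see on a tiny example. The sketch below enumerates single stuck-at faults on a hypothetical circuit, out = (a AND b) OR c, and measures which faults a given test set detects. Every fault requires re-simulating every test vector, which is why this approach scales poorly to large designs. The circuit and its net names are illustrative only.

```python
from itertools import product

NETS = ("a", "b", "c", "n1", "out")

def simulate(vector, fault=None):
    """Evaluate out = (a AND b) OR c; fault is (net, stuck_value) or None."""
    def force(net, value):
        # A stuck-at fault overrides whatever value the net would carry.
        if fault is not None and fault[0] == net:
            return fault[1]
        return value
    a = force("a", vector[0])
    b = force("b", vector[1])
    c = force("c", vector[2])
    n1 = force("n1", a & b)
    return force("out", n1 | c)

def fault_coverage(test_set):
    """Fraction of single stuck-at faults whose output differs from the
    fault-free output for at least one vector in the test set."""
    faults = [(net, v) for net in NETS for v in (0, 1)]
    detected = {f for f in faults for vec in test_set
                if simulate(vec, f) != simulate(vec)}
    return len(detected) / len(faults)

print(fault_coverage(list(product((0, 1), repeat=3))))  # 1.0
print(fault_coverage([(1, 1, 0)]))                      # 0.4
```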
A multi-mode circuit implements the functionality of a limited number of circuits, called modes, of which at any given time only one needs to be realised. Using dynamic partial reconfiguration of an FPGA, all the modes can be implemented on the same reconfigurable region, requiring only an area that can contain the biggest mode. This can save consi...
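The area argument can be written out explicitly. Assume a baseline in which all $M$ modes are implemented side by side without any sharing, and ignore reconfiguration-region granularity and overhead. With $A_i$ the area of mode $i$, the two areas compare as follows:

```latex
A_{\text{static}} = \sum_{i=1}^{M} A_i, \qquad
A_{\text{DPR}} = \max_{1 \le i \le M} A_i, \qquad
\text{saving} = 1 - \frac{A_{\text{DPR}}}{A_{\text{static}}}
```

For three hypothetical modes of 1200, 800, and 500 LUTs, the static implementation needs 2500 LUTs while the reconfigurable region needs 1200 LUTs, a saving of $1 - 1200/2500 = 52\%$.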
Dynamic Circuit Specialization (DCS) optimizes a Field-Programmable Gate Array (FPGA) design by assuming a set of its input signals are constant for a reasonable amount of time, leading to a smaller and faster FPGA circuit. When the signals actually change, a new circuit is loaded into the FPGA through runtime reconfiguration. The signals the desig...
Using dynamic partial reconfiguration (DPR), several circuits can be time-multiplexed on the same FPGA region, saving considerable area compared to an implementation without DPR. However, the long reconfiguration time to switch between circuits remains a significant problem. In this work we show that it is possible to significantly reduce this over...
We propose a new kind of FPGA architecture with a routing network that not only provides interconnections between the functional blocks but also performs some logic operations. More specifically, we replaced the routing multiplexer node in the conventional architecture with an element that can be used both as an AND gate and as a multiplexer. A conventional...
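A standard way for a multiplexer to double as an AND gate is to tie one of its data inputs to logic 0. A 2:1 multiplexer with select `a`, `d0 = 0`, and `d1 = b` then outputs `a AND b`. The sketch below checks this identity. The abstract does not describe the exact circuit of the proposed element, so this is only a generic illustration, and the function names are hypothetical.

```python
def mux2(sel: int, d0: int, d1: int) -> int:
    """2:1 routing multiplexer: forwards d1 when sel is 1, otherwise d0."""
    return d1 if sel else d0

def and_via_mux(a: int, b: int) -> int:
    """Tie d0 to logic 0 so that the multiplexer computes a AND b."""
    return mux2(a, 0, b)

# Exhaustive check of the identity over all input combinations
for a in (0, 1):
    for b in (0, 1):
        assert and_via_mux(a, b) == (a & b)
```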
Networks-on-Chip (NoCs) are becoming more susceptible to faults due to the increasing density of VLSI circuits. As a result, designing reliable and efficient routing methods is highly desirable. Most existing fault-tolerant routing techniques use nonminimal paths to reroute the packets around the faulty regions. Using these approaches, t...
This paper proposes the use of parameterised FPGA configurations for a new test set generation approach. Test set generation is a time-consuming problem that aims at finding the right input values to fully test an ASIC design. Since well-known methods for test set generation, such as fault simulation techniques, have become impractical to use due to the...
Dynamic Circuit Specialisation (DCS) is a method that exploits the reconfigurability of modern FPGAs to allow the specialisation of FPGA circuits at run-time. Currently, it has only been explored as part of register-transfer level (RTL) design. However, at the RTL, a large part of the design is already locked in. Therefore, maximally...
Dynamic Circuit Specialization (DCS) is an optimization technique used for implementing a parameterized application on an FPGA. The application is said to be parameterized when some of its inputs, called parameters, are infrequently changing compared to the other inputs. Instead of implementing these parameter inputs as regular inputs, in the DCS a...
Dynamic Circuit Specialization (DCS) is a technique used to optimize FPGA applications when some of the inputs, called parameters, are infrequently changing compared to other inputs. For every change of parameter input values, a specialized FPGA configuration is generated during run time and the FPGA is reconfigured with a specialized bitstream...
The amalgamation of 3D VLSI technology and Networks-on-Chip (NoCs) offers a promising architectural platform for future Multi-Processor Systems-on-Chip (MPSoCs). Since multicast communication is frequently exploited in such systems, it is highly desirable to design NoC-based routing methods that support multicast. In this paper, a highly adapti...
Even though FPGAs are becoming increasingly popular and are used in many different scenarios such as communications and HPC, the steep learning curve of this technology is still the major factor limiting their full success. Many works have proposed to mitigate this problem by creating a suite of tools to support the designer du...
The incorporation of the third dimension in the design of Networks-on-Chip (NoCs) provides a major performance improvement for Chip Multi-Processors (CMPs). Since multicast communication is necessary for parallelization, it is of significant importance to design routing methods that support multicast. The partitioning strategy has a major impact on...
While fine-grain reconfigurable devices have been available for years, they are mostly used in a fixed-functionality, “ASIC-replacement” manner. To exploit opportunities for flexible and adaptable run-time use of fine-grain reconfigurable resources (as currently implemented in dynamic partial reconfiguration), better tool support is need...
The overall performance of Multi-Processor System-on-Chip (MPSoC) platforms depends heavily on efficient communication among their cores through the Network-on-Chip (NoC). Routing algorithms are responsible for the on-chip communication and traffic distribution through the network. Hence, designing efficient and high-performance routing algorithms is...
Dynamic partial reconfiguration of FPGAs enables the dynamic specialization of the circuit for the runtime needs of the application. Previously, a tool flow called the TLUT tool flow was developed to help designers apply dynamic circuit specialization (DCS) to their designs. The TLUT tool flow generates an implementation in which the look...
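The abstract is truncated before it describes the TLUT implementation. As a hypothetical illustration of the underlying DCS idea only, the sketch below fixes the parameter inputs of a LUT truth table to constants and returns the smaller truth table over the remaining inputs. The function name, the bit-ordering convention, and the example function are assumptions.

```python
from itertools import product
from typing import Dict, List, Sequence

def specialise_lut(truth_table: Sequence[int], inputs: List[str],
                   params: Dict[str, int]) -> List[int]:
    """Fix the parameter inputs of a LUT to constants and return the truth
    table over the remaining (regular) inputs.

    truth_table[i] is the output for the input vector whose bits are the
    binary digits of i, with inputs[0] as the most significant bit."""
    free = [name for name in inputs if name not in params]
    reduced = []
    for values in product((0, 1), repeat=len(free)):
        assignment = dict(zip(free, values), **params)
        index = 0
        for name in inputs:  # rebuild the index of the full truth table
            index = (index << 1) | assignment[name]
        reduced.append(truth_table[index])
    return reduced

# 3-input LUT computing "y if p else x", where p is a slowly changing parameter
tt = [0, 0, 1, 1, 0, 1, 0, 1]
print(specialise_lut(tt, ["p", "x", "y"], {"p": 0}))  # [0, 0, 1, 1], i.e. x
print(specialise_lut(tt, ["p", "x", "y"], {"p": 1}))  # [0, 1, 0, 1], i.e. y
```

In this example the specialized function needs a 2-input LUT instead of a 3-input one. The price is regenerating and reloading the truth table whenever `p` changes.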
A Software-Defined Radio (SDR) system is a radio communication system in which components that have typically been implemented in hardware are instead implemented in software. In this paper we describe and compare two approaches to map this software onto a hardware platform. One runs the software on an ASIP; the other uses High-Level Synthesis to...
Current High-Level Synthesis (HLS) tools perform excellently for the synthesis of computation kernels, but they often do not optimize memory bandwidth. As memory access is a bottleneck in many algorithms, the performance of the generated circuit will benefit substantially from memory access optimization. In this paper we present an automated method...
Future computing systems will require dedicated accelerators to achieve high performance. The mini-symposium ParaFPGA explores parallel computing with FPGAs as an interesting avenue to reduce the gap between the architecture and the application. Topics discussed are the power of functional and dataflow languages, the performance of high-level synth...
The FPGA's interconnection network not only occupies a larger portion of the total silicon area than the logic available on the FPGA, it also accounts for the majority of the delay and power consumption. Therefore, it is essential that routing algorithms be as efficient as possible. In this work the connection router is introduced....
Optimization of multiprocessor systems relies heavily on the efficient design of on-chip routing algorithms. Adaptive routing appears to play an extremely significant role in the performance of Networks-on-Chip. In this paper, a deadlock-free and highly adaptive minimal routing method (HOE) is proposed. Although the Hamiltonian Adaptive Multica...
On-chip communication appears to play an extremely significant role in taking advantage of the inherent parallelism offered by MPSoCs. If interconnection networks are to be used efficiently in such platforms, designing high-performance routing algorithms is essential. In this paper, a deadlock-free and highly adaptive multicast/unicast rou...
Fine-grained Field Programmable Gate Arrays (FPGAs) are complex to program and therefore suffer from high development costs. To solve this problem, Virtual Coarse-Grained Reconfigurable Arrays (VCGRAs), i.e., CGRAs implemented on FPGAs, have been proposed. Conventional implementations of VCGRAs use functional FPGA resources, such as LookUp Tables...
Using Dynamic Partial Reconfiguration (DPR) of FPGAs, several circuits can be time-multiplexed on the same chip region, saving considerable area. However, the long reconfiguration time when switching between circuits remains a major problem with DPR. In this paper we show that it is possible to significantly reduce reconfiguration time when the number o...
Imputing missing values in high-dimensional time series is a difficult problem. This paper presents a strategy for training energy-based graphical models for imputation directly, bypassing the difficulties that probabilistic approaches would face. The training strategy is inspired by recent work on optimization-based learning (Domke, 2012) and allows comple...
A multi-mode circuit implements the functionality of a limited number of circuits, called modes, of which at any given time only one needs to be realised. Using run-time reconfiguration (RTR) of an FPGA, all the modes can be time-multiplexed on the same reconfigurable region, requiring only an area that can contain the biggest mode. Typically, conv...
With the continuously increasing number of cores, on-chip communication has gained a significant role in exploiting multicore chips. Therefore, designing efficient routing algorithms is highly desirable to increase the performance of Networks-on-Chip (NoCs). In comparison with deterministic routing methods, adaptive routing offers b...
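One way to quantify the flexibility of adaptive routing is to count the minimal paths between two routers in a 2-D mesh. A deterministic scheme such as XY routing always uses exactly one of them, whereas a fully adaptive minimal scheme may use any of them, and partially adaptive schemes (e.g., turn models) permit a subset. The count is a binomial coefficient:

```python
from math import comb

def minimal_paths(dx: int, dy: int) -> int:
    """Number of distinct minimal paths between two routers in a 2-D mesh
    that are |dx| columns and |dy| rows apart."""
    dx, dy = abs(dx), abs(dy)
    return comb(dx + dy, dx)  # choose which of the dx+dy hops go horizontally

print(minimal_paths(3, 2))  # 10
```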
Dynamic circuit specialization (DCS) is a technique used to implement FPGA applications where some of the input data, called parameters, change slowly compared to other inputs. Each time the parameter values change, the FPGA is reconfigured by a configuration that is specialized for those new parameter values. This specialized configuration is much...
Extending product functionality and lifetime requires the constant addition of new features to satisfy growing customer needs and evolving market and technology trends. Software component adaptivity is straightforward but not enough: recent products include hardware accelerators for reasons of performance and power efficiency that also need to...
It is known that a frequently used implementation method for regular expressions, which combines counters and nondeterministic finite automata, is incorrect for certain regular expressions. Determining which expressions can be correctly implemented with this method has proven nontrivial and has previously been done without proof. Presented...
A multi-mode circuit implements the functionality of a limited number of circuits, called modes, of which at any given time only one needs to be realised. Using run-time reconfiguration of an FPGA, all the modes can be implemented on the same reconfigurable region, requiring only an area that can contain the biggest mode. Typically, conventional ru...
We propose a bidirectional truncated recurrent neural network architecture for speech denoising. Recent work showed that deep recurrent neural networks perform well at speech denoising tasks and outperform feed-forward architectures [1]. However, recurrent neural networks are difficult to train and their simulation does not allow for much paralleli...
During the last few years, there has been increasing interest in combining software and hardware to serve different applications efficiently. This is due to the heterogeneity of an application's tasks, which require resources from both worlds, software and hardware. Effectively controlling these resources through an integr...
The FASTER project aims to ease the definition, implementation and use of dynamically changing hardware systems. Our motivation stems from the promise reconfigurable systems hold for achieving better performance and extending product functionality and lifetime via the addition of new features that work at hardware speed. This is a clear advantage o...