
Rastislav Struharik
- Professor
- Professor (Full) at University of Novi Sad
About
45
Publications
5,216
Reads
259
Citations
Introduction
Rastislav Struharik, full professor, currently works at the Department of Power, Electronics and Telecommunications Engineering, University of Novi Sad. Rastislav does research in Computer Architecture, Artificial Intelligence and Artificial Neural Networks. His current project is 'Hardware Acceleration of Machine Learning Algorithms.'
Current institution
Additional affiliations
October 2016 - present
October 2015 - present
October 2014 - present
Education
October 2005 - December 2009
October 1999 - May 2005
October 1993 - January 1999
Publications (45)
Object detection is a popular image-processing technique, widely used in numerous applications for detecting and locating objects in images or videos. While being one of the fastest algorithms for object detection, Single-shot Multibox Detection (SSD) networks are also computationally very demanding, which limits their usage in real-time edge appli...
This study presents a universal reconfigurable hardware accelerator for efficient processing of sparse decision trees, artificial neural networks and support vector machines. The main idea is to develop a hardware accelerator that will be able to directly process sparse machine learning models, resulting in shorter inference times and lower power c...
This paper proposes a two-step Convolutional Neural Network (CNN) pruning algorithm and a resource-efficient field-programmable gate array (FPGA) CNN accelerator named “Argus”. The proposed CNN pruning algorithm first combines similar kernels into clusters, which are then pruned using the same regular pruning pattern. The pruning algorithm is carefully ta...
Data movement between Convolutional Neural Network (CNN) accelerators and off-chip memory is a critical contributor to overall power consumption. Minimizing power consumption is particularly important for low-power embedded applications. Specific CNN compute patterns offer the possibility of significant data reuse, leading to the idea of using spe...
In this paper, a hardware accelerator for sparse support vector machines (SVM) is proposed. We believe that the proposed accelerator is the first accelerator of this kind. The accelerator is designed for use in field-programmable gate array (FPGA) systems. Additionally, a novel algorithm for the pruning of SVM models is developed. The pruned SVM m...
In this paper, we propose a novel Convolutional Neural Network hardware accelerator called CoNNA, capable of accelerating pruned, quantized CNNs. In contrast to most existing solutions, CoNNA offers a complete solution to the compressed CNN acceleration, being able to accelerate all layer types commonly found in contemporary CNNs. CoNNA is designed...
This paper presents a hardware accelerator for sparse decision trees intended for FPGA applications. To the best of the authors’ knowledge, this is the first accelerator of this type. Besides the hardware accelerator itself, a novel algorithm for induction of sparse decision trees is also presented. Sparse decision trees can be attractive because they r...
In this paper we propose a novel Convolutional Neural Network hardware accelerator, called CoNNA, capable of accelerating pruned, quantized CNNs. In contrast to most existing solutions, CoNNA offers a complete solution to the full, compressed CNN acceleration, being able to accelerate all layer types commonly found in contemporary CNNs. CoNNA is d...
Convolutional Neural Networks (CNNs) are becoming a fundamental tool for machine learning. High performance and energy efficiency are of great importance for deployments of CNNs in many embedded applications. Energy consumption during CNN processing is dominated by memory access and since large networks do not fit in on-chip storage, they require e...
In this paper a system for hardware-aided induction of decision tree ensembles using the evolutionary approach (Decision Tree Ensemble Evolution co-Processor—DTEEP) is proposed. DTEEP is used for hardware acceleration of the fitness evaluation, since it is shown that most of the ensemble inference time is spent on this task. The DTEEP co-processor...
In this paper we propose a novel CNN hardware accelerator, called AIScale, capable of accelerating convolutional, pooling, fully-connected and adding CNN layers. In contrast to most existing solutions, AIScale offers a complete solution to the full CNN acceleration. AIScale is designed as a coarse-grained reconfigurable architecture, which uses rap...
In fields like embedded vision, where algorithms are computationally expensive, hardware accelerators play a major role in high throughput applications. These accelerators could be implemented as hardwired IP cores or Application Specific Instruction-set Processors (ASIPs). While hardwired solutions often provide the best possible performance, they...
Algorithms for data encryption are among the most important parts of modern communication systems. In this paper the results of hardware implementation of the AES256 and TDES algorithms are presented. AES256 and TDES are implemented as an IP core with an AXI interface because of the constant growth of data transfer requirements in modern embedded systems, in...
In this paper a co-processor for hardware-aided decision tree induction using an evolutionary approach (EFTIP) is proposed. EFTIP is used for hardware acceleration of the fitness evaluation task, since this task is shown in the paper to be the execution time bottleneck. The EFTIP co-processor can significantly improve the execution time of a novel...
This paper presents a novel algorithm for induction of full oblique decision trees (EFTI). The proposed algorithm is based on a special, single-individual evolutionary algorithm, which evolves a full decision tree by modifying its structure and node coefficients during the evolution process. The EFTI algorithm is particularly well suited to be used in embedded...
In this paper a universal reconfigurable computing architecture for hardware implementation of homogeneous and heterogeneous ensemble classifiers composed of decision trees (DTs), artificial neural networks (ANNs), and support vector machines (SVMs) is proposed. The following types of ensemble classifiers have been implemented in FPGA using propo...
IP cores for direct and inverse discrete cosine transformation are an important part of compression core implementation. In this paper the results for the implementation of the two-dimensional binary discrete cosine transformation (2D binDCT) and two-dimensional binary discrete cosine inverse transformation (2D binIDCT) algorithm which are discrete cosine...
This paper presents four different architectures for the hardware acceleration of axis-parallel, oblique and non-linear decision tree ensemble classifier systems. Hardware architectures for the implementation of a number of ensemble combination rules are also presented. The proposed architectures are optimized for size, making them particularly int...
This paper proposes four different hardware architectures for parallel implementation of decision trees forming an ensemble classifier. The proposed architectures can accelerate ensemble classifiers composed of axis-parallel, oblique and nonlinear decision trees (DTs). Hardware architectures for the implementation of a number of combinatio...
This paper proposes a universal coarse-grained reconfigurable computing architecture for hardware implementation of decision trees (DTs), artificial neural networks (ANNs), and support vector machines (SVMs), suitable for both field-programmable gate array (FPGA) and application-specific integrated circuit (ASIC) implementation. Using this univers...
In this paper an application of an evolutionary algorithm to oblique decision tree inference is presented. At the core of the new decision tree inducing algorithm is a specific evolutionary algorithm called HereBoy. The performance of the proposed HBDT algorithm is studied and compared with eight existing decision tree building algorithms using standard benchma...
This paper presents several IP cores for the hardware evolution of ensembles composed of oblique or non-linear decision trees. These cores can be implemented using FPGA or ASIC technology. Results of experiments obtained using 29 datasets from the standard UCI Machine Learning Repository database suggest that the FPGA implementations offer signi...
We propose a new digital reconfigurable architecture for the implementation of machine learning classifiers. The architecture can be configured to implement a neural network, decision tree or a support vector machine type of classifier.
In this paper, several hardware architectures for the realization of ensembles of axis-parallel, oblique and nonlinear decision trees (DTs) are presented. Hardware architectures for the implementation of a number of ensemble combination rules are also presented. These architectures are universal and can be used to combine predictions from any type...
This paper presents a method for generating control signal sequences that allow one to compute an arbitrary n-input, 1-output Boolean function using only two working memristors. The described approach is based on the use of a recursive Boolean formula, which enables implementation of a Boolean function over the functionally complete basis {imply,...
This paper introduces a novel programmable reconfigurable architecture, called vCell Matrix. The proposed architecture is based on the Cell Matrix architecture with several important modifications that allow simpler and faster reconfiguration. The new architecture is optimized for implementation using Xilinx FPGA devices. A special embedded system, b...
This paper proposes several IP cores for the hardware implementation of the complete decision tree inference algorithm. Evolving decision trees in hardware is motivated by the significant improvement in the evolution time compared to the time needed for software evolution. Several architectures for the hardware evolution of single oblique or nonlin...
We propose a new digital architecture for SVM classification. The architecture uses a kernel that is suited to implementation as a digital architecture in embedded systems. It is then tested on a channel equalization problem where real-time performance is important and hardware implementation of the classification is needed.
In this paper several hardware implementations of decision trees (axis-parallel, oblique and non-linear) based on the concept of universal node and sequence of universal nodes are presented. Proposed hardware architectures are suitable for the implementation in both Field Programmable Gate Arrays (FPGA) and Application Specific Integrated Circuits...
This paper, to the best of our knowledge, provides the very first solution to the hardware implementation of the complete decision tree inference algorithm. Evolving decision trees in hardware is motivated by a significant improvement in the evolution time compared to the time needed for software evolution and efficient use of decision tr...
Several soft intellectual property (IP) core implementations of decision trees (axis-parallel, oblique and nonlinear) based on the concept of universal node (UN) and sequence of UNs are presented. The proposed IP cores are suitable for implementation in both field-programmable gate arrays and application-specific integrated circuits. Developed IP cores...
In this paper an algorithm for construction of neural networks from decision trees is presented. First, a decision tree is constructed using some standard algorithm, then an equivalent set of rules is extracted from that tree. A neural network is then formed using that set of rules. This neural network is then used as a basis for further learning that will...
In this paper a novel voltage-controlled memristor model accounting for exponential ionic drift with respect to memristor voltage and nonlinear ionic drift with respect to barrier position is proposed. The model is implemented in Simscape™ language and the functionality of the model is tested using simulations of memristive digital logic and adapti...
This paper presents a survey of the work done so far in designing hybrid CMOS/nanoelectronic computing architectures. Although there have been previous attempts at making such a survey [8-9], they seem to be incomplete at the current time, mostly due to the fact that three additional years have passed since their publication and several new computing ar...
In this paper the design of a Huffman decoder FPGA core that is part of a motion JPEG system is presented. The core is designed and verified using the VHDL hardware description language. It is implemented and tested on a Virtex2 FPGA platform from Xilinx Inc.
In this paper architectures for the 2D DCT/IDCT (Discrete Cosine Transform, Inverse Discrete Cosine Transform) are presented. These architectures were developed for FPGA implementation. First, algorithms for the efficient 2D DCT/IDCT calculation are presented. Using these algorithms, micro-architectures for the efficient FPGA implementation are...
This paper presents the design of a soft core for the industry-standard 8051 microcontroller, intended for FPGA implementation. It can be used in a wide range of applications such as embedded microcontroller systems, data computation and transfer, communication systems and professional audio and video. This core will be used as a part of video transmission...
This paper presents a typical procedure for the design and verification of complex hardware systems using hardware description languages. VHDL is used as one of two existing standards. All steps are illustrated using the design of the DLX CPU. The DLX is a RISC processor designed for teaching principles of computer architecture. As an implementatio...