About
42 Publications
10,532 Reads
440 Citations
Introduction
Additional affiliations
September 1983 - present
Publications (42)
Median filtering is a widely used non-linear noise-filtering algorithm that can efficiently remove salt-and-pepper noise while preserving the edges of objects. Unlike linear filters, which use multiply-and-accumulate operations, a median filter sorts the input elements and selects their median. This makes it computationally more intensiv...
Multicore architectures enable increasing the performance of the system with parallel processing. One of the challenges of a multicore embedded system is the correct usage of the processor cores. It is possible to achieve balanced processor load on the different cores, but the communication bandwidth between the cores is often a bottleneck. Passing...
A rank order filter and instantiation thereof in programmable logic is described. A maximum filter core frequency is determined for an input sampling frequency, a filter window height, and a number of input samples. The maximum filter core frequency is greater than the sampling frequency. The maximum filter core frequency may be insufficient for a...
Molecular docking is an important problem in bioinformatics that aims to predict the binding poses of molecules. AutoDock is a popular, open-source docking software applying a computationally expensive but parallelizable algorithm. This paper introduces an FPGA-based and a GPU-based implementation of AutoDock and shows how the original algorit...
FPGA-based hardware accelerators are increasingly widely used in different kinds of applications. Compared to other solutions and to direct hardware implementation, the advantage of FPGA devices is the flexibility that arises from their programmable nature. In addition, some FPGA devices also support partial dynamic reconfig...
AutoDock is popular software for molecular docking, a bioinformatics problem. The FPGA-based acceleration of AutoDock is presented and evaluated in this paper. The implementation applies pipelines and fine-grained parallelization, and achieves a 10-40× speedup over a 3.2 GHz CPU. Test runs show that the overall accuracy of the algorithm...
BLAST, the most widely used bioinformatics search tool, is routinely used for tasks infeasible to run on commodity computer systems. Prefiltering appears to be a promising acceleration approach that conserves the behavior of the original program yet provides significant search speedup. We conceived and implemented one such prefilter system based on...
The LOGSYS Development Environment is used as a versatile tool at different levels of education in the B.Sc. and M.Sc. Embedded System courses. The motivation for its introduction was to provide every student with an affordable platform that offers compatibility with the existing industry-standard solutions while supporting practical design wo...
This paper presents an FPGA implementation of a high-performance rank filter for video and image processing. The architecture exploits the features of current FPGAs and offers a tradeoff between complexity and clock speed. By maximizing the operating frequency, the complexity of the filter structure can be considerably reduced compared to previous 2D...
Digital design courses are among the basic subjects in both the Electrical Engineering and Technical Informatics branches of BUTE. Better support for practical experiments would help students understand the topics more deeply and would provide a good background for later, more professional courses. Programmable logic devices, like FPGAs, offe...
Complex three-dimensional graphics rendering is a computationally very intensive process, so even the newest microprocessors cannot handle more complicated scenes in real time. Therefore, to produce realistic rendering, hardware solutions are required. This paper discusses an FPGA implementation which supports programmable pixel computing.
This paper presents an FPGA implementation of a high-performance rank filter for video and image processing. The architecture exploits the features of current FPGAs and offers tradeoffs between complexity and performance. By maximizing the operating frequency, the complexity of the filter structure can be considerably reduced compared to previous 2...
Distributed wireless sensor networks consisting of several single sensors are becoming very popular in many important applications. The widespread proliferation of low-cost sensor nodes is hampered by several technical challenges, namely resource (e.g. bandwidth and battery power) constraints, reliability and health of sensors, and more importantly c...
This paper proposes a wireless sensor network based acoustic source localization and tracking system. Each individual node has a special-purpose sensor board with four acoustic channels and a digital compass, enabling direction of arrival (DOA) estimation of acoustic sources. Upon detecting a source of interest, the sparsely deployed sensor nodes r...
Two factors determine the performance a 3D accelerator can achieve: the available computational power and the available memory bandwidth. In embedded systems, these resources are even more limited than in desktop environments, thus the efficiency of the hardware architecture and the exploitation of the logic resources become even more i...
The probability of faults occurring in the field increases with the evolution of the CMOS technologies. It becomes, therefore, increasingly important to analyze the potential consequences of such faults on the applications. Fault injection techniques have been used for years to validate the dependability level of circuits and systems, and approache...
Complex three-dimensional graphics rendering is a computationally very intensive process, so even the newest microprocessors cannot handle more complicated scenes in real time. Therefore, to produce realistic rendering, hardware solutions are required. This paper discusses an FPGA implementation which complies with the newer, programmable standards. A...
The probability of faults, and especially transient faults, occurring in the field is increasing with the evolution of CMOS technologies. It therefore becomes crucial to predict the potential consequences of such faults on the applications. Fault injection techniques based on the high-level descriptions of the circuits have been proposed for a...
In many applications the most significant advantages of neural networks come mainly from their parallel architectures ensuring rather high operation speed. The difficulties of parallel digital hardware implementation arise mostly from the high complexity of the parallel many-multiplier structure. This paper suggests a new bit-serial/parallel neural...
In this paper, approaches using run-time reconfiguration (RTR) for fault injection in programmable systems are introduced. In FPGA-based systems an important characteristic is the time to reconfigure the hardware. With novel FPGA families (e.g. Virtex, AT6000) it is possible to reconfigure the hardware partially in run-time. Important time-savings...
Direct hardware realizations of digital filters on FPGA devices require efficient implementation of the multiplier modules. The distributed arithmetic form of the inner product processor array offers the possibility of merging the individual partial products, which leads to reduced logic complexity. Although this possibility can be exploited mainly...
Deals with the direct hardware implementation of trained neural networks and suggests a matrix-vector multiplier synthesis method which makes possible very efficient hardware realization. The full parallel, bit-serial architecture can be efficiently used for FPGA and ASIC implementations. The new neural network realization approach can be integrate...
The architecture of ACE, a multiprocessor analogic cellular neural network (CNN) emulator engine consisting of 2 to 16 TMS320C40 floating point DSPs, is introduced. The engine contains up to 512 Mbyte RAM (enough to store a 512×512×512 sized CNN cube) and can be controlled through its SCSI port. It can either accelerate the multilayer CNN simula...
The 2D discrete cosine transform (2D DCT) is one of the most effective methods in image data compression. In this paper an inner product algorithm for the 8×8 2D DCT implementation is presented. The proposed direct 2D inner product algorithm exploits redundancies down to the bit level, and results in minimal hardware complexity. The basic algorit...
This paper suggests a new coefficient-dependent multiplication scheme for Field-Programmable Gate-Array (FPGA) implementation of direct-form Finite Impulse Response (FIR) digital filters. Using conventional design methods, FPGA devices are not very efficient in digital FIR filter implementation due to the complexity of the multipliers. Analysis of fil...
Direct hardware implementation of large inner product operations is always difficult because of the complexity of the multiplier modules. The paper suggests a new multiplier synthesis method for this type of arithmetic operation. The fully concurrent, bit-serial vector multiplication architecture is intended for FPGA or ASIC implementation. The bi...
A resonator based digital filter (RBDF) implementation is presented using field programmable gate array (FPGA) elements. The realization is based on the common structure developed for recursive discrete transforms. The globally parallel property of the recursive structure is maintained, while the arithmetic operations are realized by bit-serial log...
The applicability of the recursive Walsh-Hadamard transformation to FIR and IIR (finite- and infinite-impulse-response) filtering is investigated. It is shown that using a common structure for recursive transforms recently introduced by G. Peceli (1986), the usual frequency-domain FIR filtering problem can be easily converted into a Walsh sequency-...
The authors describe a multi-purpose arrangement enabling the recursive realization of any discrete transformation. Explicit expressions are given for the parameters and there is also a reference to the design of finite and infinite (FIR and IIR) impulse response filters using recursive transformations.
This paper presents an FIR filter architecture suitable for embedded FPGA based applications. With the limited resource requirements and its high performance, the architecture is suitable to implement real time, multi-channel filtering structures even in smaller FPGAs.
Modeling the light-surface interaction in real time 3D applications becomes more and more complex, as users require more lifelike images. Segmented screen rendering offers a viable solution to minimize the unnecessary work done in traditional rendering architectures. However, increasing the efficiency of the rendering pipeline also increases the re...
The Segmented Screen Rendering (or Bucket Rendering) technique can considerably improve performance and/or lower external memory bandwidth requirements by segmenting the screen into small rectangles and rendering these rectangles independently. Since the size of the buffers for a segment is significantly reduced compared to the full-screen buffers, it...
Embedded software design methodology is moving from various hardware-dependent assembly languages towards the well-defined and commonly used C. Most algorithms are evaluated on standard PCs using high-level programming languages. There is a natural need to use this code in the design too. During the implementation phase hardware and software com...
This paper presents an evaluated configuration with three microprocessors inside a single FPGA. The main processor runs an embedded operating system, making it possible to implement sophisticated software algorithms, while the other two 32-bit microprocessors run separate, dedicated, hardware-related real-time tasks...
This paper presents an FPGA implementation of a two-dimensional median filter architecture for image and video processing applications. The architecture exploits sorting based on partial rather than complete per-pixel information. This allows performance enhancement, which is a key point in image filtering, as the sampling frequency is typically re...
The aim of virtual screening is to find compounds in libraries which exhibit the required properties. These properties are represented in fingerprints; during the screening process the descriptors of the compounds are compared to each other and a dissimilarity score is calculated. As compound libraries typically contain a large amount of fingerprin...
Thanks to the rapid development of electronics, systems built with programmable logic devices (FPGAs and CPLDs) and microcontrollers are more and more widely used. Compared to systems using application-specific integrated circuits, their great advantage is the flexibility that arises from their programmable nature, which cuts back the cost of t...
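As a small illustration of the median filtering described in the first abstract of this list (sort the samples in the window and select the middle one, rather than multiply and accumulate), here is a minimal software sketch in Python. It is not the FPGA architecture of any paper listed here. The function name `median_filter_3x3`, the fixed 3×3 window, and the choice to copy border pixels unchanged are all assumptions made for brevity.

```python
def median_filter_3x3(image):
    """Apply a 3x3 median filter to a 2D list of numbers.

    Assumption: border pixels are copied unchanged, since they do not
    have a full 3x3 neighbourhood.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    result = [row[:] for row in image]  # copy, so borders stay as they were
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Collect the nine samples of the window centred on (y, x).
            window = [image[y + dy][x + dx]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            window.sort()
            result[y][x] = window[4]  # middle element of nine sorted samples
    return result


# A flat image with one "salt" (255) and one "pepper" (0) pixel.
noisy = [
    [10, 10, 10, 10],
    [10, 255, 10, 10],
    [10, 10, 0, 10],
    [10, 10, 10, 10],
]
filtered = median_filter_3x3(noisy)
```

In this example both outlier pixels are replaced by the surrounding value, because each 3×3 window contains at most two outliers among nine samples. The full sort per window is what makes the filter more expensive than a linear one.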