Conference Paper

A run-time reconfigurable system for gene-sequence searching


Abstract

Advances in the field of biotechnology have led to an ever-increasing demand for computational resources to rapidly search large databases of genetic information. Databases with billions of data elements are routinely compared and searched for matching and near-matching patterns. We present a system developed to search DNA sequence data using run-time reconfiguration of field programmable gate arrays (FPGAs). The system provides an order of magnitude increase in performance while reducing hardware complexity when compared to existing commercial systems.
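The kind of computation the abstract describes can be sketched in software. The fragment below is an illustrative analogy only (names and structure are ours, not the authors'): each processing element (PE) of a linear systolic array is "configured" with one query character, as a run-time reconfigurable FPGA would fold the query into its configuration, and database characters stream past so that all query positions are compared in parallel at every alignment offset.

```python
# Hypothetical software sketch of the idea in the abstract, not the paper's design:
# PE i holds query[i]; database characters stream by, and matches are counted
# at every alignment offset. In hardware all comparisons of one offset happen
# in parallel; here they are simply summed.

def systolic_match_counts(query: str, database: str) -> list[int]:
    """Count exact character matches of `query` at every offset in `database`."""
    m, n = len(query), len(database)
    counts = []
    for offset in range(n - m + 1):
        counts.append(sum(1 for i in range(m) if query[i] == database[offset + i]))
    return counts

q = "GATTACA"
db = "TTGATTACAGGGATTACA" + "ACGT" * 4
scores = systolic_match_counts(q, db)
best = max(range(len(scores)), key=scores.__getitem__)
# Near-matches (e.g. at most one mismatch) fall out of the same scan.
print(f"best offset {best}, {scores[best]}/{len(q)} characters match")
```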


... Using Eq.(5), it can be seen that modulo-4 encoding can be used to represent each value. Furthermore, the least significant bits of a and d are always equal, and hence, d can be represented by only the most significant bit [9]. In genomic databases, four-character alphabets are used; the circuit in Fig. 3 can compute Eq.(5). ...
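The excerpt refers to a modulo-4 (2-bit) code for the four-character genomic alphabet. The sketch below shows such an encoding in software; the particular bit assignment is an assumption chosen for illustration and is not the cited paper's Eq.(5) or its a/d signals.

```python
# Illustrative 2-bit (modulo-4) DNA encoding: four characters fit in two bits,
# which is what keeps the per-PE hardware so small. The mapping is assumed.

ENCODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
DECODE = {v: k for k, v in ENCODE.items()}

def pack(seq: str) -> int:
    """Pack a DNA string into an integer, 2 bits per base (first base lowest)."""
    word = 0
    for i, base in enumerate(seq):
        word |= ENCODE[base] << (2 * i)
    return word

def unpack(word: int, length: int) -> str:
    return "".join(DECODE[(word >> (2 * i)) & 0b11] for i in range(length))

assert unpack(pack("GATTACA"), 7) == "GATTACA"
```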
... The design is based on several optimizations for run-time customization [9]: the query length and the initialization values of the shift registers are changed at run time. Thus, it occupies only 4 Xilinx Spartan-III slices per PE. ...
... The number of hardware units per query character is 4/16, much lower than in the other designs. The throughputs of [5,9] are notable because their FPGAs are large enough to accommodate many PEs. If our design were implemented on a larger FPGA chip, its performance would be similar or even better. ...
Article
Full-text available
In this paper, we present a processor array for the similarity search of many DNA sequences against a large database. Based on a recently proposed systolic mapping, it is implemented using a low-cost 400k-gate Xilinx Spartan III XC3S400 FPGA. Compared with a software implementation, the array achieves a performance increase of approximately several hundred times while reducing the time spent transferring the database, which is the bottleneck of other hardware systems, by up to sixteen-fold.
... In order to align a 2 kbp sequence with an 8 kbp database, a speedup of 102 was obtained compared with the software implementation. Puttegowda et al. [21] proposed a systolic array architecture to run the SW algorithm on the OSIRIS board. This board contains two FPGAs: one interfaces with the host machine, whereas the other executes user-programmed hardware designs. ...
... In this case, SW either stands for Needleman-Wunsch (Section 2.1), Smith-Waterman (Section 2.2), or edit distance computation, all of which are similar algorithms. Most of the approaches analyzed [19], [20], [21], [22], [24], [27], [29] execute SW. Some of them [23], [25] execute either SW or Gotoh (Section 2.2), whereas [26] and [30] implement only Gotoh. ...
... Since the space complexity of SW is O(n²), only small sequences can be aligned. In Puttegowda et al. [21], only the similarity matrix computation was performed, without storing the matrix elements nor obtaining the highest score. Most of the proposals [19], [20], [23], [24], [25], [26], [27], [29] use some kind of partitioning technique (column 5) in order to deal with sequences of any size. ...
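As the excerpt notes, storing the full similarity matrix costs O(n²) space. When only the best local-alignment score is needed, two rows of the matrix suffice. The sketch below is the standard textbook formulation with a linear gap penalty, not any of the cited hardware designs.

```python
# Score-only Smith-Waterman: the best local-alignment score can be computed
# while keeping just the previous row of the DP matrix (O(n) memory).

def sw_best_score(a: str, b: str, match: int = 2, mismatch: int = -1,
                  gap: int = -1) -> int:
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best

print(sw_best_score("GATTACA", "GCATGCT"))
```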
Article
Full-text available
The recent and astonishing accomplishments in the field of Genomics would not have been possible without the techniques, algorithms, and tools developed in Bioinformatics. Biological sequence comparison is an important operation in Bioinformatics because it is used to determine how similar two sequences are. As a result of this operation, one or more alignments are produced. DIALIGN is an exact algorithm that uses dynamic programming to obtain optimal biological sequence alignments in quadratic space and time. One effective way to accelerate DIALIGN is to design FPGA-based architectures to execute it. Nevertheless, the complete retrieval of an alignment in hardware requires modifications of the original algorithm because it executes in quadratic space. In this paper, we propose and evaluate two FPGA-based accelerators executing DIALIGN in linear space: one to obtain the optimal DIALIGN score (DIALIGN-Score) and one to retrieve the DIALIGN alignment (DIALIGN-Alignment). Because there appears to be no documented variant of the DIALIGN algorithm that produces alignments in linear space, we here propose a linear-space variant of the DIALIGN algorithm and have designed the DIALIGN-Alignment accelerator to implement it. The experimental results show that impressive speedups can be obtained with both accelerators when comparing long biological sequences: the DIALIGN-Score accelerator achieved a speedup of 383.4 and the DIALIGN-Alignment accelerator reached a speedup of 141.38.
... Several reconfigurable logic-based solutions for various bioinformatics algorithms and applications came to light in recent years. Bioinformatics algorithms that solve the DNA sequence matching problem, such as Smith-Waterman [1, 2] and BLAST [3, 4, 5], have frequently been mapped to FPGAs in the past. From a computer architecture point of view, these problems deal with data streaming as well as character matching issues and exhibit similar characteristics to applications from other domains, such as network processors and intrusion detection systems [6, 7]. ...
... Once the likelihood vector of the virtual root has been computed, the Basic Cells are used again to calculate the per-column likelihood scores (see Equation 2), and the product over the per-column scores l(c) is then computed by the Likelihood Score Unit. ...
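The excerpt above describes forming the likelihood vector of a virtual root and then combining per-column scores l(c) into a product. A minimal software sketch of that arithmetic for a single column and a two-child root is given below; the transition matrix, base frequencies and names are illustrative placeholders, not the Basic Cell design.

```python
import math

STATES = 4  # A, C, G, T

def root_vector(P1, L1, P2, L2):
    """Conditional likelihood vector of the parent node: for each parent state s,
    multiply the probabilities of the subtrees below child 1 and child 2."""
    return [
        sum(P1[s][t] * L1[t] for t in range(STATES)) *
        sum(P2[s][t] * L2[t] for t in range(STATES))
        for s in range(STATES)
    ]

def column_score(root, freqs):
    """Per-column likelihood l(c): weight the root vector by the base frequencies."""
    return sum(f * x for f, x in zip(freqs, root))

# Toy data: child 1 observed 'A', child 2 observed 'G', and an assumed
# Jukes-Cantor-like transition matrix with 0.91 on the diagonal.
P = [[0.91 if s == t else 0.03 for t in range(STATES)] for s in range(STATES)]
L1 = [1.0, 0.0, 0.0, 0.0]
L2 = [0.0, 0.0, 1.0, 0.0]
freqs = [0.25] * STATES

l_c = column_score(root_vector(P, L1, P, L2), freqs)
# The overall likelihood is the product of l(c) over all columns; in practice
# it is accumulated as a sum of logarithms to avoid underflow.
print(l_c, math.log(l_c))
```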
Conference Paper
Full-text available
As FPGA devices become larger, more coarse-grain modules coupled with large scale reconfigurable fabric become available, thus enabling new classes of applications to run efficiently, as compared to a general-purpose computer. This paper presents an architecture that benefits from the large number of DSP modules in Xilinx technology to implement massive floating point arithmetic. Our architecture computes the Phylogenetic Likelihood Function (PLF) which accounts for approximately 95% of total execution time in all state-of-the-art Maximum Likelihood (ML) based programs for reconstruction of evolutionary relationships. We validate and assess performance of our architecture against a highly optimized and parallelized software implementation of the PLF that is based on RAxML, which is considered to be one of the fastest and most accurate programs for phylogenetic inference. Both software and hardware implementations use double precision floating point arithmetic. The new architecture achieves speedups ranging from 1.6 up to 7.2 compared to a high-end 8-way dual-core general-purpose computer running the aforementioned highly optimized OpenMP-based multi-threaded version of the PLF.
... While the success of FPGAs as custom computing machines (CCMs) may be largely attributed to economic factors – and even by itself this would not be a trivial contribution – reconfigurability does offer some distinct advantages over traditional ASIC solutions. With some applications, significant performance gains can be achieved by reconfiguration based on data that is only available at run-time [5][6][7]. In addition, it would be possible to swap hardware blocks in and out of an FPGA in a sense similar to context-switching [8]. ...
... Furthermore, the systolic array architecture suits this kind of computation. Some implementations, such as [6] and [7], utilize run-time configuration to generate more efficient hardware. Some constants that are determined at run time (i.e. ...
... Specifically, in terms of lookup-table (LUT) and flip-flop pairs, savings of approximately a factor of 5.5 have been attributed to run-time reconfiguration [7]. One of the most successful implementations [6] utilizes an FPGA board with a Xilinx XC2V6000-4 device and multiple memories attached. Ten SRAM chips are attached, in addition to 512MB of PC133 SDRAM (accessible by the host computer), as well as some RAM that is dedicated to run-time configuration data. ...
Conference Paper
Molecular biocomputation workflows traditionally involve days of compute time to align DNA/protein sequences. Custom computing machines (CCMs) provide a means to dramatically reduce alignment time, and FPGAs provide a practical means to implement such CCMs. Software implementations of some sequence alignment algorithms suffer from quadratic time performance; CCM implementations, however, may be highly parallelized and consequently provide linear time performance. Similarly, CCMs may be used to accelerate workflows or operations in a wide range of domains, often dramatically outperforming large-scale clusters. Programming and integration problems limit CCM usage, though progress has been made to overcome these problems. With continued development of tools, devices, and integration solutions, CCMs on FPGAs coupled to conventional systems present an effective architecture for high-performance computing.
... compared with general-purpose processors. These works span fields such as genetic sequence analysis (Puttegowda et al., 2003; Bogdán et al., 2008; Hoang and Lopresti, 1993), digital filtering (Tessier and Burleson, 2001; Yamada and Nishihara, 2001; Wang and Shen, 2008), cryptography (Patterson, 2000; Anghelescu et al., 2008a; Anghelescu et al., 2008), network packet filtering (Sinnappan and Hazelhurst, 2001; Cho and Mangione-Smith, 2008, 2005), automatic object recognition (Jean et al., 1999; Villasenor et al., 1996), and pattern identification (Baker and Prasanna, 2004), among others. The results of these investigations clearly show that, when the conditions specified in the section "Specific processing using FPGAs" are met, the specific processor will always outperform the general-purpose one. ...
... This is because implementing those operations inside the FPGA does not consume many logic resources (CLBs), so many modules could be replicated within the device. As a result of this internal replication of modules and a high internal operating frequency, those early processors achieved very good acceleration factors when their performance is compared with that of a general-purpose processor (Puttegowda et al., 2003; Tessier and Burleson, 2001; Patterson, 2000; Sinnappan and Hazelhurst, 2001). ...
Article
Full-text available
This paper was aimed at describing the state of the art regarding 2D migration from a software and hardware perspective. It also gives the current state of specific processing using field programmable gate arrays (FPGAs) and then concludes with the feasibility of fully implementing 2D seismic migration on an FPGA via a specific processor. Work showing performance in different areas of knowledge was reviewed to gain an overview of the current state of specific processing using FPGAs. As 2D seismic migration employs floating-point data, this article also compiles several papers showing trends in floating-point operations in both general and specific processors. The information presented in this article led to the conclusion that FPGAs have a promising future in this area, as oil industry companies have begun to develop their own tools aimed at further optimising field exploration.
... Many previous ASIC and FPGA implementations of the Smith-Waterman algorithm have been proposed and some are reviewed in Section 4. To date, the highest performance chip [6] and system level [7] performance figures have been achieved using a runtime reconfigurable implementation which directly writes one of the strings into the FPGA's bitstream. ...
... A number of commercial and research implementations of the Smith-Waterman algorithm have been reported, and their performance is summarized in Table 1. Examples are Splash [11], Splash 2 [12], SAMBA [13], Paracel [14], Celera [15], JBits from Xilinx [6], and the HokieGene Bioinformatics Project [7]. The performance measure of cell updates per second (CUPS) is widely used in the literature and hence adopted for our results. ...
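CUPS, the metric mentioned above, is simply the number of dynamic-programming cells computed divided by wall-clock time. The tiny helper below makes the arithmetic concrete; it is our own illustration, not code from the cited works.

```python
def gcups(query_len: int, db_len: int, seconds: float) -> float:
    """Giga cell updates per second for one query-vs-database DP pass:
    the alignment matrix has query_len * db_len cells."""
    return query_len * db_len / seconds / 1e9

# Hypothetical example: a 1 kbp query against a 100 Mbp database scanned in 20 s.
print(f"{gcups(1_000, 100_000_000, 20.0):.1f} GCUPS")
```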
Chapter
Full-text available
In order to understand the information encoded by DNA sequences, databases containing large amounts of DNA sequence information are frequently compared and searched for matching or near-matching patterns. This kind of similarity calculation is known as sequence alignment. To date, the most popular algorithms for this operation are heuristic approaches such as BLAST and FASTA, which give high speed but low sensitivity, i.e. significant matches may be missed by the searches. Another algorithm, the Smith-Waterman algorithm, is more computationally expensive but achieves higher sensitivity. In this paper, an improved systolic processing element cell for implementing the Smith-Waterman algorithm on a Xilinx Virtex FPGA is presented.
... The BLAST section implemented in [19] had a worse time than a software implementation. The time required to transfer the data to the FPGA was already longer than the time needed to execute the entire algorithm in software, showing that this algorithm is difficult to implement while taking full advantage of the parallel nature of FPGAs. ... designs (e.g. [27]) calculate the similarity matrix antidiagonals in parallel, taking full advantage of the wavefront method (figure 3). This approach allows exploiting the parallel potential of the FPGA circuit to calculate many matrix cells at the same time. ...
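The wavefront observation in the excerpt above is that all cells on one anti-diagonal of the matrix depend only on the two preceding anti-diagonals, so they can be computed simultaneously. The sketch below shows that dependency pattern for the plain edit-distance recurrence; it is a generic illustration, not the cited implementation.

```python
def edit_distance_wavefront(a: str, b: str) -> int:
    """Levenshtein distance computed anti-diagonal by anti-diagonal.
    Every cell (i, j) with i + j == d depends only on anti-diagonals d-1 and
    d-2, so in hardware all cells of one anti-diagonal can be updated in the
    same clock cycle."""
    m, n = len(a), len(b)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for d in range(m + n + 1):                  # anti-diagonal index i + j
        for i in range(max(0, d - n), min(m, d) + 1):
            j = d - i
            if i == 0:
                D[i][j] = j
            elif j == 0:
                D[i][j] = i
            else:
                cost = 0 if a[i - 1] == b[j - 1] else 1
                D[i][j] = min(D[i - 1][j] + 1, D[i][j - 1] + 1,
                              D[i - 1][j - 1] + cost)
    return D[m][n]

assert edit_distance_wavefront("GATTACA", "GCATGCT") == 4
```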
... This approach is very powerful because an FPGA can calculate billions of cells per second for a plain Smith-Waterman [13], [20], [27]. However, the quadratic space complexity of the SW algorithm is a great restriction. ...
... Many previous ASIC and FPGA implementations of the Smith-Waterman algorithm have been proposed and some are reviewed in Section 4. To date, the highest performance chip [6] and system level [7] performance figures have been achieved using a runtime reconfigurable implementation which directly writes one of the strings into the FPGA's bitstream. ...
... A number of commercial and research implementations of the Smith-Waterman algorithm have been reported, and their performance is summarized in Table 1. Examples are Splash [11], Splash 2 [12], SAMBA [13], Paracel [14], Celera [15], JBits from Xilinx [6], and the HokieGene Bioinformatics Project [7]. The performance measure of cell updates per second (CUPS) is widely used in the literature and hence adopted for our results. ...
Conference Paper
Full-text available
With an aim to understand the information encoded by DNA sequences, databases containing large amounts of DNA sequence information are frequently compared and searched for matching or near-matching patterns. This kind of similarity calculation is known as sequence alignment. To date, the most popular algorithms for this operation are heuristic approaches such as BLAST and FASTA, which give high speed but low sensitivity, i.e. significant matches may be missed by the searches. Another algorithm, the Smith-Waterman algorithm, is more computationally expensive but achieves higher sensitivity. In this paper, an improved systolic processing element cell for implementing the Smith-Waterman algorithm on a Xilinx Virtex FPGA is presented.
... The first works were limited to the calculation of the SW or NW algorithm with the linear gap penalty model [63][64][65]; however, designs with the affine gap penalty model, which adapts better to the evolutionary process of the species, were quickly reported [66][67][68][69]. Additionally, more flexible designs have been developed, which perform alignments with different input parameters and even allow both global and local alignments. ...
Article
Full-text available
The alignment or mapping of Deoxyribonucleic Acid (DNA) reads produced by the new massively parallel sequencing machines is a fundamental initial step in the DNA analysis process. DNA alignment consists of ordering millions of short nucleotide sequences called reads, using a previously sequenced genome as a reference, to reconstruct the genetic code of a species. Even with the efforts made in the development of new multi-stage alignment programs, based on sophisticated algorithms and new filtering heuristics, the execution times remain limiting for the development of various applications such as epigenetics and genomic medicine. This paper presents an overview of recent developments in the acceleration of DNA alignment programs, with special emphasis on those based on hardware, in particular Graphics Processing Units (GPUs), Field Programmable Gate Arrays (FPGAs), and Processing-in-Memory (PIM) devices. Unlike most of the works found in the literature, which review only the proposals that gradually emerged in some specific acceleration technology, this work analyzes the contemporary state of the subject in a more comprehensive way, covering the conception of the problem, the modern sequencing technologies, the analysis of the structure of the new alignment programs, and the most innovative software and hardware acceleration techniques. The foregoing makes it possible to clearly define, at the end of the paper, the trends, challenges and opportunities that still prevail in the field. We hope that this work will serve as a guide for the development of new and more sophisticated DNA alignment systems.
... FPGAs have already demonstrated their potential to accelerate compute-intensive applications; proof of this is the steady publication of computational processes implemented on FPGAs. These publications cover areas such as neural networks [23], [24], [25], [26], genetic sequence searching [27], [28], digital filtering [29], [30], [31], [32], network packet filtering [33], [34], and Monte-Carlo financial simulations [35], [36], among others. ...
Article
Full-text available
This article reviews the efforts currently being carried out to reduce the computation time of the MS. We introduce the methods used to perform the migration process as well as the two computer architectures that offer the best processing times. We review the most representative implementations of this process on these two technologies and summarize the contributions of each of these investigations. The article ends with our analysis of the direction that future research should take in this area. PACS: 93.85.Rt MSC: 68M20
... This paper by Puttegowda et al. [15] from the Virginia Bioinformatics Institute describes their effort to implement the Smith-Waterman algorithm using FPGAs. Their system is named HokieGene, and it uses an FPGA-PCI board called Osiris in the implementation. ...
Article
FLEET is an experimental architecture proposed by researchers at SUN Microsystems and UC Berkeley. Reflecting the fact that the wiring dominates the power, delay, and area costs of most CPU designs, the FLEET architecture focuses on communication. A FLEET processor may have a large number of functional units operating in parallel. Programming essentially becomes deciding how data should be moved between these functional units. The Berkeley/SUN research has focused on hardware design issues and existing software examples for FLEET are very simple. In this research, we used the Smith-Waterman algorithm for string comparison as an example of a larger scale application of FLEET, as this algorithm can be accelerated through the use of computer architectures with parallel execution capability. Smith-Waterman is a dynamic programming algorithm that has applications in biology for comparing DNA and protein sequences. We attempted to evaluate the FLEET architecture's potential for exploiting parallelism by implementing the Smith-Waterman algorithm on a simulator for FLEET. While we were not able to complete the implementation, our work revealed some shortcomings of the current simulator and programming model. On the other hand, we believe that the architecture has merit for high-performance computing, and our experience suggests ways that the tools for FLEET could be improved to facilitate the development of FLEET software.
... Because of its general applicability and the RT-level design, our technique makes designing DDF systems feasible for many applications. Other applications that may benefit from our DDF technique are: encryption algorithms like AES and DES, template matching [32], regular expression matching [33], DNA aligning [34], [35], serial fault emulation [36] and many others. ...
Article
Adaptive embedded systems are currently investigated as an answer to more stringent requirements on low power, in combination with significant performance. It is clear that runtime adaptation can offer benefits to embedded systems over static implementations as the architecture itself can be tuned to the problem at hand. Such architecture specialisation should be done fast enough so that the overhead of adapting the system does not overshadow the benefits obtained by the adaptivity. In this paper, we propose a methodology for FPGA design that allows such a fast reconfiguration for dynamic data folding applications. Dynamic Data Folding (DDF) is a technique to dynamically specialize an FPGA configuration according to the values of a set of parameters. The general idea of DDF is that each time the parameter values change, the device is reconfigured with a configuration that is specialized for the new parameter values. Since specialized configurations are smaller and faster than their generic counterpart, the hope is that their corresponding system implementation will be more cost efficient. In this paper, we show that DDF can be implemented on current commercial FPGAs by using the parameterizable run-time reconfiguration methodology. This methodology comprises a tool flow that automatically transforms DDF applications to a runtime adaptive implementation. Experimental results with this tool flow show that we can reap the benefits (smaller area and faster clocks) without too much reconfiguration overhead.
... However, there exists some research that focuses on improving performance for a single large sequence alignment problem [25]. Some FPGA and reconfigurable hardware research leverages the strong dataflow properties within the algorithm in order to exploit parallelism [26], [27], [28]. While much of this research focuses on improving throughput performance, some work also emphasizes reducing the logic footprint [29]. ...
... Applications of FPGAs in industry cover a broad range of areas including Digital Signal Processing (DSP), aerospace and defense systems, ASIC prototyping, medical imaging, computer vision, speech recognition, cryptography, bioinformatics [8], computer hardware emulation and a growing range of other areas [9][10][11]. FPGAs especially find application where algorithms can make use of the massive parallelism offered by their architecture. ...
Conference Paper
Full-text available
The long computation times required to simulate complete aircraft configurations remain the main bottleneck in the design flow of new structures for the aeronautics industry. In this paper, the novel application of specific hardware (FPGAs) in conjunction with conventional processors to accelerate CFD is explored in detail. First, some general facts about application-specific hardware are presented, placing the focus on the feasibility of developing hardware modules (FPGA based) for the acceleration of the most time-consuming algorithms in aeronautics analysis. Then, a practical methodology for developing an FPGA-based computing solution for the quasi 1D Euler equations is applied to Sod's 'Shock Tube' problem. Results comparing CPU-based and FPGA-based solutions are presented, showing that speedups of around two orders of magnitude can be expected from the FPGA-based implementation. Finally, some conclusions about this novel approach are drawn.
... FPGAs are used to accelerate many applications in various areas because of their parallel processing features, low energy consumption and reconfigurability. In bioinformatics applications, Puttegowda et al. [15] have reported that using an FPGA can obtain an order of magnitude speedup of the Smith-Waterman algorithm for DNA sequence alignment. Hussain et al. [16] have accelerated the k-means algorithm for clustering microarray data using an FPGA by 51.7 times compared with a Matlab implementation on a PC. ...
Article
Full-text available
Biclustering is an important technique in data mining for searching for similar patterns. The geometric biclustering (GBC) method is used to reduce the complexity of the NP-complete biclustering algorithm. This paper studies three commonly used modern platforms, namely multi-core CPU, GPU and FPGA, to accelerate this GBC algorithm. By analyzing the parallelizing properties of the GBC algorithm, we design 1) a multi-threaded software running on a server-grade multi-core CPU system, 2) a CUDA program for GPU to accelerate the GBC algorithm, and 3) a novel parameterizable and scalable hardware architecture implemented on an FPGA. Gene microarray pattern analysis is employed as an example to demonstrate performance comparisons on the different platforms. In particular, we compare the speed and energy efficiency of the three proposed methods. We found that 1) the GPU achieves the highest average speedup of 48x compared to the single-threaded GBC program, 2) our FPGA design can achieve a higher speedup of 4x for the computation on large microarrays, and 3) the FPGA consumes the least energy, being about 3.53x more efficient than the single-threaded GBC program.
... In the case of branch optimization, a frequently executed branch case is optimized based on typical execution profiles. Specific application accelerators that use these techniques include SAT solvers [164], sequence alignment [121] and Viterbi decoding [143]. Our work is intended to be generally applicable to any application specified as a recurrence. ...
Article
The rapid growth of biosequence databases over the last decade has led to a performance bottleneck in the applications analyzing them. In particular, over the last five years DNA sequencing capacity of next-generation sequencers has been doubling every six months as costs have plummeted. The data produced by these sequencers is overwhelming traditional compute systems. We believe that in the future compute performance, not sequencing, will become the bottleneck in advancing genome science. In this work, we investigate novel computing platforms to accelerate dynamic programming algorithms, which are popular in bioinformatics workloads. We study algorithm-specific hardware architectures that exploit fine-grained parallelism in dynamic programming kernels using field-programmable gate arrays (FPGAs). We advocate a high-level synthesis approach, using the recurrence equation abstraction to represent dynamic programming and polyhedral analysis to exploit parallelism. We suggest a novel technique within the polyhedral model to optimize for throughput by pipelining independent computations on an array. This design technique improves on the state of the art, which builds latency-optimal arrays. We also suggest a method to dynamically switch between a family of designs using FPGA reconfiguration to achieve a significant performance boost. We have used polyhedral methods to parallelize the Nussinov RNA folding algorithm to build a family of accelerators that can trade resources for parallelism and are between 15-130x faster than a modern dual core CPU implementation. A Zuker RNA folding accelerator we built on a single workstation with four Xilinx Virtex 4 FPGAs outperforms 198 3 GHz Intel Core 2 Duo processors. Furthermore, our design running on a single FPGA is an order of magnitude faster than competing implementations on similar-generation FPGAs and graphics processors. Our work is a step toward the goal of automated synthesis of hardware accelerators for dynamic programming algorithms.
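The Nussinov recurrence that the accelerators in this abstract parallelize maximizes nested base pairs over substrings of an RNA sequence. Below is a plain O(n^3) software version of one standard formulation of that recurrence, under an assumed minimum hairpin-loop length; it is a textbook-style sketch, not the pipelined array described above.

```python
def nussinov_pairs(rna: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs in `rna`.
    N[i][j] = max over: leaving i unpaired (N[i+1][j]), or pairing i with some
    k (N[i+1][k-1] + N[k+1][j] + 1). Cells are filled by increasing substring
    length, which is the dependency pattern hardware arrays exploit."""
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(rna)
    N = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(0, n - span):
            j = i + span
            best = N[i + 1][j]                        # base i stays unpaired
            for k in range(i + min_loop + 1, j + 1):  # pair base i with base k
                if (rna[i], rna[k]) in pairs:
                    left = N[i + 1][k - 1] if k - 1 >= i + 1 else 0
                    right = N[k + 1][j] if k + 1 <= j else 0
                    best = max(best, left + right + 1)
            N[i][j] = best
    return N[0][n - 1]

print(nussinov_pairs("GGGAAAUCC"))  # 3 pairs for this toy hairpin
```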
... Using this software, for example, one can build a hidden Markov model profile of the thrA gene in the Escherichia coli strains E. coli K12 and E. coli O157:H7 EDL933 [21], and then use this to find and align the same gene in the CFT073 strain [22]. Searching and aligning can be done with other algorithms, such as the Smith-Waterman algorithm [11], [23]. Here we show the match of part of the gene sequence found in CFT073. ...
Article
This paper reviews applications of signal processing techniques to a number of areas in the field of genetics. We focus on techniques for analyzing DNA sequences, and briefly discuss applications of signal processing to DNA sequencing that determines the sequences, and other related areas in genetics that can provide biologically significant information to assist with sequence analysis.
... Applications well suited to acceleration by FPGAs typically exhibit massive parallelism and small integer or fixed-point data types. Significant performance gains have been described for gene sequencing [2] [3], digital filtering [4], cryptography [5], network packet filtering [6], target recognition [7], and pattern matching [8]. These successes have led SRC Computers [9], DRC Computer Corp. [10], Cray [11], Starbridge Systems [12], and SGI [13] to offer clusters featuring programmable logic. ...
...et al. [4] implement the Smith-Waterman matching algorithm. The same algorithm was implemented at Virginia Tech [17], and the most recent implementation was at Nanyang Technological University [14]. Mapping dynamic programming algorithms onto FPGAs seems to be well suited to the capabilities of FPGAs, especially when systolic array architectures are used. ...
Article
Full-text available
DNA sequence comparison is a computationally intensive problem, known widely since the competition for human DNA decryption. Database search for DNA sequence comparison is of great value to computational biologists. Several algorithms have been developed and implemented to solve this problem efficiently, but from a user base point of view the BLAST algorithm is the most widely used one. In this paper, we present a new architecture for the BLAST algorithm. The new architecture was fully designed, placed and routed. The post place-and-route cycle-accurate simulation, accounting for the I/O, shows a better performance than a cluster of workstations running highly optimized code over identical datasets. The new architecture and detailed performance results are presented in this paper.
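BLAST owes its speed to a seed-and-extend heuristic: exact word (k-mer) hits between query and database are found first via a lookup table, and only those hits are later extended. The sketch below shows just the seeding stage with an assumed word length; it illustrates the heuristic's filtering idea, not the architecture proposed in the paper.

```python
from collections import defaultdict

def find_seeds(query: str, database: str, w: int = 11):
    """Index every length-w word of the query, then stream the database and
    report (query_pos, db_pos) pairs where a word matches exactly: the cheap
    filtering step that precedes any expensive extension."""
    index = defaultdict(list)
    for i in range(len(query) - w + 1):
        index[query[i:i + w]].append(i)
    hits = []
    for j in range(len(database) - w + 1):
        for i in index.get(database[j:j + w], ()):
            hits.append((i, j))
    return hits

q = "ACGTACGTGGTACCATGCAT"
db = "TTTTACGTACGTGGTACCATGCATGGG"
print(find_seeds(q, db, w=11)[:3])
```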
... The initial work on FPGAs was on Splash [10], which was used to compare a new DNA sequence against a database. Most implementations of DNA sequence alignment accelerators combine the parallel capability of Smith-Waterman [6,7] with a systolic array processor architecture, such as in [11][12][13][14][15]. Most of the results show great improvement in computational processing time. There are some novel ideas from the FPGA implementations that can be beneficial for improving future designs. ...
Article
Full-text available
This paper presents the design and analysis of an 8-bit Smith-Waterman (SW) based DNA sequence alignment accelerator core in an ASIC design flow. The objective of the project is to construct and analyse the core module that can perform the Smith-Waterman algorithm's operations, which are comparing, scoring and back tracing, using the technique used in (1,2), in an ASIC design flow. Nowadays, the DNA and protein databases are increasing rapidly, and this adds new challenges to current computing resources. New techniques, algorithms, designs, hardware and software that can maximize computational speed, minimize power and energy consumption, and boost throughput need to be developed in order to meet current and future requirements. In the DNA sequence alignment process, DNA sequences are compared using different alignment techniques such as global alignment, local alignment, motif alignment and multiple sequence alignment. Moreover, there are several algorithms used to perform the sequence alignment process, such as the Needleman-Wunsch algorithm, the Smith-Waterman algorithm, FASTA, BLAST and so on. For this paper, the focus is on local alignment using the Smith-Waterman algorithm. The design was modelled using Verilog and the functionality was verified using Xilinx and VCS. The RTL code was mapped and synthesized to technology-based logic using Design Compiler (DC). The core's layout was implemented using the Place and Route tool, IC Compiler (ICC). Based on the results, the core design area was 2108.937620 µm². The maximum time constraints were 6.85 ns and 6.93 ns in ICC and PT. The minimum time constraints were 0.28 ns and 0.30 ns in ICC and PT, respectively. In conclusion, the design was successfully implemented in an ASIC design flow. Moreover, the results showed that the design can be further optimized to work at faster speeds.
... Using this software, for example, one can build a hidden Markov model profile of the thrA gene in the Escherichia coli strains E. coli K12 and E. coli O157:H7 EDL933 [33], and then use this to find and align the same gene in the CFT073 strain [34]. Searching and aligning can be done with other algorithms, such as the Smith-Waterman algorithm [13,35]. Both HMMER and Smith-Waterman have a time complexity of O(ln²) and a space complexity of O(ln²) when aligning l sequences of length n; the HMMER algorithm does more general sequence searching than dynamic programming algorithms such as the Smith-Waterman algorithm, identifying more loosely related sequences. ...
Article
This paper reviews applications of signal processing techniques to a number of areas in the field of genetics. We focus on techniques for analyzing DNA sequences, and briefly discuss applications of signal processing to DNA sequencing, and other related areas in genetics that can provide biologically significant information to assist with sequence analysis.
... Additionally, a new alternative to boost performance is to consider new heterogeneous architectures for high performance computing [11][12][13], combining conventional processors with specific hardware to accelerate the most time-consuming functions. Hardware acceleration gives the best results, in terms of overall acceleration and value for money, when applied to problems in which: ...
Conference Paper
Today, large-scale parallel simulations are fundamental tools to handle complex problems. The number of processors in current computation platforms has recently increased, and it is therefore necessary to optimize application performance and to enhance the scalability of massively-parallel systems. In addition, new heterogeneous architectures, combining conventional processors with specific hardware, like FPGAs, to accelerate the most time-consuming functions, are considered a strong alternative to boost performance. In this paper, the performance of the DLR TAU code is analyzed and optimized. The improvement of the code's efficiency is addressed through three key activities: optimization, parallelization and hardware acceleration. First, a profiling analysis of the most time-consuming processes of the Reynolds-averaged Navier-Stokes flow solver on a three-dimensional unstructured mesh is performed. Then, the code's scalability is studied and new partitioning algorithms are tested to identify the most suitable ones for the selected applications. Finally, a feasibility study on the application of FPGAs and GPUs for the hardware acceleration of CFD simulations is presented.
... The FPGA implementations presented in [13][14][15] are restricted to DNA sequences, which are a special case of our implementation. The implementation reported in [14] achieves 1260 GCUPS peak performance on a Xilinx XC2V6000-4 FPGA part. The implementations reported in [13] and [15] achieve over 3200 GCUPS on the same part. ...
Article
This paper presents the design and implementation of the most parameterisable field-programmable gate array (FPGA)-based skeleton for pairwise biological sequence alignment reported in the literature. The skeleton is parameterised in terms of the sequence symbol type, i.e., DNA, RNA, or protein sequences, the sequence lengths, the match score, i.e., the score attributed to a symbol match, mismatch or gap, and the matching task, i.e., the algorithm used to match sequences, which includes global alignment, local alignment, and overlapped matching. Instances of the skeleton implement the Smith-Waterman and the Needleman-Wunsch algorithms. The skeleton has the advantage of being captured in the Handel-C language, which makes it FPGA platform-independent. Hence, the same code could be ported across a variety of FPGA families. It implements the sequence alignment algorithm in hand using a pipeline of basic processing elements, which are tailored to the algorithm parameters. This paper presents a number of optimizations built into the skeleton and applied at compile-time depending on the user-supplied parameters. These result in high performance FPGA implementations tailored to the algorithm in hand. For instance, actual hardware implementations of the Smith-Waterman algorithm for Protein sequence alignment achieve speedups of two orders of magnitude compared to equivalent standard desktop software implementations.
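In software, the parameterization described above (symbol type, match/mismatch/gap scores, alignment task) reduces to function arguments. The sketch below is a plain global-alignment (Needleman-Wunsch) scorer with those parameters exposed; it illustrates the parameter set, not the Handel-C skeleton itself, and uses a simple linear gap penalty.

```python
def nw_score(a: str, b: str, match: int = 1, mismatch: int = -1,
             gap: int = -2) -> int:
    """Global-alignment (Needleman-Wunsch) score with a linear gap penalty.
    The scoring parameters play the role of the skeleton's compile-time
    parameters; here they are plain arguments."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(prev[j - 1] + s, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]

print(nw_score("GATTACA", "GCATGCT"))                 # DNA, default scores
print(nw_score("HEAGAWGHEE", "PAWHEAE", 2, -1, -3))   # protein-like example
```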
... XDL and the accompanying xdl tool in the Xilinx ISE design suite have been present for over 10 years [10]. Others have also realized the potential of XDL as several papers have been published that demonstrate its use in ideas such as a bus macro generator [11], C-slow re-timing [12], power estimation [13], floor planning tools [14], routing constraints [15][16] and runtime reconfiguration [17]. Despite these and several other efforts which leverage XDL, a unified open source solution to facilitate the use of XDL has never materialized until recently. ...
Conference Paper
Creating CAD tools for commercial FPGAs is a difficult task. Closed proprietary device databases and unsupported interfaces are largely to blame for the lack of CAD research found on commercial architectures versus hypothetical architectures. This paper formally introduces RapidSmith, a new set of tools and APIs that enable CAD tool creation for Xilinx FPGAs. Based on the Xilinx Design Language (XDL), RapidSmith provides a compact, yet, fast device database with hundreds of APIs that enable the creation of placers, routers and several other tools for Xilinx devices. RapidSmith alleviates several of the difficulties of using XDL and this work demonstrates the kinds of research facilitated by removing such challenges.
... Compared to previously reported FPGA-based biological sequence alignment accelerators [18][19][20][21][22], our FPGA-based web server solution has been designed to be platform-independent with a service-based model of operation in mind. Detailed comparison between our implementation and previously reported FPGA-based biological sequence alignment accelerators is presented in [24]. ...
Conference Paper
Full-text available
This paper presents the design and implementation of an FPGA-based Web server for biological sequence alignment. Central to this Web server is a set of highly parameterisable, scalable, and platform-independent FPGA cores for biological sequence alignment. The Web server consists of an HTML-based interface, a MySQL database which holds user queries and results, a set of biological databases, a library of FPGA configurations, a host application servicing user requests, and an FPGA coprocessor for the acceleration of the sequence alignment operation. The paper presents a real implementation of this server on an HP ProLiant DL145 server with a Celoxica RCHTX FPGA board. Compared to an optimized pure software implementation, our FPGA-based Web server achieved a speed-up of two orders of magnitude for a pairwise protein sequence alignment application based on the Smith-Waterman algorithm. The FPGA-based implementation has the added advantage of being over 100x more energy efficient.
... Genome sequencing is a pattern matching dependent application that is very compute-intensive and, as a result, hardware implementations have been suggested as a remedy. Several hardware implementations have been proposed to accelerate the Smith-Waterman algorithm for sequence alignment [21][22][23]. For the most part, these hardware implementations use systolic arrays to attain speedup in the Smith-Waterman algorithm. ...
Article
This paper demonstrates a keyword match processor capable of performing fast dictionary search with approximate match capability. Using a content addressable memory with processor element cells, the processor can process arbitrary sized keywords and match input text streams in a single clock cycle. We present an architecture that allows priority detection of multiple keyword matches on single input strings. The processor is capable of determining approximate match and providing distance information as well. A 64-word design has been developed using 19,000 transistors and it could be expanded to larger sizes easily. Using a modest 0.5 μm process, we are achieving cycle times of 10 ns and the design will scale to smaller feature sizes.
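A purely software analogue of the behaviour described above (scanning a text stream and reporting, for every dictionary keyword and offset, approximate matches together with their distance) can be written in a few lines. The sketch below uses Hamming distance and made-up data; it mimics the functionality, not the CAM circuit.

```python
def cam_scan(text: str, keywords: list[str], max_dist: int = 1):
    """For every text offset and every keyword, report matches whose Hamming
    distance is at most `max_dist`: the approximate-match-with-distance
    behaviour that the keyword match processor provides in hardware."""
    results = []
    for off in range(len(text)):
        for word in keywords:
            window = text[off:off + len(word)]
            if len(window) < len(word):
                continue
            dist = sum(1 for x, y in zip(window, word) if x != y)
            if dist <= max_dist:
                results.append((off, word, dist))
    return results

print(cam_scan("the quick brown fox", ["quick", "quack", "crown"], max_dist=1))
```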
... This is done through sequence alignment, also known as the minimum string edit distance. Different algorithms are currently available for calculating the sequence alignment, and much research work on implementing these algorithms efficiently has been performed [2-8,11,15]. The most common of these are FASTA [9] and BLAST [10]. ...
Conference Paper
Full-text available
Biologists require ways to rapidly sequence vast amounts of DNA information. An approach to satisfying this demand is to provide hardware support and leverage parallel computation. When providing hardware acceleration, it is known that a custom, application-specific circuit provides a high-performance solution. Providing a balance between delivering an application-specific circuit and achieving optimal utilization of a field programmable gate array is a difficult task. This paper presents a technique in which a custom circuit solution for a given parameter set is generated for the edit-distance problem in comparing two sequences for similarity.
... Its computational complexity, however, is very large (of order L^N to compare N sequences of length L), and it is not realistic to use algorithms based on dynamic programming even for alignment between two sequences on desktop computers. In order to reduce the computation time, many heuristic algorithms [6,7,8] or hardware systems [9,10,11,12,13,14,15] have been proposed. Most of them, however, are designed for two-dimensional alignment (alignment between two sequences) because of the complexity of calculating alignment among more than two sequences under limited ...
Conference Paper
Full-text available
Alignment problems in computational biology have received much attention recently because of the rapid growth of sequence databases. By computing alignments, we can understand similarity among the sequences. Dynamic programming is a technique to find optimal alignments, but it requires very long computation times. We have shown that dynamic programming for more than two sequences can be efficiently processed on a compact system which consists of an off-the-shelf FPGA board and its host computer (node). The performance is, however, not enough for comparing long sequences. In this paper, we describe a computation method for multidimensional dynamic programming on distributed systems. The method is now being tested using two nodes connected by Ethernet. According to our experiments, it is possible to achieve a 5.1 times speedup with 16 nodes, and more speedup can be expected when comparing longer sequences using a larger number of nodes. The performance is affected only slightly by the data transfer delay when comparing long sequences. Therefore, our method can be mapped onto any kind of network with large delays.
... Hence, its computational complexity is very high (O(N^d) to compare d sequences of length N), so it is not realistic to use algorithms based on DP even for alignment between two sequences on desktop computers. In order to reduce the computational time, many heuristic algorithms [6,7,8] and hardware systems [9,10,11,12,13,14,15,16] have been proposed. Most of them, nevertheless, are designed for two-dimensional alignment (alignment between two sequences) because a huge amount of memory and a very long computational time are required for alignment among three or more sequences. ...
Conference Paper
Alignment problems in computational biology have received much attention recently because of the rapid growth of sequence databases. Many systems for alignment have been proposed to date, but most of them are designed for two-dimensional alignment (alignment between two sequences), because a huge amount of memory and a very long computational time are required for alignment among three or more sequences. In this paper, we describe a compact system with an off-the-shelf FPGA board and a host computer for three-dimensional alignment using dynamic programming. Through our approach, high performance is attained by a "two-phase search" with reconfigurations of the FPGA and co-processing between the FPGA and software. Furthermore, in order to achieve higher parallelism in the FPGA, we use a payoff matrix for matching elements in sequences, and the matrix is divided into sub-matrices which are minimized. In comparison to a single Intel Pentium 4 2.53GHz processor, our system with a single XC2V6000 enables more than 250-fold speedup.
Article
DNA read alignment is an integral part of genome study, which has been revolutionised thanks to the growth of Next Generation Sequencing (NGS) technologies. The inherent computational intensity of string matching algorithms such as Smith-Waterman (SmW) and the vast amount of NGS input data create a bottleneck in the workflows. Accelerated reconfigurable computing has been extensively leveraged to alleviate this bottleneck, focusing on high-performance albeit standalone implementations. In existing accelerated solutions, effective co-design of NGS short-read alignment still remains an open issue, mainly due to a narrow view of real integration aspects, such as system-wide communication and accelerator call overheads. In this paper, we first propose GANDAFL, a novel Genome AligNment DAta-FLow architecture for the SmW Matrix-fill and Traceback stages to perform high-throughput short-read alignment on NGS data. We then propose a radical software restructuring of the widely-used Bowtie2 aligner that allows read alignment by batches to expose acceleration capabilities. Batch alignment minimizes the calling overhead of the accelerators, whereas moving both Matrix-fill and Traceback on chip eliminates the communication data overheads. The standalone solution delivers up to x116 and x2 speedup over state-of-the-art software and hardware accelerators respectively, and the GANDAFL-enhanced Bowtie2 aligner delivers a x1.9 speedup.
Chapter
The recent and astonishing advances in Molecular Biology, which led to the sequencing of an unprecedented number of genomes, including the human, would not have been possible without the help of Bioinformatics. Bioinformatics can be defined as a research area where computational tools and algorithms are developed to help biologists in the task of understanding the organisms. Some Bioinformatics applications, such as pairwise and sequence-profile comparison, require a huge amount of computing power and, therefore, are excellent candidates to run in FPGA platforms. This chapter discusses in detail several recent proposals on FPGA-based accelerators for these two Bioinformatics applications, highlighting the similarities and differences among them. At the end of the chapter, research tendencies and open questions are presented.
Article
The recent and astonishing advances in Molecular Biology, which led to the sequencing of an unprecedented number of genomes, including the human, would not have been possible without the help of Bioinformatics. Bioinformatics can be defined as a research area where computational tools and algorithms are developed to help biologists in the task of understanding the organisms. Some Bioinformatics applications, such as pairwise and sequence-profile comparison, require a huge amount of computing power and, therefore, are excellent candidates to run in FPGA platforms. This chapter discusses in detail several recent proposals on FPGA-based accelerators for these two Bioinformatics applications, highlighting the similarities and differences among them. At the end of the chapter, research tendencies and open questions are presented.
Conference Paper
In this paper, we demonstrate the ability of spatial architectures to significantly improve both runtime performance and energy efficiency on edit distance, a broadly used dynamic programming algorithm. Spatial architectures are an emerging class of application accelerators that consist of a network of many small and efficient processing elements that can be exploited by a large domain of applications. In this paper, we utilize the dataflow characteristics and inherent pipeline parallelism within the edit distance algorithm to develop efficient and scalable implementations on a previously proposed spatial accelerator. We evaluate our edit distance implementations using a cycle-accurate performance and physical design model of a previously proposed triggered instruction-based spatial architecture in order to compare against real performance and power measurements on an x86 processor. We show that when chip area is normalized between the two platforms, it is possible to get more than a 50× runtime performance improvement and over 100× reduction in energy consumption compared to an optimized and vectorized x86 implementation. This dramatic improvement comes from leveraging the massive parallelism available in spatial architectures and from the dramatic reduction of expensive memory accesses through conversion to relatively inexpensive local communication.
Article
FPGAs have been successfully applied for cryptanalytic purposes, particularly in exhaustive key search, which is a highly parallelizable task. In this work, we consider a pseudorandom generator scheme that consists of a number of subgenerators, the first of which is a linear feedback shift register (LFSR). LFSRs are often used in cipher systems because of the good cryptographic characteristics of their output sequences. Cryptanalysis has shown that, if a noisy prefix of the output sequence of this generator is known, it is possible to reconstruct the initial state of the LFSR by means of a generalized correlation attack. The attack is based on resolving the constrained edit distance between the sequences determined by the initial states of the shift registers and the intercepted noisy output sequence. The systolic array architecture exploits the intrinsic parallelism of the dynamic programming algorithm for edit distance computation and achieves reductions in computation time of several orders of magnitude compared with the sequential calculation that is characteristic of software solutions. With a minimal increase in area, our design doubles the speed of similar approaches applied in bioinformatics, since there are no published ones for cryptanalysis. The results obtained on the Xilinx Virtex and Virtex2 FPGA families also hold when a bus is connected, since our design takes into account the bus I/O bottleneck (i.e. PCI).
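For readers unfamiliar with the building block attacked above, a tiny Fibonacci-style LFSR in software is shown below. The register width, tap positions and seed are assumptions chosen for illustration (a commonly used 16-bit tap set); nothing here is specific to the analysed generator or to the attack.

```python
def lfsr_stream(state: int, taps: tuple[int, ...] = (16, 14, 13, 11),
                nbits: int = 16):
    """Fibonacci LFSR: each step outputs the low bit and shifts in the XOR of
    the tapped bits. `taps` are 1-based bit positions; the default set is a
    commonly used 16-bit configuration, assumed here for illustration."""
    while True:
        out = state & 1
        feedback = 0
        for t in taps:
            feedback ^= (state >> (t - 1)) & 1
        yield out
        state = (state >> 1) | (feedback << (nbits - 1))

gen = lfsr_stream(0xACE1)                  # arbitrary non-zero seed
print([next(gen) for _ in range(16)])      # first 16 keystream bits
```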
Article
Bioinformatics has emerged as one of those sciences in which knowledge, if exploited ethically, will result in the general benefit of mankind. The enormity of DNA strand data has been revealed to be of humongous proportions. It is imperative to employ the art of parallel and distributed supercomputing in order to process such magnanimous magnitudes of data. We have deployed a scalable array of linearly connected hardware accelerators for the solution of the Smith-Waterman algorithm, a technique used to resolve sequence alignment of DNA strands. We have synthesized the system on a reconfigurable platform and carried out a performance analysis of the speedup factor accomplished. The system is further connected to a powerful embedded microprocessor which, in a multithreaded environment, serves as an interface to the World Wide Web. This effort is a bid to bring high-performance computing in this domain to the doorstep of scientists and enthusiasts alike in a cost-effective manner, thereby triggering an avalanche of discoveries and providing much needed impetus to scientific work in this area.
Chapter
Full-text available
Exact and approximate string matching is a common and often repeated task in information retrieval and bioinformatics. As current free textual databases are growing almost exponentially with time, the string matching problem is becoming more expensive in terms of computational time. We believe that recent advances in parallel and distributed processing techniques are currently mature enough and can provide powerful computing means convenient for overcoming this string matching problem. In this chapter we present a short survey of well-known sequential exact and approximate string searching algorithms. Further, we propose four text searching implementations on a general-purpose parallel computer, namely a cluster of heterogeneous workstations using the MPI message passing library. The first three parallel implementations are based on the static and dynamic master-worker methods. Further, we propose a hybrid parallel implementation that combines the advantages of the static and dynamic parallel methods in order to reduce the load imbalance and communication overhead. Moreover, we present linear processor array architectures for flexible exact and approximate string matching. These architectures are based on a parallel realization of dynamic programming and non-deterministic finite automaton algorithms. The algorithms consist of two phases, i.e. preprocessing and searching. Then, starting from the data dependence graphs of the searching phase, parallel algorithms are derived, which can be realized directly on special-purpose processor array architectures for approximate string matching. Further, the preprocessing phase is also accommodated on the same processor array designs. In addition, the proposed architectures support flexible patterns, i.e. patterns with a ''don't care'' symbol, patterns with a complement symbol and patterns with a class symbol. Finally, this chapter proposes a generic design of a programmable array processor architecture for a wide variety of approximate string matching algorithms to gain high performance at low cost. Further, we describe the architecture of the array and the architecture of the cell in more detail in order to efficiently implement both the preprocessing and searching phases of most string matching algorithms.
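The NFA-based algorithms mentioned above are commonly realized in software as bit-parallel "Shift-And" matching, which extends naturally to don't-care and class symbols: each input character simply maps to a bitmask of the pattern positions it may occupy. The sketch below is a brief illustration under those assumptions, not the processor array design itself.

```python
def shift_and_search(pattern, text):
    """Bit-parallel Shift-And matching for a flexible pattern given as a list
    of position constraints: a set of allowed characters per position, or
    None for a don't-care position. Returns end positions of matches."""
    m = len(pattern)
    masks = {}

    def mask_for(ch):
        # Bitmask of pattern positions that character `ch` satisfies.
        if ch not in masks:
            bits = 0
            for i, allowed in enumerate(pattern):
                if allowed is None or ch in allowed:
                    bits |= 1 << i
            masks[ch] = bits
        return masks[ch]

    state, hits = 0, []
    for pos, ch in enumerate(text):
        # Shift in a new '1' (a match may start here) and keep only the
        # positions consistent with the current character.
        state = ((state << 1) | 1) & mask_for(ch)
        if state & (1 << (m - 1)):
            hits.append(pos)            # pattern ends at this text position
    return hits

# Pattern "G A [CT] . A": third position is the class {C,T}, fourth is don't-care.
pat = [{"G"}, {"A"}, {"C", "T"}, None, {"A"}]
print(shift_and_search(pat, "TTGACTAGGGATGAGATTA"))
```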
Article
Full-text available
The BLAST algorithm is the prevalent tool used by molecular biologists for DNA sequence matching and database search. In this work we demonstrate that, with an appropriate reconfigurable architecture, BLAST performance can be improved with a single-chip solution by a factor of 5 over a specialized and optimized computer cluster, or by a factor of 37 over a single computer. These initial results account for I/O and are very encouraging for the development of a large-scale, reconfigurable BLAST engine.
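For orientation, the toy sketch below illustrates the seed-and-extend idea BLAST is built around: exact word hits between query and subject, each extended without gaps until the score drops too far below its running maximum. It is only an illustration with invented parameters (word length, scores, drop-off), not the BLAST heuristic itself and not the architecture described above.

```python
def seed_and_extend(query, subject, word=4, match=1, mismatch=-1, drop=3):
    """Toy seed-and-extend: index `word`-mers of the query, look them up in
    the subject, and extend each hit ungapped with an X-drop cut-off.
    Scoring and thresholds are illustrative only."""
    index = {}
    for i in range(len(query) - word + 1):
        index.setdefault(query[i:i + word], []).append(i)

    def extend(qi, si, step):
        """Ungapped extension from (qi, si) in direction `step`, X-drop style."""
        score = best = 0
        q, s = qi, si
        while 0 <= q < len(query) and 0 <= s < len(subject):
            score += match if query[q] == subject[s] else mismatch
            best = max(best, score)
            if best - score >= drop:      # give up once the score sags too far
                break
            q += step
            s += step
        return best

    hits = []
    for j in range(len(subject) - word + 1):
        for i in index.get(subject[j:j + word], []):
            left = extend(i - 1, j - 1, -1)
            right = extend(i + word, j + word, +1)
            hits.append((i, j, word * match + left + right))
    return sorted(hits, key=lambda h: -h[2])

print(seed_and_extend("GATTACAGATTACA", "CCGATTACATTGGA")[:3])
```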
Article
Full-text available
DNA sequence alignment is a central problem in bioinformatics. The Smith-Waterman (SW) algorithm is an exact method that obtains optimal local alignments in quadratic space and time. For long sequences, this quadratic complexity makes the algorithm impractical, and a reconfigurable architecture becomes a very attractive alternative. This article presents the design and evaluation of an FPGA-based architecture that obtains the similarity score between DNA sequences as well as the coordinates of the best alignment. A prototype on a Xilinx xc2vp70 FPGA achieved a speedup of 246.9 over the software solution when comparing sequences of 100 MBP and 100 BP. Unlike other hardware solutions that only calculate alignment scores, our design avoids architectural bottlenecks and accelerates the most compute-intensive part of a sequence alignment software package.
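Since this accelerator reports the coordinates of the best alignment as well as its score, the sketch below shows one straightforward way to carry the end coordinates along with the running maximum while keeping only two matrix rows. The scoring parameters are illustrative and the sketch is not the authors' design.

```python
def sw_score_with_coordinates(seq_a, seq_b, match=2, mismatch=-1, gap=-1):
    """Return (best score, (i, j)), where (i, j) are the 1-based coordinates
    in seq_a and seq_b at which the best local alignment ends."""
    prev = [0] * (len(seq_b) + 1)
    best, best_pos = 0, (0, 0)
    for i, a in enumerate(seq_a, 1):
        cur = [0]
        for j, b in enumerate(seq_b, 1):
            diag = prev[j - 1] + (match if a == b else mismatch)
            score = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            cur.append(score)
            if score > best:
                best, best_pos = score, (i, j)   # remember the best end point
        prev = cur
    return best, best_pos

print(sw_score_with_coordinates("TTGACCTAGA", "CCCTAGATT"))
```

Keeping only the end coordinates (rather than the whole matrix) is what allows the score computation to stay in linear space; a second, restricted pass can then recover the alignment itself if required.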
Conference Paper
Reconfigurable technology offers great advantages over general-purpose computing in bioinformatics applications. The presentation outlined in this paper looks into the attributes of several bioinformatics algorithms that make them suitable for reconfigurable computing, the resulting architectures, and their performance tradeoffs versus general-purpose computers, graphics processing units (GPUs) and VLSI.
Article
In this paper, we present linear processor array architectures for flexible approximate string matching. These architectures are based on parallel realizations of dynamic programming and non-deterministic finite automaton algorithms. The algorithms consist of two phases, preprocessing and searching. Starting from the data dependence graphs of the searching phase, parallel algorithms are derived that can be realized directly as special-purpose processor array architectures for approximate string matching. The preprocessing phase is accommodated on the same processor array designs. Finally, the proposed architectures support flexible patterns, i.e. patterns with a "don't care" symbol, patterns with a complement symbol and patterns with a class symbol.
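The flexible pattern symbols mentioned above can be modelled in software as per-position match predicates plugged into the searching-phase dynamic programming, roughly as in the sketch below. The pattern syntax used here ('?' for don't care, '!x' for complement, '[...]' for a class) is invented for illustration and is not taken from the paper.

```python
def parse_flexible(pattern):
    """Turn a flexible pattern string into one match predicate per position."""
    preds = []
    i = 0
    while i < len(pattern):
        c = pattern[i]
        if c == '?':                                     # don't care
            preds.append(lambda ch: True)
            i += 1
        elif c == '!':                                   # complement of next char
            x = pattern[i + 1]
            preds.append(lambda ch, x=x: ch != x)
            i += 2
        elif c == '[':                                   # character class
            j = pattern.index(']', i)
            cls = set(pattern[i + 1:j])
            preds.append(lambda ch, cls=cls: ch in cls)
            i = j + 1
        else:                                            # ordinary character
            preds.append(lambda ch, c=c: ch == c)
            i += 1
    return preds

def flexible_approx_search(pattern, text, max_errors):
    """Sellers-style approximate search where each pattern position is a predicate."""
    preds = parse_flexible(pattern)
    m = len(preds)
    col = list(range(m + 1))
    hits = []
    for j, t in enumerate(text, 1):
        new = [0]
        for i in range(1, m + 1):
            cost = 0 if preds[i - 1](t) else 1
            new.append(min(col[i] + 1, new[i - 1] + 1, col[i - 1] + cost))
        col = new
        if col[m] <= max_errors:
            hits.append(j)
    return hits

print(flexible_approx_search("A?G[CT]A", "TTAGGTACGCATT", max_errors=1))
```

In the hardware realization, the per-position predicate corresponds to a small comparator configured in each cell of the array, which is why don't-care, complement and class symbols add little extra logic.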
Conference Paper
This paper proposes a generic programmable array processor architecture for a wide variety of approximate string matching algorithms. We describe the architecture of the array and of its cell in detail, so that both the preprocessing and searching phases of most string matching algorithms can be implemented efficiently. The architecture also performs approximate string matching for complex patterns that contain don't-care, complement and class symbols. It maximizes the strength of VLSI in terms of intensive, pipelined computing while circumventing the limitation on communication, and it may be adopted as the basic structure of a universal, flexible string matching engine.
Conference Paper
Full-text available
DNA sequence comparison is a computationally intensive problem that has become widely known since the race to decode the human genome. Database search for DNA sequence comparison is of great value to computational biologists. Several algorithms have been developed and implemented to solve this problem efficiently, but the BLAST algorithm has by far the largest user base. In this paper we present a new architecture for the BLAST algorithm. The new architecture was fully designed, placed and routed. The post-place-and-route, cycle-accurate simulation, which accounts for I/O, shows better performance than a cluster of workstations running highly optimized code on identical datasets. The new architecture and detailed performance results are presented in this paper.
Article
Full-text available
This paper presents a linear systolic array for quantifying the similarity between two strings over a given alphabet. The architecture is a parallel realization of a standard dynamic programming algorithm. Also introduced is a novel encoding scheme which minimizes the number of bits required to represent a state in the computation, significantly reducing the size of a processor. An nMOS prototype, to be used in searching genetic databases for DNA strands which closely match a target sequence, is being implemented. Preliminary results indicate that it will perform hundreds to thousands of times faster than a minicomputer.
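One way to see why such compact state encodings are possible: for unit-cost edit distance, neighbouring cells of the distance matrix differ by at most one, so each cell only needs a couple of bits (for example, its value modulo 4) to communicate its result to its neighbours instead of a full-width distance value. The check below verifies this adjacency property empirically on random DNA strings; it illustrates the idea rather than reproducing the paper's exact encoding.

```python
import random

def edit_matrix(a, b):
    """Full unit-cost edit-distance matrix, used only to check the property."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + (a[i - 1] != b[j - 1]))
    return d

random.seed(0)
for _ in range(100):
    a = "".join(random.choice("ACGT") for _ in range(random.randint(1, 12)))
    b = "".join(random.choice("ACGT") for _ in range(random.randint(1, 12)))
    d = edit_matrix(a, b)
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            assert d[i][j] - d[i - 1][j] in (-1, 0, 1)      # vertical neighbours
            assert d[i][j] - d[i][j - 1] in (-1, 0, 1)      # horizontal neighbours
            assert d[i][j] - d[i - 1][j - 1] in (0, 1)      # diagonal neighbours
print("adjacent cells differ by at most 1, so a few bits per cell "
      "(e.g. values modulo 4) suffice to pass results between processors")
```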
Conference Paper
Full-text available
Many applications require the use of multiple, loosely coupled adaptive computing boards as part of a larger computing system. Two such application classes are embedded systems, in which multiple boards are needed to physically interface with different sensors and actuators, and applications whose computational demands require multiple boards. In addition to the adaptive computing boards, the computing systems for these application classes typically include general-purpose microprocessors and high-speed networks. The development environment for applications on these large computing systems is not unified. Typically, a developer uses VHDL simulation and synthesis tools to program the FPGAs on the adaptive computing boards. External control of the board, such as downloading new configurations or setting clock speeds, is provided through a vendor-specific API. This API is typically accessed from a C host program that the developer must write in a high-level language environment. Finally, the developer is responsible for writing the networking code that allows interaction between the separate adaptive computing boards and general-purpose microprocessors. No tools are available for either debugging or performance monitoring in this agglomerated system. Development on these systems is time-consuming and platform-specific. A standard ACS API is proposed to provide a developer with a single API for the control of a distributed system of adaptive computing boards, including the interconnection network.
Article
JBits(tm), the Xilinx Bitstream Interface, is a set of Java(tm) classes which provide an Application Program Interface (API) into the Xilinx FPGA bitstream. This interface operates either on bitstreams generated by Xilinx design tools or on bitstreams read back from actual hardware, which provides the capability of designing, modifying and dynamically modifying circuits in Xilinx XC4000(tm) series FPGA devices. The programming model used by JBits is a two-dimensional array of Configurable Logic Blocks (CLBs). Each CLB is referenced by a row and column, and all configurable resources in the selected CLB may be set or probed. Additionally, control of all routing resources adjacent to the selected CLB is made available. Because the code is written in Java, compilation times are very fast, and because control is at the CLB level, bitstreams can typically be modified or generated in times on the order of one second or less. This API has been used to construct complete circuits and to modify existing circuits. In addition, the object-oriented support in the Java programming language has permitted a small library of parameterizable, object-oriented macro circuits, or Cores, to be implemented. Finally, this API may be used as a base on which to construct other tools, including traditional design tools for tasks such as circuit placement and routing, as well as application-specific tools for more narrowly defined tasks.
Conference Paper
As the emerging field of bioinformatics continues to expand, the ability to rapidly search large databases of genetic information is becoming increasingly important. Databases containing billions of data elements are routinely compared and searched for matching and near-matching patterns. In this paper we explore the use of run-time reconfiguration using field programmable gate arrays (FPGAs) to provide a compact, high-performance matching solution to accelerate the searching of these genetic databases. This implementation provides approximately an order of magnitude increase in performance while reducing hardware complexity by as much as three orders of magnitude when compared to existing commercial systems.
Conference Paper
The author describes two systolic arrays for computing the edit distance between two genetic sequences using a well-known dynamic programming algorithm. The systolic arrays have been implemented for the Splash 2 programmable logic array and are intended to be used for database searching. Simulations indicate that the faster Splash 2 implementation can search a database at a rate of 12 million characters per second, several orders of magnitude faster than implementations of the dynamic programming algorithm on conventional computers.
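The parallelism such systolic arrays exploit comes from processing the distance matrix along anti-diagonals: every cell on one anti-diagonal depends only on cells of the two preceding anti-diagonals, so an array can compute a whole anti-diagonal in one step. The sketch below makes that dependency structure explicit, although it still runs sequentially in software.

```python
def edit_distance_wavefront(a, b):
    """Unit-cost edit distance computed anti-diagonal by anti-diagonal.

    Every cell on anti-diagonal k depends only on anti-diagonals k-1 and k-2,
    which is the parallelism a systolic array exploits (one processing
    element per cell of the current wavefront)."""
    n, m = len(a), len(b)
    d = {}                                     # (i, j) -> distance
    for k in range(n + m + 1):                 # k indexes the anti-diagonal i + j
        for i in range(max(0, k - m), min(n, k) + 1):
            j = k - i
            if i == 0:
                d[i, j] = j
            elif j == 0:
                d[i, j] = i
            else:                              # neighbours lie on diagonals k-1, k-1, k-2
                d[i, j] = min(d[i - 1, j] + 1,
                              d[i, j - 1] + 1,
                              d[i - 1, j - 1] + (a[i - 1] != b[j - 1]))
    return d[n, m]

print(edit_distance_wavefront("GATTACA", "GCATGCT"))
```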
Article
A description is given of the Princeton Nucleic Acid Comparator (P-NAC), a linear systolic array for comparing DNA sequences. The architecture is a parallel realization of a standard dynamic programming algorithm. Benchmark timings of a VLSI implementation confirm that for its dedicated application, P-NAC is two orders of magnitude faster than current microcomputers. Experience with the prototype is shaping the design of a second-generation device, to be known as the Brown Nucleic Acid Comparator (B-NAC), that will be algorithmically flexible and more tolerant of fabrication faults.
Article
Developing applications for distributed adaptive computing systems (ACS) requires developers to have knowledge of both parallel computing and configurable computing. Furthermore, portability and scalability are required for developers to use innovative ACS research directly in deployed systems. This thesis presents an Application Programming Interface (API) implementation developed in a scalable parallel ACS system. The API gives the developer the ability to easily control both single-board and multi-board systems in a network cluster environment. The API implementation is highly portable and scalable, allowing ACS researchers to easily move from a research system to a deployed system. The thesis details the design and implementation of the API, as well as analyzes its performance.
Article
This paper describes an implementation of a novel systolic array for sequence alignment on the SPLASH reconfigurable logic array. The systolic array operates in two phases. In the first phase, a sequence comparison array due to Lopresti [2] is used to compute a matrix of distances which is stored in local RAM. In the second phase, the stored distances are used by the alignment array to produce a binary encoding of the sequence alignment. Preliminary benchmarks show that the SPLASH implementation performs several orders of magnitude faster than implementations on supercomputers.
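In software terms, the two phases correspond to filling and storing the full distance matrix and then tracing back from the final cell to recover an alignment, roughly as sketched below. The gapped-string output here stands in for the binary alignment encoding the array produces.

```python
def align_with_traceback(a, b):
    """Phase 1: fill and store the full unit-cost distance matrix.
    Phase 2: trace back from (len(a), len(b)) to recover one alignment."""
    n, m = len(a), len(b)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + (a[i - 1] != b[j - 1]))

    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            out_a.append(a[i - 1]); out_b.append('-'); i -= 1   # gap in b
        else:
            out_a.append('-'); out_b.append(b[j - 1]); j -= 1   # gap in a
    return d[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))

print(align_with_traceback("GATTACA", "GCATGCT"))
```

Storing the whole matrix is what makes the alignment (as opposed to the score alone) recoverable, and it is also why the hardware design needs local RAM between the two phases.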