Deshanand P. Singh’s research while affiliated with Altera Corporation and other places

Publications (35)


High-Level Design Tools for Floating Point FPGAs
  • Conference Paper

February 2015 · 21 Reads · 2 Citations

Deshanand P. Singh

This tutorial describes tools for efficiently implementing floating-point applications on FPGAs. We present both the SDK for OpenCL and DSP Builder Advanced Blockset and show that they can be used effectively to implement many floating-point applications. The methods for optimizing application performance are also described. In this tutorial we focus on a few applications, including the fast Fourier transform, matrix multiplication, a finite impulse response filter, and Cholesky decomposition. In all cases we show what the tools are capable of achieving and, more importantly, how a user can take advantage of the various floating-point-centric features that are made available. We also discuss how these tools can automatically use FPGA architectural features such as the hardened floating-point DSP blocks available on the Altera Arria 10 family.


Configuring a programmable device using high-level language
  • Patent
  • Full-text available

February 2015 · 9 Reads

A method of preparing a programmable integrated circuit device for configuration using a high-level language includes compiling a plurality of virtual programmable devices from descriptions in said high-level language. The compiling includes compiling configurations of configurable routing resources from programmable resources of said programmable integrated circuit device, and compiling configurations of a plurality of complex function blocks from programmable resources of said programmable integrated circuit device. A machine-readable data storage medium may be encoded with a library of such compiled configurations. A virtual programmable device may include a stall signal network and routing switches of the virtual programmable device may include stall signal inputs and outputs.

Method and apparatus for performing automatic latency optimization on system designs for implementation on programmable hardware

December 2014 · 12 Reads

A method for performing latency optimization on a system design to be implemented on a target device includes inserting a variable latency indicator in the system design at a place where latency can be varied. The system design includes pipeline registers at the place where the variable latency indicator is inserted. Latency optimization is then automatically performed on the system design, during a computer-aided design flow performed by an electronic design automation (EDA) tool, by varying the number of pipeline registers at the variable latency indicator to obtain optimized latency without affecting the system performance of the system design.


Method and apparatus for performing multiple stage physical synthesis

October 2014 · 6 Reads

Deshanand Singh · Valavan Manohararajah · Gordon Raymond Chiu · [...]

A method for designing a system on a target device includes entering, synthesizing, mapping, placing, and routing the system. Physical synthesis is performed on the system immediately after more than one of the entering, synthesizing, mapping, placing, and routing procedures.


Efficient configuration of an integrated circuit device using high-level language

August 2014 · 13 Reads

A method of programming or configuring an integrated circuit device using a high-level language includes parsing a logic flow to be embodied in the integrated circuit device to identify branching control flow, converting the branching control flow into predicated instructions, incorporating the predicated instructions into a high-level language representation of a configuration of resources of the integrated circuit device, and compiling the high-level language representation to configure said integrated circuit device. The high-level language representation can be executed to generate a configuration bitstream for the programmable integrated circuit device, or can be run on a processor on the programmable integrated circuit device to instantiate the configuration.
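
The idea of converting branching control flow into predicated instructions can be illustrated with a minimal Python sketch. This is a generic if-conversion example, not the method claimed in the patent; the function names `branchy` and `predicated` are hypothetical and chosen only for illustration.

```python
def branchy(a, b):
    # Original control flow: only one side of the branch executes.
    if a > b:
        return a - b
    return b - a


def predicated(a, b):
    # If-converted form: both sides are computed unconditionally,
    # and a predicate selects which result is kept.
    p = int(a > b)       # predicate: 1 if the branch condition holds, else 0
    then_val = a - b     # computed regardless of p
    else_val = b - a     # computed regardless of p
    return p * then_val + (1 - p) * else_val
```

Because the predicated form has no branch, every operation can be laid out as fixed hardware, with the predicate acting as a select signal.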


Method and apparatus for performing fast incremental resynthesis

May 2014 · 11 Reads

A method for designing a system on a target device is disclosed. A first netlist is generated for a first version of the system in a first compilation. Optimizations are performed on the first version of the system during synthesis, resulting in a second netlist. A third netlist is generated for a second version of the system in a second compilation. The first version of the system in the first netlist and the second version of the system in the third netlist are differentiated to identify identical regions.
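
One way to picture the differentiation step is a structural comparison of two netlists. The sketch below is an assumption-laden illustration, not the patented procedure: it assumes acyclic netlists stored as dictionaries mapping a node name to `(op, inputs)`, assumes nodes keep the same names across compilations, and uses the hypothetical helpers `signatures` and `identical_nodes`.

```python
def signatures(netlist):
    """Map each node to a structural signature built from its op and fan-in cone."""
    memo = {}

    def sig(node):
        if node not in memo:
            op, inputs = netlist[node]
            memo[node] = (op, tuple(sig(i) for i in inputs))
        return memo[node]

    return {node: sig(node) for node in netlist}


def identical_nodes(old, new):
    """Return nodes whose entire fan-in cone is unchanged between two netlists."""
    old_sigs = signatures(old)
    new_sigs = signatures(new)
    return {n for n in new if n in old_sigs and old_sigs[n] == new_sigs[n]}


# Hypothetical example: only the final gate changes between compilations.
old = {
    "a": ("IN:a", ()), "b": ("IN:b", ()), "c": ("IN:c", ()),
    "x": ("AND", ("a", "b")),
    "y": ("OR", ("x", "c")),
}
new = dict(old, y=("XOR", ("x", "c")))
unchanged = identical_nodes(old, new)
```

In this example `unchanged` contains the inputs and the `AND` gate, so earlier optimization results for that region could in principle be reused while only `y` is resynthesized.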


Integrated circuit compilation

February 2014 · 14 Reads

Systems and methods for increasing speed and reducing processing power of a compile process of programmable logic of an integrated circuit (IC) are provided. For example, in one embodiment, a method includes obtaining a high-level program comprising computer-readable instructions for implementation on programmable logic of an integrated circuit (IC); translating the high-level program into low-level code representative of functional components needed to execute functionalities of the high-level program; generating a host program comprising computer-readable instructions for implementing the low-level code based upon the high-level program; obtaining modifications to the high-level program; determining whether the modifications can be implemented by a new host program utilizing the low-level code; and generating the new host program to implement the modifications, when the modifications can be implemented by the new host program utilizing the low-level code.


Method and apparatus for implementing soft constraints in tools used for designing programmable logic devices

November 2013 · 16 Reads

A method for designing a system on a target device utilizing programmable logic devices (PLDs) includes generating options for utilizing resources on the PLDs in response to user-specified constraints. The options for utilizing the resources on the PLDs are refined independent of the user-specified constraints.


Methods and systems for measuring and presenting performance data of a memory controller system

July 2013 · 19 Reads

Mechanisms for measuring, analyzing, and presenting performance data associated with a memory controller system are described. The mechanisms include a performance monitor that detects and analyzes performance, including efficiency and latency, of a memory controller system. In addition to determining performance, the system identifies reasons for loss of memory controller system efficiency. Moreover, the reasons, the efficiency, and the latency are analyzed and presented in a manner easily understandable to a user.


Harnessing the power of FPGAs using Altera's OpenCL compiler

February 2013 · 140 Reads · 14 Citations

In recent years, Field-Programmable Gate Arrays have become extremely powerful computational platforms that can efficiently solve many complex problems. The most modern FPGAs comprise effectively millions of programmable elements, signal processing elements and high-speed interfaces, all of which are necessary to deliver a complete solution. The power of FPGAs is unlocked via low-level programming languages such as VHDL and Verilog, which allow designers to explicitly specify the behavior of each programmable element. While these languages provide a means to create highly efficient logic circuits, they are akin to "assembly language" programming for modern processors. This is a serious limiting factor for both productivity and the adoption of FPGAs on a wider scale. In this talk, we use the OpenCL language to explore techniques that allow us to program FPGAs at a level of abstraction closer to traditional software-centric approaches. OpenCL is an industry standard parallel language based on 'C' that offers numerous advantages that enable designers to take full advantage of the capabilities offered by FPGAs, while providing a high-level design entry language that is familiar to a wide range of programmers. To demonstrate the advantages a high-level programming language can offer, we demonstrate how to use Altera's OpenCL Compiler on a set of case studies. The first application is single-precision general-element matrix multiplication (SGEMM). It is an example of a highly parallel algorithm for which efficient circuit structures are well known. We show how this application can be implemented in OpenCL and how the high-level description can be optimized to generate the most efficient circuit in hardware. The second application is a Fast Fourier Transform (FFT), which is a classical FPGA benchmark that is known to have a good implementation on FPGAs. We show how we can implement the FFT algorithm, while exploring the many different possible architectural choices that lead to an optimized implementation for a given FPGA. Finally, we discuss a Monte-Carlo Black-Scholes simulation, which demonstrates the computational power of FPGAs. We describe how a random number generator in conjunction with computationally intensive operations can be harnessed on an FPGA to generate a high-speed benchmark, which also consumes far less power than the same benchmark running on a comparable GPU. We conclude the tutorial with a set of live demonstrations. Through this tutorial we show the benefits high-level languages offer for system-level design and productivity. In particular, Altera's OpenCL compiler is shown to enable high-performance application design that fully utilizes capabilities of modern FPGAs.
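
To make the Monte-Carlo Black-Scholes workload concrete, here is a minimal CPU-side sketch in plain Python. It is not the authors' OpenCL kernel: pricing a European call option, the parameter values, and the function names `mc_call_price` and `bs_call_price` are all assumptions made for illustration. The loop shows the pattern the abstract describes, a random number generator feeding computationally intensive operations (`exp`, `sqrt`), and the closed-form price serves as a sanity check.

```python
import math
import random


def mc_call_price(s0, k, r, sigma, t, n_paths, seed=0):
    """Estimate a European call price by simulating terminal stock prices."""
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma ** 2) * t
    vol = sigma * math.sqrt(t)
    total = 0.0
    for _ in range(n_paths):
        z = rng.gauss(0.0, 1.0)                # standard normal draw
        s_t = s0 * math.exp(drift + vol * z)   # terminal price under Black-Scholes dynamics
        total += max(s_t - k, 0.0)             # call payoff
    return math.exp(-r * t) * total / n_paths  # discounted average payoff


def bs_call_price(s0, k, r, sigma, t):
    """Closed-form Black-Scholes price of a European call, used as a reference."""
    d1 = (math.log(s0 / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)

    def cdf(x):
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    return s0 * cdf(d1) - k * math.exp(-r * t) * cdf(d2)


estimate = mc_call_price(100.0, 100.0, 0.05, 0.2, 1.0, n_paths=100_000)
reference = bs_call_price(100.0, 100.0, 0.05, 0.2, 1.0)
```

Each path is independent of the others, which is what makes this kind of simulation a natural fit for deeply pipelined or parallel hardware.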


Citations (22)


... Examples of small local improvements are the insertion of some additional buffers or the increase of the driving capacity of certain gates to speed up the slowest connections. Somewhat larger changes include retiming (e.g., [IK01]) or limited logic restructuring (e.g., [GKSV01,SB02]). ...

Reference:

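Predictie van interconnectie-eigenschappen van digitale schakelingen voor de exploratie van ontwerpkeuzes en technologie (Prediction of interconnect properties of digital circuits for the exploration of design choices and technology)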
Incremental placement for layout driven optimizations on FPGAs
  • Citing Conference Paper
  • January 2002

IEEE/ACM International Conference on Computer-Aided Design, Digest of Technical Papers

... It is important to emphasize that the dynamic range needs to be determined only once and is not part of the design space exploration. Thus, its execution time is not as critical as accuracy evaluation, and methods based on profiling (e.g., Chen and Singh [9]) can be used. Nonetheless, estimation of dynamic range based on convolutions is an interesting direction of future work. ...

Profile-Guided Floating- to Fixed-Point Conversion for Hybrid FPGA-Processor Applications
  • Citing Article
  • January 2013

ACM Transactions on Architecture and Code Optimization

... The development of FPGA-based heterogeneous systems requires a diverse skill set, covering everything from high-level algorithm design to intricate circuit implementation [24]. Although recent tools like High-Level Synthesis (HLS) [31,32] and SNAP [10,33] automate parts of circuit creation, the challenge remains in building end-to-end database acceleration systems that adapt to custom query conditions. Furthermore, systematic evaluation of a heterogeneous system goes beyond development. ...

Harnessing the power of FPGAs using Altera's OpenCL compiler
  • Citing Conference Paper
  • February 2013

... The strength of FPGAs is that they can be reconfigured and adapted for the type of algorithms to execute, mapping an algorithm one-to-one to the FPGA hardware. This involves "programming" the FPGA using a hardware description language (HDL) such as VHDL or Verilog [6]. The HDLs are used to generate a circuit description which is loaded onto the FPGA. ...

From OpenCL to high-performance hardware on FPGAs

... The Altera OpenCL SDK [2] allows programmers to use high-level OpenCL kernels, written for GPUs, to generate an FPGA design with higher performance per Watt [9]. In this work, an OpenCL kernel is first compiled and then synthesized as a special dedicated hardware for mapping on an FPGA. ...

Invited paper: Using OpenCL to evaluate the efficiency of CPUs, GPUs and FPGAs for information filtering
  • Citing Conference Paper
  • August 2012

... In earlier versions of the place and route CAD flow, physical synthesis was performed exclusively after placement. Transformations include register retiming [12], timing-driven functional decomposition [13], local rewiring [14], and logic replication. shows register retiming, a powerful logic optimization technique for synchronous circuits. ...

Post-placement functional decomposition for FPGAs
  • Citing Article

... The problem of achieving timing closure is a classical problem in digital design. However, most prior work in the FPGA domain has concentrated on logic synthesis optimizations in order to improve timing closure when implementing RTL [23]. Similarly, most prior work in behavioral level optimization for timing closure concentrates on transformations such as memory partitioning [24], pipelining [25], retiming [12], and multi-cycle chaining [26]. ...

An area-efficient timing closure technique for FPGAs using Shannon's expansion
  • Citing Article
  • February 2007

Integration