Valavan Manohararajah’s research while affiliated with Altera Corporation and other places


Publications (27)


The Stratix™ 10 Highly Pipelined FPGA Architecture
  • Conference Paper

February 2016 · 288 Reads · 61 Citations

David Lewis · Gordon Chiu · Jeffrey Chromczak · [...] · John Van Dyken

This paper describes architectural enhancements in the Altera Stratix™ 10 HyperFlex™ FPGA architecture, fabricated in the Intel 14nm FinFET process. Stratix 10 includes ubiquitous flip-flops in the routing to enable a high degree of pipelining. In contrast to the earlier architectural exploration of pipelining in pass-transistor based architectures, the direct drive routing fabric in Stratix-style FPGAs enables an extremely low-cost pipeline register. The presence of ubiquitous flip-flops simplifies circuit retiming and improves performance. The availability of predictable retiming affects all stages of the cluster, place and route flow. Ubiquitous flip-flops require a low-cost clock network with sufficient flexibility to enable pipelining of dozens of clock domains. Different cost/performance tradeoffs in a pipelined fabric and use of a 14nm process lead to other modifications to the routing fabric and the logic element. User modification of the design enables even higher performance, averaging 2.3X faster in a small set of designs.


Heterogeneous programmable device and configuration software adapted therefor
  • Patent
  • Full-text available

May 2015 · 5 Reads

A method of configuring a programmable integrated circuit device with a user logic design includes analyzing the user logic design to identify unidirectional logic paths within the user logic design and cyclic logic paths within the user logic design, assigning the cyclic logic paths to logic in a first portion of the programmable integrated circuit device that operates at a first data rate, assigning the unidirectional logic paths to logic in a second portion of the programmable integrated circuit device that operates at a second data rate lower than the first data rate, and pipelining the unidirectional data paths in the second portion of the programmable integrated circuit device to compensate for the lower second data rate. A programmable integrated circuit device adapted to carry out such method may have logic regions operating at different rates, including logic regions with programmably selectable data rates.
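The first step of the method above, separating cyclic logic paths from unidirectional ones, can be illustrated with a strongly connected component search. This is only a minimal sketch and not the patented method: it assumes the design is available as a list of directed `(source, destination)` edges, and the function name `nodes_on_cycles` is hypothetical.

```python
from collections import defaultdict


def nodes_on_cycles(edges):
    """Return the set of nodes that lie on at least one directed cycle.

    Assumption: the design is modeled as a directed graph of (src, dst)
    edges. Uses a recursive Tarjan-style SCC search, so it suits small
    illustrative graphs only.
    """
    graph = defaultdict(list)
    nodes = set()
    for src, dst in edges:
        graph[src].append(dst)
        nodes.update((src, dst))

    index, low = {}, {}
    stack, on_stack = [], set()
    cyclic = set()
    counter = [0]

    def visit(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            # v is the root of a strongly connected component.
            component = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.append(w)
                if w == v:
                    break
            # The component is cyclic if it has several nodes or a self-loop.
            if len(component) > 1 or v in graph[v]:
                cyclic.update(component)

    for v in sorted(nodes):
        if v not in index:
            visit(v)
    return cyclic


edges = [("a", "b"), ("b", "c"), ("c", "b"), ("c", "d")]
cyclic = nodes_on_cycles(edges)
feed_forward = {n for e in edges for n in e} - cyclic
```

In this toy graph, `b` and `c` form a feedback loop, while `a` and `d` sit on feed-forward logic only and would be the candidates for extra pipelining.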


Method and apparatus for performing optimization using don't care states

December 2014 · 4 Reads

A method for designing a system on a target device includes determining a realization set of a signal that includes one or more representations of the signal where at least one of the representations is influenced by a Don't Care Set (DCS) and all representations are equivalent. The realization set is propagated through the system with the signal. The realization set is used to perform a plurality of separate optimizations on the logic.


Method and apparatus for performing multiple stage physical synthesis

October 2014 · 6 Reads

A method for designing a system on a target device includes entering the system. The system is synthesized. The system is mapped. The system is placed on the target device. The system is routed. Physical synthesis is performed on the system immediately after more than one of the entering, synthesizing, mapping, placing and routing procedures.


Method and system for operating a circuit

September 2014 · 6 Reads

Operation of a programmable circuit is described. A circuit including a plurality of multiplexers may be used to perform at least one operation on a plurality of signals. The at least one operation may be performed by the multiplexers using a select line coupled to or shared by the multiplexers. Each input of the circuit may couple to a respective output of a plurality of logic elements. As such, the circuit may be used to perform at least one operation on signals supplied from a plurality of logic elements, thereby expanding the functionality of at least one logic element coupled to the circuit and/or increasing the number of logic elements and other resources available for implementing user designs or performing other functions.


Specification of latency in programmable device configuration

September 2014 · 10 Reads

A method of configuring a programmable integrated circuit device with a user logic design includes accepting a first user input defining the user logic design, accepting a second user input defining latency characteristics of the user logic design, determining a configuration of the programmable integrated circuit device having the user logic design, and retiming the configuration based on the second user input.


Programmable device configuration methods adapted to account for retiming

March 2014 · 7 Reads

A method of configuring an integrated circuit device with a user logic design includes analyzing the user logic design to identify critical and near-critical cyclic logic paths within the user logic design, applying timing optimizations to the critical and near-critical cyclic logic paths, and retiming logic paths other than the critical and near-critical cyclic logic paths.


Specification of multithreading in programmable device configuration

February 2014 · 5 Reads

A method of configuring a programmable integrated circuit device with a user logic design includes accepting a first user input defining the user logic design, accepting a second user input defining multithreading characteristics of at least a portion of the user logic design, determining a configuration of the programmable integrated circuit device having the user logic design, multithreading the at least a portion of the configuration based on the second user input, and retiming the multithreaded configuration.


Methods for memory interface calibration

November 2013 · 5 Reads

Integrated circuits with memory interface circuitry may be provided. Prior to calibration, a number of samples may be determined by computing probability density function curves as a function of timing window edge asymmetry for different degrees of oversampling. During calibration, duty cycle distortion in data strobe signals may be corrected by selectively delaying the data strobe rising or falling edges. A data clock signal that is used for generating data signals may also suffer from duty cycle distortion. The rising and falling edges of the data clock signal may be selectively delayed to correct for duty cycle distortion. The data path through which the data signals are routed may be adjusted to equalize rising and falling transitions to minimize data path duty cycle distortion. Multi-rank calibration may be performed by calibrating to an intersection of successful settings that allow each memory rank to pass memory operation tests.
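The multi-rank step described above, calibrating to an intersection of successful settings, can be sketched as a set intersection. This is an illustration under assumptions, not the patented procedure: settings are modeled as integer delay taps, `passing_by_rank` is a hypothetical mapping from rank number to the taps that passed the memory tests, and picking the middle of the shared taps is one plausible choice rather than anything the abstract specifies.

```python
def common_setting(passing_by_rank):
    """Pick a delay tap that passes on every rank, or None if none exists.

    Assumption: each rank maps to an iterable of integer taps that passed
    its memory operation tests.
    """
    ranks = list(passing_by_rank.values())
    if not ranks:
        return None
    # Keep only the taps that every rank accepted.
    common = set(ranks[0]).intersection(*ranks[1:])
    if not common:
        return None
    ordered = sorted(common)
    # Choose the middle shared tap to leave margin on both sides.
    return ordered[len(ordered) // 2]


# Rank 0 passes on taps 3..9 and rank 1 on taps 5..11; they share 5..9.
chosen = common_setting({0: range(3, 10), 1: range(5, 12)})
```

With these example windows the shared taps are 5 through 9, so the sketch selects tap 7.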


Integrated circuits with multi-stage logic regions

November 2013 · 2 Reads

A programmable logic region on a programmable integrated circuit may include a first set of look-up tables that receive programmable logic region input signals and a second set of look-up tables that produce programmable logic region output signals. Multiplexer circuitry may be interposed between the first and second sets of look-up tables. The multiplexer circuitry may receive the programmable logic region input signals in parallel with the output signals from the first set of look-up tables and may provide corresponding selected signals to the second set of look-up tables. The programmable logic region input signals may be shared by the first and second sets of look-up tables. Logic circuitry may be coupled to outputs of the first and second sets of look-up tables. The logic circuitry may be configured to logically combine output signals from the first and second sets of look-up tables.


Citations (12)


... First, the freeze signal would have very high fanout, leading to a long interconnect delay that can limit $F_{\max}$. Second, Stratix 10 has optional registers in each routing wire driver [23], allowing very deep pipelining that HPIPE exploits, but these registers are simple and do not have a clock enable. Consequently, freezing all the logic with a clock enable would prevent use of interconnect registers, making both re-timing of the circuitry less effective and placement of the pipeline registers more difficult. ...

Reference:

H2PIPE: High throughput CNN Inference on FPGAs with High-Bandwidth Memory
The Stratix™ 10 Highly Pipelined FPGA Architecture
  • Citing Conference Paper
  • February 2016

... In modern FPGAs [11, 12] configuration memory is a small fraction of total chip area, and a small portion of this configuration memory is duplicated in order to allow two distinct subcircuits to share the same set of programmable resources. A detailed study of this area tradeoff can be found in [10]. Here we present a high-level overview of the architecture and present a synthesis method that helps reduce circuit size significantly when targeting AFPGAs. [Figure: (a) the configuration element used by FPGAs, and (b) the configuration element used only by AFPGAs.] ...

Area Optimizations in FPGA Architecture and CAD
  • Citing Article

... In earlier versions of the place and route CAD flow, physical synthesis was performed exclusively after placement. Transformations include register retiming [12], timing-driven functional decomposition [13], local rewiring [14], and logic replication. shows register retiming, a powerful logic optimization technique for synchronous circuits. ...

Post-placement functional decomposition for FPGAs
  • Citing Article

... It is thus proposed that heuristics be used to solve this problem. The method is based on the approach of Lou et al. to modelling the device as a grid of channel cells [Lou et al. 2002] and the delay-lookup approach of Manohararajah et al. [Manohararajah et al. 2006]. The algorithm for mapping arcs to the device is described in Algorithm 2. Initially sorting the connections in descending order mimics what a detailed router does at a higher level by allowing longer connections to use faster routing paths through the channel cells, thereby reducing the risk these nets become critical. ...

Difficulty of predicting interconnect delay in a timing driven FPGA CAD flow
  • Citing Conference Paper
  • March 2006

... The new temperature, $T_{\text{new}}$, is given by $T_{\text{new}} = \tau \cdot T_{\text{old}}$, where the value of $\tau$ depends on the fraction of attempted moves that were accepted ($R_{\text{accept}}$) at $T_{\text{old}}$ and is determined using an approach similar to [6] and [7]. Finally, the outer loop exit criterion stops the annealing process if the temperature is less than a small fraction () of the average criticality cost per external connection. ...

Automatic Partitioning for Improved Placement and Routing in Complex Programmable Logic Devices
  • Citing Conference Paper
  • September 2002

Lecture Notes in Computer Science
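The temperature update quoted in the excerpt above can be sketched as follows. The excerpt does not give the mapping from acceptance rate to the factor τ, so the thresholds below are an assumption borrowed from the well-known VPR annealing schedule and may differ from what references [6] and [7] of the citing paper use. The function names are hypothetical.

```python
def temperature_factor(accept_rate):
    """Return the cooling factor tau for a given acceptance rate.

    Assumption: thresholds follow the VPR-style adaptive schedule; the
    cited excerpt only says tau depends on the acceptance rate.
    """
    if accept_rate > 0.96:
        return 0.5   # almost everything accepted: cool quickly
    if accept_rate > 0.8:
        return 0.9
    if accept_rate > 0.15:
        return 0.95  # productive range: cool slowly
    return 0.8       # few moves accepted: finish up faster


def next_temperature(t_old, accept_rate):
    """Compute T_new = tau * T_old for one outer-loop iteration."""
    return temperature_factor(accept_rate) * t_old


t_new = next_temperature(10.0, 0.5)
```

The slowest cooling is applied in the middle acceptance range, where the annealer is assumed to make the most useful progress.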

... In the second term, δ wt also represents the existence of buffers in case δ wt =δ wt . Solving the optimization problem above identifies nodes with unequal padding delays δ wt and δ wt , indicating potential locations of sequential delay units, as a set S. These delays may still violate the exact constraints in (5)- (15), so that they need to be refined further. ...

Incremental retiming for FPGA physical synthesis
  • Citing Conference Paper
  • July 2005