Doris Chen’s research while affiliated with Altera Corporation and other places


Publications (7)


Method and apparatus for performing fast incremental resynthesis
  • Patent

May 2014

·

11 Reads

Doris Tzu Lang Chen

·

Deshanand Singh

A method for designing a system on a target device is disclosed. A first netlist is generated for a first version of the system in a first compilation. Optimizations are performed on the first version of the system during synthesis, resulting in a second netlist. A third netlist is generated for a second version of the system in a second compilation. The first version of the system in the first netlist and the second version of the system in the third netlist are differentiated to identify identical regions.


Integrated circuit compilation

February 2014

·

14 Reads

Systems and methods for increasing speed and reducing processing power of a compile process of programmable logic of an integrated circuit (IC) are provided. For example, in one embodiment, a method includes obtaining a high level program, comprising computer-readable instructions for implementation on programmable logic of an integrated circuit (IC); translating the high level program into low level code representative of functional components needed to execute functionalities of the high level program; generating a host program comprising computer-readable instructions for implementing the low level code based upon the high level program; obtaining modifications to the high level program; determining whether the modifications can be implemented by a new host program utilizing the low level code; and generating the new host program to implement the modifications, when the modifications can be implemented by the new host program utilizing the low level code.


Profile-Guided Floating- to Fixed-Point Conversion for Hybrid FPGA-Processor Applications

January 2013

·

24 Reads

·

2 Citations

ACM Transactions on Architecture and Code Optimization

The key to enabling widespread use of FPGAs for algorithm acceleration is to allow programmers to create efficient designs without the time-consuming hardware design process. Programmers are used to developing scientific and mathematical algorithms in high-level languages (C/C++) using floating-point data types. Although floating point is easy to use, the dynamic range it provides is not necessary in many applications; more efficient implementations can be realized using fixed-point arithmetic. While this topic has been studied previously [Han et al. 2006; Olson et al. 1999; Gaffar et al. 2004; Aamodt and Chow 1999], full automation has always been lacking. We present a novel design flow for cases where FPGAs are used to offload computations from a microprocessor. Our LLVM-based algorithm inserts value profiling code into an unmodified C/C++ application to guide its automatic conversion to fixed point. This allows for fast and accurate design space exploration on a host microprocessor before any accelerators are mapped to the FPGA. Through experimental results, we demonstrate that fixed-point conversion can yield resource savings of up to 2x-3x. Embedded RAM usage is minimized, and 13%-22% higher Fmax than the original floating-point implementation is observed. In a case study, we show that a 17% reduction in logic and a 24% reduction in register usage can be realized by using our algorithm in conjunction with a High-Level Synthesis (HLS) tool.
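
As a rough illustration of the idea of using profiled values to guide fixed-point conversion, the sketch below picks a signed fixed-point format from a list of observed values. It is not the LLVM-based flow from the paper: the function names, the 16-bit default word length, and the rule of sizing the integer part from the largest observed magnitude are all assumptions made for this example.

```python
import math


def choose_fixed_point_format(samples, total_bits=16):
    """Return (int_bits, frac_bits) for a signed format covering the profiled range.

    Hypothetical helper: one bit is reserved for the sign, the integer part is
    sized from the largest observed magnitude, and the rest is fraction.
    """
    max_abs = max(abs(v) for v in samples)
    if max_abs > 0:
        # Integer bits needed so that max_abs is strictly below 2**int_bits.
        int_bits = max(0, math.floor(math.log2(max_abs)) + 1)
    else:
        int_bits = 0
    frac_bits = total_bits - 1 - int_bits
    if frac_bits < 0:
        raise ValueError("profiled range does not fit in the requested word length")
    return int_bits, frac_bits


def to_fixed(value, frac_bits):
    """Quantize a float to an integer holding the scaled fixed-point value."""
    return round(value * (1 << frac_bits))


def from_fixed(raw, frac_bits):
    """Convert a scaled integer back to a float."""
    return raw / (1 << frac_bits)


if __name__ == "__main__":
    profiled = [0.5, -3.75, 2.0]  # values a profiling run might have recorded
    int_bits, frac_bits = choose_fixed_point_format(profiled)
    print(int_bits, frac_bits, to_fixed(-3.75, frac_bits))
```

A real flow would also have to bound the quantization error introduced by the chosen fraction width, which this sketch does not attempt.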


Invited paper: Using OpenCL to evaluate the efficiency of CPUs, GPUs and FPGAs for information filtering

August 2012

·

75 Reads

·

50 Citations

The FPGA can be a tremendously efficient computational fabric for many applications. In particular, the performance-to-power ratios of FPGAs make them attractive solutions for data centers that are constrained largely by power and cooling costs. However, the complexity of the FPGA design flow requires the programmer to understand cycle-accurate details of how data is moved and transformed through the fabric. In this paper, we explore techniques that allow programmers to efficiently use FPGAs at a level of abstraction that is closer to traditional software-centric approaches by using the emerging parallel language, OpenCL. Although the field of high-level synthesis has evolved greatly in the last few decades, several fundamental parts were missing from the complete software abstraction of the FPGA. These include standard and portable methods of describing HW/SW codesign, memory hierarchy, data movement and control of parallelism. We believe that OpenCL addresses all of these issues and allows for highly efficient description of FPGA designs with a higher level of abstraction. We demonstrate this premise by examining the performance of a document filtering algorithm, implemented in OpenCL and automatically compiled to a Stratix IV 530 FPGA. We show that our implementation achieves 5.5× and 5.25× better performance per watt ratios than GPU and CPU implementations, respectively.


Line-Level Incremental reSynthesis techniques for FPGAs

February 2011

·

18 Reads

·

4 Citations

FPGA logic density is roughly doubling at every process generation. Consequently, it is becoming increasingly challenging for FPGA CAD tools to keep up with the growing complexities of high-speed designs while keeping CAD run-times reasonable. In this paper, we present a novel incremental resynthesis tool called Line-Level Incremental reSynthesis (LLIS), integrated within an industrial tool suite, that addresses the problems of timing closure as well as CAD runtime (patent pending). We describe a general framework that can incrementally reuse results from a previous compile based on automatic differencing of HDL changes. We show that it is possible to reduce synthesis runtime by 6.5x for common HDL changes. Compared with complete resynthesis, we preserve known good timing solutions more than 82% of the time. This represents a 3x improvement vs. non-incremental techniques.
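
To give a feel for what "automatic differencing of HDL changes" could look like at the line level, here is a minimal sketch built on Python's standard `difflib`. It is not the LLIS tool: the function name `changed_line_ranges` and the two tiny Verilog snippets are made up for illustration, and a real flow would map the changed lines back to netlist regions before reusing earlier results.

```python
import difflib


def changed_line_ranges(old_hdl, new_hdl):
    """List the non-identical line ranges between two versions of an HDL source.

    Each entry is (tag, (old_start, old_end), (new_start, new_end)) with
    0-based, end-exclusive line indices, as reported by difflib.
    """
    old_lines = old_hdl.splitlines()
    new_lines = new_hdl.splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # unchanged regions are candidates for reuse
            changes.append((tag, (i1, i2), (j1, j2)))
    return changes


if __name__ == "__main__":
    old = "module m(input a, input b, output y);\nassign y = a & b;\nendmodule"
    new = "module m(input a, input b, output y);\nassign y = a | b;\nendmodule"
    for change in changed_line_ranges(old, new):
        print(change)
```

Only the line containing the `assign` statement is reported as changed here; everything reported as equal is, in principle, work from the previous compile that need not be redone.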


Parallelizing FPGA Technology Mapping Using Graphics Processing Units (GPUs)

August 2010

·

23 Reads

·

14 Citations

GPUs are becoming an increasingly attractive option for obtaining performance speedups for data-parallel applications. FPGA technology mapping is an algorithm that is heavily data parallel; however, it has many features that make it unattractive to implement on a GPU. The algorithm uses data in irregular ways since it is a graph-based algorithm. In addition, it makes heavy use of constructs like recursion, which is not supported by GPU hardware. In this paper, we take a state-of-the-art FPGA technology mapping algorithm within Berkeley's ABC package and attempt to parallelize it on a GPU. We show that runtime gains of 3.1× are achievable while maintaining identical quality, as demonstrated by running these netlists through Altera's Quartus II place-and-route tool.


A comprehensive approach to modeling, characterizing and optimizing for metastability in FPGAs

February 2010

·

99 Reads

·

30 Citations

Doris Chen

·

Deshanand Singh

·

Jeffrey Chromczak

·

[...]

·

Metastability is a phenomenon that can cause system failures in digital circuits. It may occur whenever signals are being transmitted across asynchronous or unrelated clock domains. The impact of metastability is increasing as process geometries shrink and supply voltages drop faster than transistor Vts. FPGA technologies are significantly affected since leading edge FPGAs are amongst the first devices to adopt the most recent process nodes. In this paper, we present a comprehensive suite of techniques for modeling, characterizing and optimizing metastability effects in FPGAs. We first discuss a theoretical model of metastability, and verify the predictions using both circuit level simulations and board measurements. Next we show how designers have traditionally dealt with metastability problems and contrast that with the automatic CAD algorithms described in this paper that both analyze and optimize metastability-related issues. Through our detailed experimental results, we show that we can improve the metastability characteristics of a large suite of industrial benchmarks by an average of 268,000 times with our optimization techniques.
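
The abstract does not spell out its theoretical model, so the sketch below uses the widely taught exponential estimate of synchronizer mean time between failures, MTBF = exp(t_met / C2) / (C1 · f_clk · f_data), purely as an assumption. The function name and all numeric constants are hypothetical and are not taken from the paper.

```python
import math


def synchronizer_mtbf(t_met, c1, c2, f_clk, f_data):
    """Estimate MTBF in seconds with the textbook exponential model.

    t_met  : settling time available to the synchronizer (s)
    c1, c2 : process- and circuit-dependent constants (s); assumed values
    f_clk  : receiving clock frequency (Hz)
    f_data : toggle rate of the asynchronous input (Hz)
    """
    return math.exp(t_met / c2) / (c1 * f_clk * f_data)


if __name__ == "__main__":
    # Hypothetical constants chosen only to show the exponential sensitivity.
    c1, c2 = 1e-9, 50e-12
    base = synchronizer_mtbf(1e-9, c1, c2, 100e6, 10e6)
    more_slack = synchronizer_mtbf(1e-9 + c2 * math.log(10), c1, c2, 100e6, 10e6)
    print(more_slack / base)  # each extra c2*ln(10) of settling time gives about 10x
```

Under this model, small increases in available settling time translate into very large MTBF gains, which is one way to see how optimization could improve metastability characteristics by several orders of magnitude.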

Citations (5)


... It is important to emphasize that the dynamic range needs to be determined only once and is not part of the design space exploration. Thus, its execution time is not as critical as accuracy evaluation, and methods based on profiling (e.g., Chen and Singh [9]) can be used. Nonetheless, estimation of dynamic range based on convolutions is an interesting direction of future work. ...

Reference:

Toward Scalable Source Level Accuracy Analysis for Floating-point to Fixed-point Conversion
Profile-Guided Floating- to Fixed-Point Conversion for Hybrid FPGA-Processor Applications
  • Citing Article
  • January 2013

ACM Transactions on Architecture and Code Optimization

... The Altera OpenCL SDK [2] allows programmers to use high-level OpenCL kernels, written for GPUs, to generate an FPGA design with higher performance per Watt [9]. In this work, an OpenCL kernel is first compiled and then synthesized as a special dedicated hardware for mapping on an FPGA. ...

Invited paper: Using OpenCL to evaluate the efficiency of CPUs, GPUs and FPGAs for information filtering
  • Citing Conference Paper
  • August 2012

... Shrinking dimensions to the nanometer scale gives rise to effects that were not noticeable at the micrometer scale and that influence the risk of metastability. For example, the new supply voltages are closer to the threshold voltage of the transistors [21][22][23][24], and consequently a change has been observed in the behavior of these devices under temperature variations [25,26]. Analyzing the impact of fabrication technology, supply voltage, and operating temperature (PVT: process, voltage and temperature) on the response times of the new devices and on the risk of metastability is of special interest to the scientific community [22,23,27,28,29,30]. ...

A comprehensive approach to modeling, characterizing and optimizing for metastability in FPGAs
  • Citing Conference Paper
  • February 2010

... Parallelized Technology Mapping. There have been extensive research efforts on parallelizing FPGA technology mapping using multicore processors and graphics processing units [8,18,20,30], where a key technique is to partition a large netlist into multiple sub-netlists and assign them to different threads for parallel mapping. Up to 3× speedup has been shown by parallelizing a sequential version of the technology mapping with negligible overheads in solution quality. ...

Parallelizing FPGA Technology Mapping Using Graphics Processing Units (GPUs)
  • Citing Conference Paper
  • August 2010