
Alexander Fell
University of Chicago | UC · Department of Computer Science
PhD
About
Publications: 29
Reads: 16,269
Citations: 206
Introduction
The UpDown system architecture at the University of Chicago is designed for irregular graph computations. It features efficient thread invocations, direct messaging without network cards, and split-transaction memory operations for high bandwidth. With global addressing and advanced networking, it leverages edge and vertex parallelism for superior graph processing performance.
Additional affiliations
February 2009 - December 2012
January 2013 - August 2013
Position
- Senior Researcher
Description
- Mapping of graphs is known to be an NP-complete problem. In Coarse-Grained Reconfigurable Architectures (CGRA), this problem arises when a Data Flow Graph (DFG) needs to be mapped onto the available physical hardware. The mapping should not only achieve high utilization of resources but also avoid increasing the height of the graph (i.e., the execution time) of the application.
Publications (29)
Convolutional Neural Networks (CNNs) are popular models that have been successfully applied to diverse domains like vision, speech, and text. To reduce inference-time latency, it is common to employ hardware accelerators, which often require a model compression step. Contrary to most compression algorithms that are agnostic of the underlying hardwa...
Program obfuscation is widely used to protect commercial software against reverse-engineering. However, an adversary can still download, disassemble and analyze binaries of the obfuscated code executed on an embedded System-on-Chip (SoC), and by correlating execution times to input values, extract secret information from the program. In this paper,...
Timing side-channel attacks pose a major threat to embedded systems due to their ease of accessibility. We propose CIDPro, a framework that relies on dynamic program diversification to mitigate timing side-channel leakage. The proposed framework integrates the widely used LLVM compiler infrastructure and the increasingly popular RISC-V FPGA soft-pr...
Slides of the oral presentation of the corresponding paper (https://www.researchgate.net/publication/327495855_CIDPro_Custom_Instructions_for_Dynamic_Program_Diversification)
Semi-Global Matching (SGM) is a popular algorithm to calculate depth maps in stereo images offering the best trade-off among accuracy, computational costs and high frame rates. This paper presents two architectural improvements in FPGA implementations of SGM to achieve high frame rates. First, a highly parallel, pipelined and scalable architecture...
With the advancement in technology nodes, the number of components operating in different clock domains in a System on Chip (SoC) increases. Asynchronous multi-port memory with dedicated write and read ports is used to allow data to cross clock domain boundaries. The dual-port memory architecture introduced in this paper is based on the Single-Por...
In this paper, DFGenTool, a dataflow graph (DFG) generation tool, is presented, which converts loops in a sequential program given in a high-level language such as C, into a DFG. DFGenTool adapts DFGs for mapping to Coarse Grain Reconfigurable Architectures (CGRA) to enable a variety of CGRA implementations and compilers to be benchmarked against a...
A video analyzer is a comprehensive bitstream analysis tool which accelerates development and debugging of video bitstreams while ensuring compliance with industry standards. Many conventional analyzers exist for video standards such as H.264 and HEVC, but each is compliant only with its respective video sequence format. In this work,...
Multi-port Static Random Access Memories (SRAM) are essential for shared data structures, especially in distributed, multi-core and multi-processing computing systems. This paper introduces an elementary multi-port memory design which can perform either dual-read or a single-write operation (2R/1W) by efficiently combining the 6 Transistor (6T) sin...
The topology and channel width of a Network-on-Chip (NoC) impact its throughput and latency and therefore its area of deployment. In this paper, an NoC based on a three-dimensional, toroidal rectangular honeycomb topology using a two-tuple (x, y) address is discussed. It employs a minimal and deterministic routing algorithm utilizing Virtual Chann...
Embedded memories are the key contributor to chip area and dynamic power dissipation, and they also form a significant part of the critical path in high-performance advanced SoCs. Therefore, optimal selection of memory instances becomes imperative for SoC designers. While EDA tools have evolved over the past years to optimally select standard logic cells d...
Snow petrels (Pagodroma nivea) are birds breeding in crevices on the Antarctic Peninsula. We develop a system to monitor the nest temperatures and times birds spend outside their nests. To estimate this time, the system has been equipped with a simple Light Dependent Resistor (LDR) to overcome the insensitivity and false positives generated by Pass...
Coarse-Grained Reconfigurable Architectures (CGRA) are proven to be advantageous over fine-grained architectures, massively parallel GPUs and generic CPUs, in terms of energy and flexibility. However, the key challenge of programmability is preventing widespread adoption. To exploit instruction level parallelism inherent to such architectures, optim...
In this thesis, a Network-on-Chip (NoC) router implementation called RECONNECT, realized in Bluespec System Verilog (BSV), is presented. It is highly configurable in terms of flit size, the number of provided Input Port/Output Port (IP/OP) pairs and support for configurations during runtime, to name a few. Depending on the number of available IP/OP p...
REDEFINE is a runtime reconfigurable hardware platform. In this paper, we trace the development of runtime reconfigurable hardware from a general purpose processor by eliminating certain characteristics such as register files and the bypass network. We instead allow explicit write backs to the reservation stations as in Transport Triggered Architec...
RECONNECT is a network-on-chip using a honeycomb topology. In this paper we focus on properties of general rules applicable to a variety of routing algorithms for the NoC which take into account the missing links of the honeycomb topology when compared to a mesh. We also extend the original proposal and show a method to insert and extract data to a...
Emerging embedded applications are based on evolving standards (e.g., MPEG2/4, H.264/265, IEEE802.11a/b/g/n). Since most of these applications run on handheld devices, there is an increasing need for a single chip solution that can dynamically interoperate between different standards and their derivatives. In order to achieve high resource utilizat...
In this paper we develop compilation techniques for the realization of applications described in a High Level Language (HLL) onto a Runtime Reconfigurable Architecture. The compiler determines Hyper Operations (HyperOps) that are subgraphs of a data flow graph (of an application) and comprise elementary operations that have strong producer-consumer...
In this paper we explore an implementation of a high-throughput, streaming application on REDEFINE-v2, which is an enhancement of REDEFINE. REDEFINE is a polymorphic ASIC combining the flexibility of a programmable solution with the execution speed of an ASIC. In REDEFINE, Compute Elements are arranged in an 8x8 grid connected via a Network on Chip...
A polymorphic ASIC is a runtime reconfigurable hardware substrate comprising compute and communication elements. It is a "future proof" custom hardware solution for multiple applications and their derivatives in a domain. Interoperability between application derivatives at runtime is achieved through hardware reconfiguration. In this paper...
Application accelerators are predominantly ASICs. The cost of ASIC solutions is orders of magnitude higher than that of programmable processing cores. Despite this, ASIC solutions are preferred when both high performance and low power are the target. ASICs offer no flexibility to cater to application derivatives, unless this has b...
This paper describes how the result lists of search engines could be improved and adapted to user profiles automatically. To achieve this goal, two different algorithms have been implemented in an application called PISA. One is a probabilistic classification method known as the Bayesian algorithm and the other is TF-IDF (Term Frequency - Inver...
Emerging embedded applications are based on evolving standards (e.g., MPEG2/4, H.264/265, IEEE802.11a/b/g). Since most of these applications run on handheld devices, there is an increasing need for a single chip solution that can dynamically interoperate between different standards and their derivatives. In order to achieve high resource utilization,...