Nikil Dutt

University of California, Irvine, Irvine, California, United States

Publications (525) · 85.18 Total Impact

    ABSTRACT: With memories continuing to dominate the area, power, cost and performance of a design, there is a critical need to provision reliable, high-performance memory bandwidth for emerging applications. Memories are susceptible to degradation and failures from a wide range of manufacturing, operational and environmental effects, requiring a multi-layer hardware/software approach that can tolerate, adapt and even opportunistically exploit such effects. The overall memory hierarchy is also highly vulnerable to the adverse effects of variability and operational stress. After reviewing the major memory degradation and failure modes, this paper describes the challenges for dependability across the memory hierarchy, and outlines research efforts to achieve multi-layer memory resilience using a hardware/software approach. Two specific exemplars are used to illustrate multi-layer memory resilience: first we describe static and dynamic policies to achieve energy savings in caches using aggressive voltage scaling combined with disabling faulty blocks; and second we show how software characteristics can be exposed to the architecture in order to mitigate the aging of large register files in GPGPUs. These approaches can further benefit from semantic retention of application intent to enhance memory dependability across multiple abstraction levels, including applications, compilers, run-time systems, and hardware platforms.
    Proceedings of the 51st Annual Design Automation Conference (DAC '14); 06/2014
    ABSTRACT: Advances in technology scaling increasingly make emerging Chip MultiProcessor (CMP) platforms more susceptible to failures that cause various reliability challenges. In such platforms, error-prone on-chip memories (caches) continue to dominate the chip area. Also, Network-on-Chip (NoC) fabrics are increasingly used to manage the scalability of these architectures. We present a novel solution for efficient implementation of fault-tolerant design of Last-Level Cache (LLC) in CMP architectures. The proposed approach leverages the interconnection network fabric to protect the LLC cache banks against permanent faults in an efficient and scalable way. During an LLC access to a faulty block, the network detects and corrects the faults, returning the fault-free data to the requesting core. Leveraging the NoC interconnection fabric, designers can implement any cache fault-tolerant scheme in an efficient, modular, and scalable manner for emerging multicore/manycore platforms. We propose four different policies for implementing a remapping-based fault-tolerant scheme leveraging the NoC fabric in different settings. The proposed policies enable design trade-offs between NoC traffic (packets sent through the network) and the intrinsic parallelism of these communication mechanisms, allowing designers to tune the system based on design constraints. We perform an extensive design space exploration on NoC benchmarks to demonstrate the usability and efficacy of our approach. In addition, we perform sensitivity analysis to observe the behavior of various policies in reaction to improvements in the NoC architecture. The overheads of leveraging the NoC fabric are minimal: on an 8-core, 16-cache-bank CMP we demonstrate reliable access to LLCs with additional overheads of less than 3% in area and less than 7% in power.
    ACM Transactions on Embedded Computing Systems (TECS). 03/2014; 13(3s).
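    The remapping policy can be illustrated with a small sketch (the fault map, remap table, and access function below are hypothetical illustrations of the idea, not the paper's implementation):

```python
# Illustrative sketch of remapping-based LLC fault tolerance (hypothetical data structures).
FAULTY_BLOCKS = {(2, 0x1A0), (5, 0x003)}          # (bank, block index) pairs known to be faulty
REMAP_TABLE   = {(2, 0x1A0): (3, 0x1A0),          # faulty block -> (target bank, target block)
                 (5, 0x003): (6, 0x003)}

def llc_access(bank: int, block: int) -> tuple[int, int]:
    """Return the (bank, block) that actually services the request.

    If the addressed block is marked faulty, the request is redirected
    over the interconnect to a healthy target block in another bank.
    """
    if (bank, block) in FAULTY_BLOCKS:
        return REMAP_TABLE[(bank, block)]          # extra NoC hop: traffic/latency trade-off
    return (bank, block)

print(llc_access(2, 0x1A0))   # -> (3, 416): redirected to the spare location
print(llc_access(1, 0x010))   # -> (1, 16): fault-free, served locally
```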
    ABSTRACT: Simulating large-scale models of biological motion perception is challenging, due to the required memory to store the network structure and the computational power needed to quickly solve the neuronal dynamics. A low-cost yet high-performance approach to simulating large-scale neural network models in real-time is to leverage the parallel processing capability of graphics processing units (GPUs). Based on this approach, we present a two-stage model of visual area MT that we believe to be the first large-scale spiking network to demonstrate pattern direction selectivity. In this model, component-direction-selective (CDS) cells in MT linearly combine inputs from V1 cells that have spatiotemporal receptive fields according to the motion energy model of Simoncelli and Heeger. Pattern-direction-selective (PDS) cells in MT are constructed by pooling over MT CDS cells with a wide range of preferred directions. Responses of our model neurons are comparable to electrophysiological results for grating and plaid stimuli as well as speed tuning. The behavioral response of the network in a motion discrimination task is in agreement with psychophysical data. Moreover, our implementation outperforms a previous implementation of the motion energy model by orders of magnitude in terms of computational speed and memory usage. The full network, which comprises 153,216 neurons and approximately 40 million synapses, processes 20 frames per second of a 40 × 40 input video in real-time using a single off-the-shelf GPU. To promote the use of this algorithm among neuroscientists and computer vision researchers, the source code for the simulator, the network, and analysis scripts are publicly available.
    Neuroinformatics 02/2014; · 3.14 Impact Factor
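    A toy sketch of the pooling stage (cosine tuning and pooling weights are simplifying assumptions; the actual model uses spiking neurons driven by the Simoncelli-Heeger motion energy front end):

```python
import numpy as np

# Toy sketch: a pattern-direction-selective (PDS) unit pools component-direction-
# selective (CDS) responses over a wide range of preferred directions.
dirs = np.arange(0, 360, 45)                        # preferred directions of 8 CDS units (deg)

def cds_response(stimulus_dir, preferred):
    """Toy CDS tuning: rectified cosine tuning to a single grating component."""
    return max(0.0, np.cos(np.deg2rad(stimulus_dir - preferred)))

def pds_response(component_dirs, pds_preferred):
    """Pool CDS responses to all grating components with a broad cosine kernel."""
    total = 0.0
    for comp in component_dirs:
        for pref in dirs:
            pool_w = max(0.0, np.cos(np.deg2rad(pref - pds_preferred)))
            total += pool_w * cds_response(comp, pref)
    return total

# A plaid made of two components at +60 and -60 deg: with this toy pooling, the unit
# preferring the pattern direction (0 deg) responds slightly more strongly than a
# unit preferring one of the components (60 deg).
plaid = [60, -60]
print(round(pds_response(plaid, 0), 2), round(pds_response(plaid, 60), 2))
```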
  • Luis Angel D. Bathen, Nikil D. Dutt
    ABSTRACT: The dual effects of larger die sizes and technology scaling, combined with aggressive voltage scaling for power reduction, increase the error rates for on-chip memories. Traditional on-chip memory reliability techniques (e.g., ECC) incur significant power and performance overheads. In this article, we propose a low-power-and-performance-overhead Embedded RAID (E-RAID) strategy and present Embedded RAIDs-on-Chip (E-RoC), a distributed dynamically managed reliable memory subsystem for bus-based Chip-Multiprocessors. E-RoC achieves reliability through redundancy by optimizing RAID-like policies tuned for on-chip distributed memories. We achieve on-chip reliability of memories through the use of Distributed Dynamic ScratchPad Allocatable Memories (DSPAMs) and their allocation policies. We exploit aggressive voltage scaling to reduce power consumption overheads due to parallel DSPAM accesses, and rely on the E-RoC Manager to automatically handle any resulting voltage-scaling-induced errors. We demonstrate how E-RAIDs can further enhance the fault tolerance of traditional memory reliability approaches by designing E-RAID levels that exploit ECC. We also show the power and flexibility of the E-RoC concept through the benefits of heterogeneous E-RAID levels that fit each application's needs (fault tolerance, power/energy, performance). Our experimental results on CHStone/Mediabench II benchmarks show that our E-RAID levels converge to 100% error-free data rates much faster than traditional ECC approaches. Moreover, E-RAID levels that exploit ECC can guarantee 99.9% error-free data rates at ultra-low Vdd on average, whereas traditional ECC approaches were able to attain at most 99.1% error-free data rates. We observe an average of 22% dynamic power consumption increase by using traditional ECC approaches with respect to the baseline (non-voltage scaled SPMs), whereas our E-RAID levels are able to save dynamic power consumption by an average of 27% (w.r.t. the same non-voltage scaled SPMs baseline), while incurring at most 2% higher performance overheads than traditional ECC approaches in the worst case. By voltage scaling the memories, traditional ECC approaches are able to save static energy by 6.4% (average), whereas our E-RAID approaches achieve 23.4% static energy savings (average). Finally, we observe that mixing E-RAID levels allows us to further reduce the dynamic power consumption by up to 55.5% at the cost of an average 5.6% increase in execution time over traditional approaches.
    ACM Transactions on Embedded Computing Systems (TECS). 02/2014; 13(4).
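    The redundancy idea behind an E-RAID level can be caricatured as RAID-1-style mirroring across two scratchpads (the per-word check below is a toy stand-in for real error detection; the actual E-RAID levels and manager logic are more elaborate):

```python
# Toy RAID-1-style redundancy across two voltage-scaled scratchpad memories:
# every word is written to both copies; reads fall back to the mirror when the
# primary copy has been corrupted by a (simulated) low-Vdd bit error.
spm_a, spm_b = {}, {}

def write(addr, value):
    spm_a[addr] = value
    spm_b[addr] = value                      # mirrored copy

def inject_error(addr):
    spm_a[addr] ^= 1                         # flip a bit in the primary copy

def read(addr, value_ok=lambda v: v % 2 == 0):
    """Use a (toy) per-word check to detect corruption and recover from the mirror."""
    v = spm_a[addr]
    return v if value_ok(v) else spm_b[addr]

write(0x10, 42)          # 42 passes the toy even-value check
inject_error(0x10)       # primary copy becomes 43 and fails the check
assert read(0x10) == 42  # recovered from the mirror copy
```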
    ABSTRACT: Hardware variability is predicted to increase dramatically over the coming years as a consequence of continued technology scaling. In this paper, we apply the Underdesigned and Opportunistic Computing (UnO) paradigm by exposing system-level power variability to software to improve energy efficiency. We present ViPZonE, a memory management solution in conjunction with application annotations that opportunistically performs memory allocations to reduce DRAM energy. ViPZonE’s components consist of a physical address space with DIMM-aware zones, a modified page allocation routine, and a new virtual memory system call for dynamic allocations from userspace. We implemented ViPZonE in the Linux kernel with GLIBC API support, running on a real x86-64 testbed with significant access power variation in its DDR3 DIMMs. We demonstrate that on our testbed, ViPZonE can save up to 27.80% memory energy, with no more than 4.80% performance degradation across a set of PARSEC benchmarks tested with respect to the baseline Linux software. Furthermore, through a hypothetical “what-if” extension, we predict that in future non-volatile memory systems which consume almost no idle power, ViPZonE could yield even greater benefits, demonstrating the ability to exploit memory hardware variability through opportunistic software.
    IEEE Transactions on Computers. 01/2014;
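    A minimal sketch of zone-preference allocation (the zone names, sizes, and hint values are invented; the real ViPZonE works inside the Linux page allocator rather than in user-level Python):

```python
# Hypothetical DIMM-aware zones ordered by measured access power (lowest first).
ZONES = [
    {"name": "dimm_low_power",  "free_pages": 4096},
    {"name": "dimm_mid_power",  "free_pages": 8192},
    {"name": "dimm_high_power", "free_pages": 8192},
]

def alloc_pages(n_pages, hint="LOW_POWER"):
    """Zone-preference allocation: low-power hints walk zones from cheapest to
    most expensive; performance-critical hints walk them in reverse."""
    order = ZONES if hint == "LOW_POWER" else list(reversed(ZONES))
    for zone in order:
        if zone["free_pages"] >= n_pages:
            zone["free_pages"] -= n_pages
            return zone["name"]
    raise MemoryError("no zone can satisfy the request")

print(alloc_pages(1024, hint="LOW_POWER"))    # -> dimm_low_power
print(alloc_pages(1024, hint="HIGH_BW"))      # -> dimm_high_power
```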
    ABSTRACT: As the desire for biologically realistic spiking neural networks (SNNs) increases, tuning the enormous number of open parameters in these models becomes a difficult challenge. SNNs have been used to successfully model complex neural circuits that explore various neural phenomena such as neural plasticity, vision systems, auditory systems, neural oscillations, and many other important topics of neural function. Additionally, SNNs are particularly well-adapted to run on neuromorphic hardware that will support biological brain-scale architectures. Although the inclusion of realistic plasticity equations, neural dynamics, and recurrent topologies has increased the descriptive power of SNNs, it has also made the task of tuning these biologically realistic SNNs difficult. To meet this challenge, we present an automated parameter tuning framework capable of tuning SNNs quickly and efficiently using evolutionary algorithms (EA) and inexpensive, readily accessible graphics processing units (GPUs). A sample SNN with 4104 neurons was tuned to give V1 simple cell-like tuning curve responses and produce self-organizing receptive fields (SORFs) when presented with a random sequence of counterphase sinusoidal grating stimuli. A performance analysis comparing the GPU-accelerated implementation to a single-threaded central processing unit (CPU) implementation was carried out and showed a speedup of 65× of the GPU implementation over the CPU implementation, or 0.35 h per generation for GPU vs. 23.5 h per generation for CPU. Additionally, the parameter value solutions found in the tuned SNN were studied and found to be stable and repeatable. The automated parameter tuning framework presented here will be of use to both the computational neuroscience and neuromorphic engineering communities, making the process of constructing and tuning large-scale SNNs much quicker and easier.
    Frontiers in Neuroscience 01/2014; 8:10.
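    The outer tuning loop is a standard evolutionary algorithm; a serial caricature is shown below (the fitness function and mutation scheme are placeholders, and the real framework evaluates individuals in parallel on the GPU):

```python
import random

TARGET = 0.75                      # placeholder target response property

def fitness(params):
    """Placeholder fitness: how closely a 'simulated' response matches the target."""
    simulated = sum(params) / len(params)          # stand-in for running the SNN
    return -abs(simulated - TARGET)

def mutate(params, sigma=0.05):
    return [max(0.0, min(1.0, p + random.gauss(0, sigma))) for p in params]

population = [[random.random() for _ in range(6)] for _ in range(20)]
for generation in range(50):
    population.sort(key=fitness, reverse=True)     # rank individuals by fitness
    parents = population[:5]                       # truncation selection
    population = parents + [mutate(random.choice(parents)) for _ in range(15)]

print(round(fitness(population[0]), 4))            # best individual after 50 generations
```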
    ABSTRACT: Complicated approaches to fault-tolerant voltage-scalable (FTVS) SRAM cache architectures can suffer from high overheads. We propose static (SPCS) and dynamic (DPCS) variants of power/capacity scaling, a simple and low-overhead fault-tolerant cache architecture that utilizes insights gained from our 45nm SOI test chip. Our mechanism combines multi-level voltage scaling with power gating of blocks that become faulty at each voltage level. The SPCS policy sets the runtime cache VDD statically such that almost all of the cache blocks are not faulty. The DPCS policy opportunistically reduces the voltage further to save more power than SPCS while limiting the impact on performance caused by additional faulty blocks. Through an analytical evaluation, we show that our approach can achieve lower static power for all effective cache capacities than a recent complex FTVS work. This is due to significantly lower overheads, despite the failure of our approach to match the min-VDD of the competing work at fixed yield. Through architectural simulations, we find that the average energy saved by SPCS is 55%, while DPCS saves an average of 69% of energy with respect to baseline caches at 1 V. Our approach incurs no more than 4% performance and 5% area penalties in the worst-case cache configuration.
    Proceedings of the Design Automation Conference (DAC); 01/2014
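    A sketch of the two policies, assuming a precharacterized fault map per voltage level (the numbers and thresholds are invented for illustration):

```python
# Hypothetical fault maps: fraction of cache blocks that are faulty at each VDD (V).
FAULTY_FRACTION = {1.00: 0.000, 0.90: 0.001, 0.80: 0.01, 0.70: 0.06, 0.60: 0.25}

def spcs_vdd(max_faulty=0.005):
    """Static policy: lowest VDD at which (almost) all blocks are still usable."""
    return min(v for v, f in FAULTY_FRACTION.items() if f <= max_faulty)

def dpcs_vdd(miss_rate_increase, perf_budget=0.03, max_faulty=0.10):
    """Dynamic policy: opportunistically go lower than SPCS, backing off when the
    additional faulty (disabled) blocks start hurting performance."""
    candidates = [v for v, f in FAULTY_FRACTION.items() if f <= max_faulty]
    v = min(candidates)
    return v if miss_rate_increase(v) <= perf_budget else spcs_vdd()

print(spcs_vdd())                                        # e.g. 0.9 V
print(dpcs_vdd(lambda v: 0.02 if v >= 0.7 else 0.08))    # e.g. 0.7 V under a 3% budget
```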
    ABSTRACT: Both attentional signals from frontal cortex and neuromodulatory signals from basal forebrain (BF) have been shown to influence information processing in the primary visual cortex (V1). These two systems exert complementary effects on their targets, including increasing firing rates and decreasing interneuronal correlations. Interestingly, experimental research suggests that the cholinergic system is important for increasing V1's sensitivity to both sensory and attentional information. To see how the BF and top-down attention act together to modulate sensory input, we developed a spiking neural network model of V1 and thalamus that incorporated cholinergic neuromodulation and top-down attention. In our model, activation of the BF had a broad effect that decreased the efficacy of top-down projections and increased the reliance on bottom-up sensory input. In contrast, we demonstrated how local release of acetylcholine in the visual cortex, which was triggered through top-down glutamatergic projections, could enhance top-down attention with high spatial specificity. Our model matched experimental data showing that the BF and top-down attention decrease interneuronal correlations and increase between-trial reliability. We found that decreases in correlations were primarily between excitatory-inhibitory pairs rather than excitatory-excitatory pairs and suggest that excitatory-inhibitory decorrelation is necessary for maintaining low levels of excitatory-excitatory correlations. Increased inhibitory drive via release of acetylcholine in V1 may then act as a buffer, absorbing increases in excitatory-excitatory correlations that occur with attention and BF stimulation. These findings will lead to a better understanding of the mechanisms underlying the BF's interactions with attention signals and influences on correlations.
    European Journal of Neuroscience 12/2013; · 3.75 Impact Factor
    ABSTRACT: Understanding how the human brain is able to efficiently perceive and understand a visual scene is still a field of ongoing research. Although many studies have focused on the design and optimization of neural networks to solve visual recognition tasks, most of them either lack neurobiologically plausible learning rules or decision-making processes. Here we present a large-scale model of a hierarchical spiking neural network (SNN) that integrates a low-level memory encoding mechanism with a higher-level decision process to perform a visual classification task in real-time. The model consists of Izhikevich neurons and conductance-based synapses for realistic approximation of neuronal dynamics, a spike-timing-dependent plasticity (STDP) synaptic learning rule with additional synaptic dynamics for memory encoding, and an accumulator model for memory retrieval and categorization. The full network, which comprised 71,026 neurons and approximately 133 million synapses, ran in real-time on a single off-the-shelf graphics processing unit (GPU). The network was constructed on a publicly available SNN simulator that supports general-purpose neuromorphic computer chips. The network achieved 92% correct classifications on MNIST in 100 rounds of random sub-sampling, which is comparable to other SNN approaches and provides a conservative and reliable performance metric. Additionally, the model correctly predicted reaction times from psychophysical experiments. Because of the scalability of the approach and its neurobiological fidelity, the current model can be extended to an efficient neuromorphic implementation that supports more generalized object recognition and decision-making architectures found in the brain.
    Neural networks: the official journal of the International Neural Network Society 08/2013; 48C:109-124. · 1.88 Impact Factor
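    The decision stage is an accumulator (race-to-threshold) model; a minimal sketch with invented evidence rates:

```python
import random

def classify(spike_rates, threshold=50, max_steps=1000):
    """Race-to-threshold accumulator: each category integrates evidence (spikes)
    until one accumulator crosses the decision threshold. Returns (category, time)."""
    acc = {label: 0.0 for label in spike_rates}
    for t in range(1, max_steps + 1):
        for label, rate in spike_rates.items():
            acc[label] += random.random() < rate      # one Bernoulli 'spike' per step
        winner = max(acc, key=acc.get)
        if acc[winner] >= threshold:
            return winner, t                          # decision and reaction time
    return None, max_steps                            # no decision within the window

# Invented per-category evidence rates for an ambiguous '7 vs 1' digit.
print(classify({"seven": 0.12, "one": 0.08}))
```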
    ABSTRACT: Reliability concerns due to technology scaling have been a major focus of researchers and designers for several technology nodes. Therefore, many new techniques for enhancing and optimizing reliability have emerged, particularly within the last five to ten years. This perspective paper introduces the most prominent reliability concerns from today's point of view and briefly recapitulates the progress made by the community so far. The focus of the paper is on trends, from both industrial and academic perspectives, that suggest ways of coping with reliability challenges in upcoming technology nodes.
    Design Automation Conference (DAC), 2013 50th ACM / EDAC / IEEE, Austin, TX, USA; 06/2013
    ABSTRACT: The increasing demand for low-power and high-performance multimedia embedded systems has motivated the need for effective solutions to satisfy application bandwidth and latency requirements under a tight power budget. As technology scales, it is imperative that applications are optimized to take full advantage of the underlying resources and meet both power and performance requirements. We propose MultiMaKe, an application mapping design flow capable of discovering and enabling parallelism opportunities via code transformations, efficiently distributing the computational load across resources, and minimizing unnecessary data transfers. Our approach decomposes the application's tasks into smaller units of computations called kernels, which are distributed and pipelined across the different processing resources. We exploit the ideas of inter-kernel data reuse to minimize unnecessary data transfers between kernels, early execution edges to drive performance, and kernel pipelining to increase system throughput. Our experimental results on JPEG and JPEG2000 show up to 97% off-chip memory access reduction, and up to 80% execution time reduction over standard mapping and task-level pipelining approaches.
    ACM Transactions on Embedded Computing Systems (TECS). 03/2013; 12(1s).
    ABSTRACT: Spiking neural network (SNN) simulations with spike-timing dependent plasticity (STDP) often experience runaway synaptic dynamics and require some sort of regulatory mechanism to stay within a stable operating regime. Previous homeostatic models have used L1 or L2 normalization to scale the synaptic weights but the biophysical mechanisms underlying these processes remain undiscovered. We propose a model for homeostatic synaptic scaling that modifies synaptic weights in a multiplicative manner based on the average postsynaptic firing rate as observed in experiments. The homeostatic mechanism was implemented with STDP in conductance-based SNNs with Izhikevich-type neurons. In the first set of simulations, homeostatic synaptic scaling stabilized weight changes in STDP and prevented runaway dynamics in simple SNNs. During the second set of simulations, homeostatic synaptic scaling was found to be necessary for the unsupervised learning of V1 simple cell receptive fields in response to patterned inputs. STDP, in combination with homeostatic synaptic scaling, was shown to be mathematically equivalent to non-negative matrix factorization (NNMF) and the stability of the homeostatic update rule was proven. The homeostatic model presented here is novel, biologically plausible, and capable of unsupervised learning of patterned inputs, which has been a significant challenge for SNNs with STDP.
    Neural Networks (IJCNN), The 2013 International Joint Conference on; 01/2013
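    One common multiplicative form of such a scaling rule is sketched below (the target rate and scaling constant are assumptions for illustration; the paper's exact rule and its stability analysis are in the text):

```python
def homeostatic_scale(weights, avg_rate, target_rate=10.0, beta=0.01):
    """Multiplicative synaptic scaling: all incoming weights of a neuron are scaled
    by the same factor, nudging the average postsynaptic rate toward the target."""
    factor = 1.0 + beta * (target_rate - avg_rate) / target_rate
    return [w * factor for w in weights]

w = [0.2, 0.5, 0.1]
print(homeostatic_scale(w, avg_rate=25.0))   # rate too high -> all weights scaled down
print(homeostatic_scale(w, avg_rate=4.0))    # rate too low  -> all weights scaled up
```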
    ABSTRACT: Large-scale spiking neural networks (SNNs) have been used to successfully model complex neural circuits that explore various neural phenomena such as learning and memory, vision systems, auditory systems, neural oscillations, and many other important topics of neural function. Additionally, SNNs are particularly well-adapted to run on neuromorphic hardware as spiking events are often sparse, leading to a potentially large reduction in both bandwidth requirements and power usage. The inclusion of realistic plasticity equations, neural dynamics, and recurrent topologies has increased the descriptive power of SNNs but has also made the task of tuning these biologically realistic SNNs difficult. We present an automated parameter-tuning framework capable of tuning large-scale SNNs quickly and efficiently using evolutionary algorithms (EA) and off-the-shelf graphics processing units (GPUs).
    Hardware/Software Codesign and System Synthesis (CODES+ISSS), 2013 International Conference on; 01/2013
    ABSTRACT: The dorsolateral prefrontal cortex (dlPFC), which is regarded as the primary site for visuospatial working memory in the brain, is significantly modulated by dopamine (DA) and norepinephrine (NE). DA and NE originate in the ventral tegmental area (VTA) and locus coeruleus (LC), respectively, and have been shown to have an "inverted-U" dose-response profile in dlPFC, where the level of arousal and decision-making performance is a function of DA and NE concentrations. Moreover, there appears to be a sweet spot, in terms of the level of DA and NE activation, which allows for optimal working memory and behavioral performance. When either DA or NE is too high, input to the PFC is essentially blocked. When either DA or NE is too low, PFC network dynamics become noisy and activity levels diminish. Mechanisms for how this is occurring have been suggested, however, they have not been tested in a large-scale model with neurobiologically plausible network dynamics. Also, DA and NE levels have not been simultaneously manipulated experimentally, which is not realistic in vivo due to strong bi-directional connections between the VTA and LC. To address these issues, we built a spiking neural network model that includes D1, α2A, and α1 receptors. The model was able to match the inverted-U profiles that have been shown experimentally for differing levels of DA and NE. Furthermore, we were able to make predictions about what working memory and behavioral deficits may occur during simultaneous manipulation of DA and NE outside of their optimal levels. Specifically, when DA levels were low and NE levels were high, cues could not be held in working memory due to increased noise. On the other hand, when DA levels were high and NE levels were low, incorrect decisions were made due to weak overall network activity. We also show that lateral inhibition in working memory may play a more important role in increasing signal-to-noise ratio than increasing recurrent excitatory input.
    Frontiers in Computational Neuroscience 01/2013; 7:133. · 2.48 Impact Factor
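    The inverted-U dependence can be summarized by a simple gain curve (a Gaussian is assumed here purely for illustration; the model produces this profile through D1, α2A, and α1 receptor dynamics rather than a closed-form function):

```python
import math

def inverted_u_gain(level, optimum=0.5, width=0.2):
    """Toy inverted-U: network gain peaks at an intermediate neuromodulator level."""
    return math.exp(-((level - optimum) ** 2) / (2 * width ** 2))

for da in (0.1, 0.5, 0.9):                       # low / optimal / high dopamine level
    print(da, round(inverted_u_gain(da), 3))     # gain is highest at the optimum
```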
    ABSTRACT: Microelectronic circuits exhibit increasing variations in performance, power consumption, and reliability parameters across the manufactured parts and across use of these parts over time in the field. These variations have led to increasing use of overdesign and guardbands in design and test to ensure yield and reliability with respect to a rigid set of datasheet specifications. This paper explores the possibility of constructing computing machines that purposely expose hardware variations to various layers of the system stack including software. This leads to the vision of underdesigned hardware that utilizes a software stack that opportunistically adapts to a sensed or modeled hardware. The envisioned underdesigned and opportunistic computing (UnO) machines face a number of challenges related to the sensing infrastructure and software interfaces that can effectively utilize the sensory data. In this paper, we outline specific sensing mechanisms that we have developed and their potential use in building UnO machines.
    IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 01/2013; 32(1):8-23. · 1.09 Impact Factor
    ABSTRACT: As we enter the deep submicron era, more and more transistors are integrated on a chip, causing chips to heat up in a non-uniform manner because different parts of the chip perform different processing tasks. This thermal gradient causes a number of problems, such as reduced reliability of chips and interconnects due to electromigration, and system performance degradation because of increased delay and lowered clock frequencies. Given these thermal issues, interconnect routing, and global routing in particular, should be performed in a way that accounts for the temperature distribution of the substrate and the actual delay of the interconnects. In this paper, we propose a global routing method based on image processing and computer vision techniques that reduces the probability of chip failure due to interconnect failure and also prevents performance degradation from increased delay. We observed that our method reduced the number of grids in hot regions by up to 50% compared with a conventional router, while keeping interconnect delay as small as possible.
    Quality Electronic Design (ISQED), 2013 14th International Symposium on; 01/2013
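    The core idea, penalizing routes through hot regions, can be sketched as a temperature-weighted shortest-path search (grid size, temperature map, and weighting factor are invented):

```python
import heapq

def route(temp, src, dst, alpha=2.0):
    """Shortest path on a routing grid where each step costs 1 + alpha * normalized
    temperature of the entered cell, steering wires away from hot regions."""
    rows, cols = len(temp), len(temp[0])
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if (r, c) == dst:
            return d
        if d > dist.get((r, c), float("inf")):
            continue
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + 1.0 + alpha * temp[nr][nc]
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    heapq.heappush(heap, (nd, (nr, nc)))
    return float("inf")

# Hot cells in the middle row: the cheapest route detours around them.
temp_map = [[0.0, 0.0, 0.0],
            [0.9, 0.9, 0.0],
            [0.0, 0.0, 0.0]]
print(route(temp_map, (0, 0), (2, 2)))
```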
    ABSTRACT: State-of-the-art general-purpose graphics processing units (GPGPUs) implemented in nanoscale CMOS technologies offer very high computational throughput for highly parallel applications using hundreds of integrated on-chip resources. These resources are stressed during application execution, subjecting them to degradation mechanisms such as negative bias temperature instability (NBTI) that adversely affect their reliability. To support highly parallel execution, GPGPUs contain large register files (RFs) that are among the most highly stressed GPGPU components; however, we observe heavy underutilization of RFs (on average only 46%) for typical general-purpose kernels. We present ARGO, an Aging-awaRe GPGPU RF allOcator that opportunistically exploits this RF underutilization by distributing the stress throughout the RF. ARGO achieves proper wear leveling of RF banks through deliberate power-gating of the most stressed banks. We demonstrate our technique on the AMD Evergreen GPGPU architecture and show that ARGO reduces NBTI-induced threshold voltage degradation by up to 43% (on average 27%), which improves the RFs' static noise margin by up to 46% (on average 30%). Furthermore, we estimate a simultaneous reduction in leakage power of 54% by providing sleep states for unused banks.
    Hardware/Software Codesign and System Synthesis (CODES+ISSS), 2013 International Conference on; 01/2013
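    The wear-leveling idea can be sketched with per-bank stress counters (bank count, stress metric, and policy are illustrative, not the ARGO implementation):

```python
# Toy aging-aware register-file bank allocator: allocate the least-stressed banks
# first and power-gate the rest, so NBTI stress is leveled across banks over time.
NUM_BANKS = 8
stress = [0] * NUM_BANKS          # accumulated "active and under stress" cycles per bank

def allocate(banks_needed, duration):
    order = sorted(range(NUM_BANKS), key=lambda b: stress[b])   # least stressed first
    active = order[:banks_needed]
    for b in active:
        stress[b] += duration     # active banks accumulate stress
    gated = order[banks_needed:]  # unused banks are power-gated (no stress, less leakage)
    return active, gated

for _ in range(4):                # a kernel that only needs ~half of the register file
    allocate(banks_needed=4, duration=1000)
print(stress)                     # stress is spread evenly instead of piling onto banks 0-3
```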
    ABSTRACT: Technology scaling and process variation severely degrade the reliability of Chip Multiprocessors (CMPs), especially their large cache blocks. To improve cache reliability, we propose REMEDIATE, a scalable fault-tolerant architecture for low-power design of shared Non-Uniform Cache Access (NUCA) cache in Tiled CMPs. REMEDIATE achieves fault-tolerance through redundancy from multiple banks to maximize the amount of fault remapping, and minimize the amount of capacity lost in the cache when the failure rate is high. REMEDIATE leverages a scalable fault protection technique using two different remapping heuristics in a distributed shared cache architecture with non-uniform latencies. We deploy a graph coloring algorithm to optimize REMEDIATE's remapping configuration. We perform an extensive design space exploration of operating voltage, performance, and power that enables designers to select different operating points and evaluate their design efficacy. Experimental results on a 4×4 tiled CMP system voltage scaled to below 400mV show that REMEDIATE saves up to 50% power while recovering more than 80% of the faulty cache area with only modest performance degradation.
    Green Computing Conference (IGCC), 2013 International; 01/2013
    ABSTRACT: This paper introduces a global research collaboration project carried out by a joint Korea-USA research group. The project aims at designing and implementing a run-time platform for reliable, safe, and secure cyber-physical systems (CPS). The project consists of layered sub-projects, including SoC design for reliable systems, a virtualized software architecture for dynamically upgradable systems, and a middleware architecture for safety-critical networked applications. This paper describes the objectives of each sub-project and the current accomplishments.
    Service-Oriented Computing and Applications (SOCA), 2013 IEEE 6th International Conference on; 01/2013
  • H. Tajik, H. Homayoun, N. Dutt
    ABSTRACT: Three dimensional (3D) integration attempts to address challenges and limitations of new technologies such as interconnect delay and power consumption. However, high power density and increased temperature in 3D architectures accelerate wearout failure mechanisms such as Negative Bias Temperature Instability (NBTI). In this paper we present VAWOM (Variation Aware WearOut Management), an approach that reduces the NBTI effect by exploiting temperature and process variation in 3D architectures. We demonstrate the efficacy of VAWOM on a two-layer 3D architecture with 4x4 cores on the first layer and 4x4 last level caches on the second layer, and show that VAWOM reduces NBTI induced threshold voltage degradation by 30% with only a small degradation in performance.
    Design Automation Conference (DAC), 2013 50th ACM / EDAC / IEEE; 01/2013

Publication Stats

7k Citations
85.18 Total Impact Points

Institutions

  • 1989–2013
    • University of California, Irvine
      • • Department of Computer Science
      • • Department of Cognitive Sciences
      • • Center for Embedded Computer Systems (CECS)
      Irvine, California, United States
  • 2011
    • Samsung Advanced Institute of Technology
      Usan-ri, Gyeonggi Province, South Korea
  • 2010
    • Colorado State University
      • Department of Electrical & Computer Engineering
      Fort Collins, Colorado, United States
  • 2002–2010
    • CSU Mentor
      Long Beach, California, United States
  • 2009
    • Universidade Federal do Rio Grande do Sul
      Porto Alegre, Rio Grande do Sul, Brazil
  • 2006–2009
    • Arizona State University
      • School of Computing, Informatics, and Decision Systems Engineering
      Phoenix, Arizona, United States
  • 2005–2009
    • University of Florida
      • • Department of Electrical and Computer Engineering
      • • Department of Computer and Information Science and Engineering
      Gainesville, Florida, United States
  • 2003–2009
    • Seoul National University
      • School of Computer Science and Engineering
      Seoul, Seoul, South Korea
    • University of California, San Diego
      San Diego, California, United States
    • Nagoya University
      Nagoya, Aichi, Japan
  • 2007
    • The MathWorks, Inc
      Natick, Massachusetts, United States
    • Fujitsu Ltd.
      Kawasaki, Kanagawa, Japan
  • 2004–2005
    • University of California, Riverside
      • Department of Computer Science and Engineering
      Riverside, California, United States
  • 2001
    • IIT Kharagpur
      • Department of Computer Science & Engineering
      Kharagpur, West Bengal, India
  • 1998–1999
    • Synopsys
      Mountain View, California, United States
  • 1997
    • Seoul Women's University
      • Department of Computer Science and Engineering
      Seoul, Seoul, South Korea
  • 1992–1994
    • University of Padova
      Padua, Veneto, Italy