Euiseong Seo’s research while affiliated with Sungkyunkwan University and other places


Publications (66)


Cloud Reamer: Enabling Inference Services in Training Clusters
  • Conference Paper
  • October 2024 · 17 Reads

Osama Khan · Gwanjong Park · Junyeol Yu · Euiseong Seo

DaCapo: An On-Device Learning Scheme for Memory-Constrained Embedded Systems

September 2023 · 28 Reads · 3 Citations

ACM Transactions on Embedded Computing Systems
The use of deep neural network (DNN) applications in microcontroller unit (MCU) embedded systems is becoming popular. However, the DNN models in such systems frequently suffer from accuracy loss due to the dataset shift problem. On-device learning resolves this problem by updating the model parameters on-site with real-world data, thus localizing the model to its surroundings. However, the backpropagation step during on-device learning requires the output of every layer computed during the forward pass to be stored in memory. This is usually infeasible in MCU devices, as they are equipped with only a few KB of SRAM. Given their energy limitations and timeliness requirements, using flash memory to store the output of every layer is not practical either. Although a few approaches have been proposed to enable on-device learning under stringent memory conditions, they require modifying the target models or using non-conventional gradient computation strategies. This paper proposes DaCapo, a backpropagation scheme that enables on-device learning in memory-constrained embedded systems. DaCapo stores only the outputs of certain layers, known as checkpoints, in SRAM and discards the others. The discarded outputs are recomputed during backpropagation from the nearest checkpoint preceding them. To minimize recomputation, DaCapo optimally plans which checkpoints to keep in SRAM at each phase of the backpropagation and accordingly replaces the checkpoints stored in memory as the backpropagation progresses. We implemented the proposed scheme on an STM32F429ZI board and evaluated it with five representative DNN models. Our evaluation showed that DaCapo shortened backpropagation time by up to 22% and reduced energy consumption by up to 28% in comparison to AIfES, a machine learning platform optimized for MCU devices. In addition, our approach enabled the training of MobileNet, which the MCU device had previously been unable to train.
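The checkpoint-and-recompute strategy the abstract describes can be sketched in a few lines. This is an illustrative reconstruction of the general technique only; all names are ours, and it is not DaCapo's actual implementation or its checkpoint-planning algorithm:

```python
# Hypothetical sketch of checkpoint-and-recompute backpropagation, the
# general technique the abstract describes. All names are illustrative.

def forward(layers, x, checkpoints):
    """Run the forward pass, keeping only checkpointed activations."""
    stored = {0: x}                     # the input is always kept
    for i, f in enumerate(layers):
        x = f(x)
        if (i + 1) in checkpoints:      # store this layer's output
            stored[i + 1] = x
    return x, stored

def activation_at(layers, stored, k):
    """Recompute activation k from the nearest checkpoint before it,
    as backpropagation walks backwards through the layers."""
    base = max(i for i in stored if i <= k)
    x = stored[base]
    for i in range(base, k):
        x = layers[i](x)
    return x

# Toy model: three "layers" acting on a scalar activation.
layers = [lambda x: x + 1, lambda x: 2 * x, lambda x: x - 3]
out, stored = forward(layers, 1, checkpoints={2})
# Only activations 0 and 2 stay resident; activation 1 is recomputed
# on demand from checkpoint 0 when the backward pass needs it.
```

The trade-off the paper optimizes is visible even here: fewer entries in `checkpoints` means less SRAM used but longer recomputation chains in `activation_at`.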

Citations (45)


... However, implementing high-accuracy but resource-intensive ML models like convolutional neural networks (CNNs) poses challenges due to limited computational and memory resources [23]. Recent research addresses these challenges through two main aspects: optimizing ML models for resource efficiency and enhancing energy management [8,9,11,15,18]. ...

Reference:

E-QUARTIC: Energy Efficient Edge Ensemble of Convolutional Neural Networks for Resource-Optimized Learning
Energy-Harvesting-Aware Adaptive Inference of Deep Neural Networks in Embedded Systems
  • Citing Conference Paper
  • August 2023

... We observed such drift and will address this challenge in future work. Power/Energy Characterization and Optimization: Researchers used the tools and methods above to characterize the energy efficiency of critical workloads and primitives in AI and HPC running on different scales [38]- [44], and to study the efficiency of the latest innovations in GPUs and other accelerators [45], [46]. Prior work also investigated the impact of frequency capping, power capping, DVFS, and input data composition on energy efficiency [24], [47]- [51]. ...

Know Your Enemy To Save Cloud Energy: Energy-Performance Characterization of Machine Learning Serving
  • Citing Conference Paper
  • February 2023

... Leveraging heterogeneous GPUs is challenging due to their varying hardware properties (e.g., memory size, computation capability). To split data, prior work has looked at scaling the batch size in data parallelism for each GPU [17,24]. To split the model, prior work has explored unevenly distributing model layers in pipeline parallelism across GPUs [15,42,45]. ...
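The uneven layer distribution mentioned in the excerpt can be illustrated with a toy greedy partitioner that gives each GPU a contiguous group of layers roughly proportional to its speed. This is our own sketch under simplified assumptions (scalar per-layer costs, contiguous groups), not an algorithm from the cited systems:

```python
# Toy sketch of uneven pipeline partitioning across heterogeneous GPUs.
# Names and the greedy rule are illustrative assumptions.

def partition_layers(layer_costs, gpu_speeds):
    total = sum(layer_costs)
    # Per-GPU work targets, proportional to relative speed.
    targets = [total * s / sum(gpu_speeds) for s in gpu_speeds]
    parts, current, acc, g = [], [], 0.0, 0
    for i, cost in enumerate(layer_costs):
        current.append(i)
        acc += cost
        # Close this GPU's group once its target work is reached.
        if acc >= targets[g] and g < len(gpu_speeds) - 1:
            parts.append(current)
            current, acc, g = [], 0.0, g + 1
    parts.append(current)   # remaining layers go to the last GPU
    return parts

# A GPU that is 3x faster receives 3x the layer cost.
print(partition_layers([1, 1, 1, 1], [1, 3]))
```

A real system would measure per-layer cost and also balance activation memory, but the proportional-target idea is the same.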

Scale-Train: A Scalable DNN Training Framework for a Heterogeneous GPU Cloud
  • Citing Article
  • Full-text available
  • January 2022

IEEE Access

... Traditionally, in High-Performance Computing (HPC), CNN inference is performed solely on GPUs. Therefore, multiple works only focus on power efficiency for inference on GPUs, such as [18] and [22]. PELSI, on the other hand, focuses on HMPSoCs where embedded CPUs and GPUs are comparable in performance, and both are used for inference to maximize efficiency. ...

A DNN Inference Latency-aware GPU Power Management Scheme
  • Citing Conference Paper
  • October 2021

... 7. Disable Non-Uniform Memory Access (NUMA) Memory Balancing: Automatic migration of memory pages between NUMA nodes would cause memory page faults and latency spikes, similar to those of disk-based swapping, for application tasks that used the affected memory. We therefore disabled this memory balancing feature [57,58]. 8. Disable Mitigations for CPU Vulnerabilities: Mitigations against CPU vulnerabilities such as Spectre, Retbleed, and Downfall, to name a few, should be turned off, as those mitigations can have an unknown negative effect on certain types of application tasks. ...

A Performance-Stable NUMA Management Scheme for Linux-based HPC Systems

IEEE Access

Jaehyun Song · Minwoo Ahn · Gyusun Lee · [...]

... Environmental adaptation and customization are critical factors in developing robust fitness tracking systems that can generalize across diverse workout scenarios. The proposed system employs techniques such as transfer learning [12], domain adaptation [13], and user profiling [14] to enable models to adapt to varying lighting conditions, backgrounds, and user preferences. These techniques have been successfully applied in computer vision tasks [15], allowing models to leverage knowledge from related domains and user-specific data to improve performance in new scenarios. ...

A Dynamic Scaling Scheme of Cloud-based DNN Training Clusters
  • Citing Conference Paper
  • November 2020

... Idempotent GPU kernels promise to produce the same output upon re-execution, without causing any side effects, no matter how many times they are interrupted at any point. This idempotence property has been shown to be effective in overcoming the performance bottlenecks of many aspects of GPU systems, including fault tolerance [37,76], task preemption [22,36,54], and memory persistence [44,75]. For instance, prior work shows that idempotence-based optimizations can reduce the checkpointing overheads of fault-tolerant systems to less than 1% [37]. ...
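The idempotence property in the excerpt is easy to demonstrate with a minimal contrast, sketched here in plain Python rather than GPU code (all names are ours, not from the cited paper): a kernel whose writes depend only on inputs it never overwrites can be interrupted and simply re-executed from the start, while an in-place kernel cannot.

```python
# Illustrative contrast: idempotent vs. non-idempotent "kernels".

def scale_idempotent(src, dst, k):
    """Reads src, writes dst. No input is ever overwritten, so a full
    re-execution after an interrupt always produces the same dst."""
    for i in range(len(src)):
        dst[i] = src[i] * k

def scale_in_place(buf, k):
    """Non-idempotent: after a partial run, re-execution scales the
    already-scaled elements a second time."""
    for i in range(len(buf)):
        buf[i] *= k
```

This is why idempotent kernels can be preempted by discarding partial results and restarting, instead of saving and restoring intermediate state.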

Idempotence-Based Preemptive GPU Kernel Scheduling for Embedded Systems
  • Citing Article
  • April 2020

IEEE Transactions on Computers

... To fulfill user expectations of seamless and rapid application relaunch, mobile systems preserve all execution-related data (called anonymous data in Linux [4]), such as the stack and heap, in main memory. This practice, known as keeping applications alive in the background [1, 5-8], enables faster relaunches. However, it also results in significant main memory capacity requirements for each application. ...

ezswap: Enhanced Compressed Swap Scheme for Mobile Devices

IEEE Access

... ETC incurs an inter-phase cost both before and after adjusting the TLP, as it requires a period to detect changes in locality loss and reach the optimal TLP during runtime. Consequently, the best thread throttling strategy chosen statically for each phase outperforms dynamic thread throttling, as the static scheme does not incur inter-phase cost [16,17]. Several other studies have also proposed page prefetch and eviction policies to reduce the overhead caused by UVM oversubscription [1-4, 8-10, 18-20]. ...

Compiler-Assisted GPU Thread Throttling for Reduced Cache Contention
  • Citing Conference Paper
  • August 2019