Changha Lee’s research while affiliated with Korea Advanced Institute of Science and Technology and other places


Publications (18)


Rethinking Peculiar Images by Diffusion Models: Revealing Local Minima’s Role
  • Article

March 2024 · 11 Reads · Proceedings of the AAAI Conference on Artificial Intelligence

Jinhyeok Jang · Chan-Hyun Youn · Minsu Jeon · Changha Lee

Recent significant advancements in diffusion models have revolutionized image generation, enabling the synthesis of highly realistic images with text-based guidance. These breakthroughs have paved the way for constructing datasets via generative artificial intelligence (AI), offering immense potential for various applications. However, two critical challenges hinder the widespread adoption of synthesized data: computational cost and the generation of peculiar images. While computational costs have improved through various approaches, the issue of peculiar image generation remains relatively unexplored. Existing solutions rely on heuristics, extra training, or AI-based post-processing to mitigate this problem. In this paper, we present a novel approach to address both issues simultaneously. We establish that both gradient descent and diffusion sampling are specific cases of the generalized expectation-maximization algorithm. We hypothesize and empirically demonstrate that peculiar image generation is akin to the local minima problem in optimization. Inspired by optimization techniques, we apply naive momentum and positive-negative momentum to diffusion sampling. Lastly, we propose new metrics to evaluate peculiarity. Experimental results show that momentum effectively prevents peculiar image generation without extra computation.
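The analogy drawn above between diffusion sampling and gradient descent suggests a simple way to picture the momentum trick. The sketch below is not the authors' implementation; it only shows how a naive momentum buffer could be folded into a generic reverse-diffusion loop, with `denoise_step` and all constants as placeholder assumptions.

```python
import numpy as np

def sample_with_momentum(x_T, denoise_step, num_steps, beta=0.9):
    """Generic diffusion-style sampling loop with a naive momentum term.

    A minimal sketch of the idea in the abstract: treat each denoising
    update like a gradient step and accumulate it in a momentum buffer.
    `denoise_step(x, t)` is a placeholder for one reverse-diffusion update
    returning the proposed next state.
    """
    x = x_T
    velocity = np.zeros_like(x)
    for t in reversed(range(num_steps)):
        proposed = denoise_step(x, t)        # ordinary reverse step
        update = proposed - x                # analogue of a gradient step
        velocity = beta * velocity + update  # naive momentum accumulation
        x = x + velocity
    return x

# Toy usage with a dummy denoiser that nudges samples toward zero.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x_T = rng.normal(size=(4, 4))
    dummy_denoiser = lambda x, t: x * 0.95
    print(sample_with_momentum(x_T, dummy_denoiser, num_steps=50).round(3))
```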


EPU: An Energy-Efficient Explainable AI Accelerator With Sparsity-Free Computation and Heat Map Compression/Pruning

March 2024 · 9 Reads · IEEE Journal of Solid-State Circuits

Junsoo Kim · Seunghee Han · Geonwoo Ko · [...] · Joo-Young Kim

Deep neural networks (DNNs) have recently gained significant prominence in various real-world applications such as image recognition, natural language processing, and autonomous vehicles. However, due to their black-box nature, the underlying mechanisms behind DNN inference results remain opaque to users. To address this challenge, researchers have focused on developing explainable artificial intelligence (AI) algorithms. Explainable AI aims to provide a clear and human-understandable explanation of the model’s decision, thereby building more reliable systems. However, the explanation task differs from the well-known inference and training processes because it involves interactions with the user. Consequently, existing inference and training accelerators face inefficiencies when processing explainable AI on edge devices. This article introduces the explainable processing unit (EPU), the first hardware accelerator designed for explainable AI workloads. The EPU utilizes a novel data compression format for the output heat maps and intermediate gradients to enhance overall system performance by reducing both the memory footprint and external memory access. Its sparsity-free computing core efficiently handles input sparsity with negligible control overhead, resulting in a throughput boost of up to 9.48×. The EPU also employs dynamic workload scheduling with a customized on-chip network for the distinct inference and explanation tasks, maximizing internal data reuse and thereby reducing external memory access by 63.7%. Furthermore, the EPU incorporates point-wise gradient pruning (PGP), which can reduce the size of heat maps by a factor of 7.01× when combined with the proposed compression format. Finally, the EPU chip, fabricated in a 28 nm CMOS process, achieves a heat map generation rate of 367 frames/s for ResNet-34 while maintaining state-of-the-art area and energy efficiency of 112.3 GOPS/mm² and 26.55 TOPS/W, respectively.
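As a software-level illustration of the point-wise gradient pruning and heat-map compression described above, the following sketch thresholds a heat map to its largest-magnitude entries and stores the survivors in an index/value format. It is only an analogy for the idea; the EPU's actual on-chip compression format is not described in this listing, and all sizes and ratios below are assumptions.

```python
import numpy as np

def pointwise_prune(heat_map, keep_ratio=0.15):
    """Keep only the largest-magnitude entries of a heat map (a generic
    software analogue of point-wise gradient pruning; the EPU's real
    hardware format is not public in this listing)."""
    flat = np.abs(heat_map).ravel()
    k = max(1, int(keep_ratio * flat.size))
    threshold = np.partition(flat, -k)[-k]       # k-th largest magnitude
    return np.where(np.abs(heat_map) >= threshold, heat_map, 0.0)

def to_sparse(pruned):
    """Compress a mostly-zero heat map into (index, value) pairs."""
    idx = np.flatnonzero(pruned)
    return idx.astype(np.int32), pruned.ravel()[idx].astype(np.float16)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    hm = rng.normal(size=(56, 56))               # hypothetical heat map
    idx, val = to_sparse(pointwise_prune(hm))
    dense_bytes = hm.astype(np.float16).nbytes
    sparse_bytes = idx.nbytes + val.nbytes
    print(f"compression ratio = {dense_bytes / sparse_bytes:.1f}x")
```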


SLO-Aware DL Job Scheduling for Efficient FPGA-GPU Edge Cloud Computing

January 2024 · 3 Reads · Communications in Computer and Information Science

Deep learning applications have become increasingly popular in recent years, leading to the development of specialized hardware accelerators such as FPGAs and GPUs. These accelerators provide significant performance gains over traditional CPUs, but their efficient utilization requires careful scheduling configuration for the given DL requests. In this paper, we propose an SLO-aware DL job scheduling model for efficient FPGA-GPU edge cloud computing. The proposed model takes into account the varying service-level objectives of DL jobs and periodically updates the accelerator configuration for DL processing while minimizing computation cost. We first analyze the impact of various DL-related parameters on the performance of FPGA-GPU computing. We then propose a novel scheduling algorithm that considers time-variant latency SLO constraints and periodically updates the scheduling configuration. We evaluated our scheduler using several DL workloads on an FPGA-GPU cluster. The results demonstrate that our scheduler improves both energy consumption and SLO compliance compared to a traditional DL scheduling approach.
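The core scheduling decision described above can be illustrated with a minimal sketch: for each request, pick the cheapest accelerator configuration whose estimated latency still meets the request's SLO. The configuration table and its numbers are purely illustrative assumptions, not measurements from the paper.

```python
from dataclasses import dataclass

@dataclass
class AcceleratorConfig:
    name: str
    est_latency_ms: float   # profiled latency for the DL model (assumed values)
    energy_cost: float      # relative energy cost per request (assumed values)

# Illustrative profile of an FPGA-GPU edge node; a real scheduler would
# refresh these numbers periodically from monitoring data.
CONFIGS = [
    AcceleratorConfig("fpga_int8",  est_latency_ms=42.0, energy_cost=1.0),
    AcceleratorConfig("gpu_fp16",   est_latency_ms=18.0, energy_cost=3.2),
    AcceleratorConfig("gpu_batch4", est_latency_ms=30.0, energy_cost=2.1),
]

def pick_config(slo_ms: float) -> AcceleratorConfig:
    """Return the cheapest configuration that satisfies the latency SLO,
    falling back to the fastest configuration when none is feasible."""
    feasible = [c for c in CONFIGS if c.est_latency_ms <= slo_ms]
    if not feasible:
        return min(CONFIGS, key=lambda c: c.est_latency_ms)  # best effort
    return min(feasible, key=lambda c: c.energy_cost)

if __name__ == "__main__":
    for slo in (15.0, 35.0, 60.0):
        print(f"SLO {slo} ms -> {pick_config(slo).name}")
```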


DELCAS: Deep Reinforcement Learning Based GPU CaaS Packet Scheduling for Stabilizing QoE in 5G Multi-Access Edge Computing

January 2024 · 8 Reads · Communications in Computer and Information Science

Recently, Docker Container as a Service (CaaS) has been offered for multi-user services in 5G Multi-Access Edge Computing (MEC) environments, and servers equipped with accelerators such as GPUs, rather than conventional CPU-only servers, are being considered. In addition, as the number of AI services grows and the computation required by deep neural network models increases, offloading to edge servers becomes necessary due to the limited computational capacity and thermal constraints of user equipment (UE). However, a resource scheduling problem arises because not all users’ packets can be offloaded to the edge server under its resource limits. To address this problem, we propose DELCAS, a deep reinforcement learning-based GPU CaaS packet scheduling scheme for stabilizing the quality of AI experience. First, we design an architecture in which the target AI application runs in containers on MEC GPUs and multiple users send video streams to the MEC server. We analyze the video streams with optical flow to estimate each user’s dynamically changing resource requirements and adjust the user task queues accordingly. To provide an equal latency quality of experience, we apply a lower-quality-first-serve approach and return hand pose estimation results to each user. Finally, we evaluate our approach against a conventional scheduling method in terms of both accuracy and latency quality.
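The lower-quality-first-serve idea can be pictured with a short sketch: users whose recent quality of experience is worst are scheduled first. This stands in for, and greatly simplifies, the deep reinforcement learning policy and optical-flow-based estimation in the paper; the QoE scores below are hypothetical.

```python
def schedule_lqfs(user_qoe):
    """Lower-quality-first-serve ordering: users with the worst recent
    quality of experience (QoE) are served first. The scores stand in for
    the optical-flow-driven resource estimates described in the abstract."""
    return sorted(user_qoe, key=user_qoe.get)

if __name__ == "__main__":
    recent_qoe = {"ue1": 0.92, "ue2": 0.41, "ue3": 0.77}  # hypothetical scores
    print(schedule_lqfs(recent_qoe))                      # -> ['ue2', 'ue3', 'ue1']
```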




Figures and tables (from the article below): comparison of visual explanations on each residual block of ResNet-18 for satellite imagery (Fig. 2); overall procedure of mediating explanations of multiple episodes based on DropAtt with multi-disciplinary debate (Fig. 5); qualitative comparison of visual explanations with regard to how attention is used (Fig. 6); explainability comparison of layer-wise explanations among different attention policies; performance comparison of visual explainability across applied methods (average drop: lower is better, rate of increase in confidence: higher is better).

Recursive Visual Explanations Mediation Scheme Based on DropAttention Model With Multiple Episodes Pool
  • Article
  • Full-text available

January 2023 · 26 Reads · 3 Citations · IEEE Access

In some DL applications such as remote sensing, it is hard to obtain high task performance (e.g., accuracy) from a DL model for image analysis due to the low-resolution characteristics of the imagery. Accordingly, several studies have attempted to provide visual explanations or to apply the attention mechanism to enhance the reliability of the image analysis. However, structural complexity remains in obtaining a sophisticated visual explanation with such existing methods: 1) from which layer should the visual explanation be extracted, and 2) to which layers should the attention modules be applied. 3) Moreover, observing the visual explanations across such diverse episodes of individually applied attention modules is cost-inefficient to train, since conventional methods require training multiple models one by one. To solve these problems, we propose a new scheme that recursively mediates visual explanations at the pixel level. Specifically, we propose DropAtt, which generates a pool of multiple episodes by training only a single network once as an amortized model and shows stable task performance regardless of the layer-wise attention policy. From the episode pool generated by DropAtt, our visual explanation mediation scheme quantitatively evaluates the explainability of each visual explanation and recursively expands the parts of explanations with high explainability, adjusting how much each episodic layer-wise explanation is reflected so as to enforce the dominant explainability of each candidate. In the empirical evaluation, our methods show their feasibility for enhancing visual explainability, reducing the average drop by about 17% and increasing the rate of increase in confidence by 3%.
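A rough sketch of the mediation step described above: given several episode-wise explanation maps and a precomputed explainability score for each (e.g., one minus the average drop), blend the maps by score and recursively expand the most salient regions of the best-scoring episode. This is an interpretation of the combining idea, not the authors' exact algorithm; all inputs are assumed to be precomputed.

```python
import numpy as np

def mediate_explanations(episodes, scores, top_frac=0.3):
    """Blend multiple episode-wise explanation maps into one.

    `episodes` is a list of HxW saliency maps, `scores` a per-episode
    explainability score (higher is better). Pixels where the best-scoring
    episode is most confident receive extra weight, imitating the recursive
    expansion of highly explainable regions described in the abstract.
    """
    episodes = [e / (e.max() + 1e-8) for e in episodes]   # normalize each map
    weights = np.asarray(scores, dtype=float)
    weights = weights / weights.sum()
    merged = sum(w * e for w, e in zip(weights, episodes))
    best = episodes[int(np.argmax(scores))]
    cutoff = np.quantile(best, 1.0 - top_frac)            # top salient region
    merged = np.where(best >= cutoff, np.maximum(merged, best), merged)
    return merged / (merged.max() + 1e-8)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    maps = [rng.random((7, 7)) for _ in range(3)]
    print(mediate_explanations(maps, scores=[0.55, 0.70, 0.62]).shape)
```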



Federated Onboard-Ground Station Computing With Weakly Supervised Cascading Pyramid Attention Network for Satellite Image Analysis

January 2022 · 61 Reads · 7 Citations · IEEE Access

With advances in NanoSat (CubeSat) platforms and high-resolution sensors, the amount of raw data to be analyzed by human supervisors in satellite image analysis has been increasing explosively. To reduce the raw data, onboard AI processing with low-power COTS (Commercial Off-The-Shelf) hardware has emerged in real satellite missions. It filters out data that is useless to supervisors (e.g., cloudy images), achieving efficient satellite-ground station communication. For complex object recognition, however, an additional explanation is required to establish the reliability of the AI prediction because of its low performance. Although various eXplainable AI (XAI) methods for providing human-interpretable explanations have been studied, the pyramid architecture in a deep network leads to the background bias problem, in which the visual explanation focuses only on the background context around the object. Missing small objects in a tiny region leads to poor explainability even when the AI model predicts the object class correctly. To resolve these problems, we propose a novel federated onboard-ground station (FOGS) computing scheme with a Cascading Pyramid Attention Network (CPANet) for reliable onboard XAI in object recognition. We present an XAI architecture with a cascading attention mechanism that mitigates the background bias in onboard processing. By exploiting the localization ability of pyramid feature blocks, we can extract high-quality visual explanations covering both the semantic and small-scale contexts of an object. To enhance the visual explainability of complex satellite images, we also describe a novel computing federation between the ground station and supervisors. In the ground station, active learning-based sample selection and an attention refinement scheme with a simple feedback method are conducted to achieve robust explanations and efficient supervisor annotation cost simultaneously. Experiments on various datasets show that the proposed system improves object recognition accuracy and produces accurate visual explanations that detect small object contexts even in peripheral regions. Our attention refinement mechanism also demonstrates that inconsistent explanations can be resolved efficiently with only very simple selection-based feedback.
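The cascading idea, coarse semantic attention gating finer pyramid levels so that small objects are not washed out, can be sketched as follows. This is not CPANet itself; the attention maps are assumed to be precomputed per pyramid block, and the gating formula is an illustrative assumption.

```python
import numpy as np

def upsample(att, size):
    """Nearest-neighbour upsampling of a square attention map (no extra deps)."""
    reps = size // att.shape[0]
    return np.kron(att, np.ones((reps, reps)))

def cascade_attention(pyramid_maps):
    """Cascade coarse-to-fine attention maps: coarse semantic attention gates
    each finer map so that small objects in peripheral regions are not
    suppressed. `pyramid_maps` are square maps ordered coarse -> fine
    (e.g., 7x7, 14x14, 28x28), assumed precomputed."""
    fused = pyramid_maps[0]
    for finer in pyramid_maps[1:]:
        gate = upsample(fused, finer.shape[0])
        fused = finer * (0.5 + 0.5 * gate / (gate.max() + 1e-8))
    return fused / (fused.max() + 1e-8)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    maps = [rng.random((s, s)) for s in (7, 14, 28)]
    print(cascade_attention(maps).shape)   # (28, 28)
```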


Figures and tables (from the article below): illustration of pruning an output channel in a convolutional layer (Fig. 2); fraction of remaining channels in each layer of VGG-16 for CIFAR-10 pruned by each compared method ((a) mem-opt, (b) flop-opt, (c) s-ls-global, (d) s-global) at several overall degrees of pruning (Fig. 7); computation time required for pretraining and retraining across channel pruning methods; effect of acceleration on inference processing time under the same batch size.

A Channel Pruning Optimization With Layer-Wise Sensitivity in a Single-Shot Manner Under Computational Constraints

January 2022 · 12 Reads · 2 Citations · IEEE Access

In constrained computing environments such as mobile devices or satellite on-board systems, various computational factors of the hardware resource can restrict the processing of deep learning (DL) services. Recent DL models, such as those for satellite image analysis, often require more memory for the intermediate feature map footprint than the given hardware specification provides, as well as more computational overhead (in FLOPs) than the hardware accelerator can sustain while meeting the service-level objective. As one solution, we propose a new method of controlling layer-wise channel pruning in a single-shot manner that decides how many channels to prune in each layer by observing the dataset once, without full pretraining. To improve robustness against performance degradation, we also propose a layer-wise sensitivity and formulate optimization problems for deciding the layer-wise pruning ratios under target computational constraints. The optimal conditions are derived theoretically, and practical optimum-searching schemes based on them are proposed. In the empirical evaluation, the proposed methods show robustness against performance degradation and demonstrate the feasibility of DL serving under constrained computing environments by reducing memory occupation and providing acceleration and throughput improvements while preserving accuracy.
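The allocation problem described above, deciding per-layer pruning ratios from layer-wise sensitivity under a computational budget, can be illustrated with a greedy sketch: repeatedly prune a little more from the layer whose sensitivity per FLOP saved is lowest until the budget is met. The paper derives optimal conditions analytically; this greedy version and its numbers are only an assumption-laden stand-in.

```python
import numpy as np

def allocate_pruning(flops, sensitivity, flop_budget, step=0.05, max_ratio=0.9):
    """Greedy allocation of per-layer channel-pruning ratios.

    `flops[i]` is layer i's FLOP count and `sensitivity[i]` its layer-wise
    sensitivity score (higher means pruning hurts more). Repeatedly prune a
    bit more from the layer with the lowest sensitivity per FLOP saved until
    the total fits the budget. FLOPs are assumed to scale roughly linearly
    with the fraction of kept channels, which is a simplification.
    """
    flops = np.asarray(flops, dtype=float)
    sens = np.asarray(sensitivity, dtype=float)
    ratios = np.zeros_like(flops)

    def total(r):
        return float(np.sum(flops * (1.0 - r)))

    while total(ratios) > flop_budget:
        candidates = np.where(ratios + step <= max_ratio)[0]
        if candidates.size == 0:
            break                                   # cannot prune further
        cost = sens[candidates] / (flops[candidates] * step)
        ratios[candidates[int(np.argmin(cost))]] += step
    return ratios

if __name__ == "__main__":
    print(allocate_pruning(flops=[4e8, 8e8, 2e8],
                           sensitivity=[0.9, 0.3, 0.6],
                           flop_budget=9e8).round(2))
```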


Citations (6)


... Other studies also evaluate the localization ability of CAM methods by turning the attributions into segmentation masks and comparing the IoU or classification accuracy [132,135,155]. Additionally, [156] compare attention networks and CAM variants on the metrics max-sensitivity and average % drop/increase in confidence. Regarding other xAI approaches, the attention weights are evaluated in [144] by inspecting drops in the accuracy for crop mapping when the transformer model is trained on a subset of dates with the highest attention values. ...

Reference:

Opening the Black Box: A systematic review on explainable artificial intelligence in remote sensing
Federated Onboard-Ground Station Computing With Weakly Supervised Cascading Pyramid Attention Network for Satellite Image Analysis

IEEE Access

... A final set of future developments could involve the integration of MuVI into several real-world applications [39][40][41][42]. For example, in medical imaging diagnostics, MuVI could be integrated to identify and visualize the parts of medical images that contribute most to disease prediction. ...

An Alternating Training Method of Attention-Based Adapters for Visual Explanation of Multi-Domain Satellite Images

IEEE Access

... The enclosed case study is oriented on very long time series. A similar concept was delivered by Lee, Kim, and Youn (2021). The authors applied embedding representation to large-scale time series in the task of online forecasting. ...

Pattern-Wise Embedding System for Scalable Time-series Database
  • Citing Conference Paper
  • January 2021

... As the dynamic realistic environment requires the adaptive ability of models, a potential technique known as continual learning (CL) empowers these models to continually acquire, update, and accumulate knowledge on the device from the data by the training behavior. Although powerful cloud platforms can leverage CL to adapt to service expansion, e.g., personalized recommendation systems [16], the similar and intense personalized requirements for edge devices are almost disregarded by academia and industry. This overlook stems from scarce resources in edge devices, such as limited computation, constrained storage, and insufficient memory, which significantly hinders the further application of CL. ...

An Accelerated Continual Learning with Demand Prediction based Scheduling in Edge-Cloud Computing
  • Citing Conference Paper
  • November 2020

... • Modelling in distributed approaches is usually based on extraction of the required information from a modelling node, increasing the usage of the communication channel. For example, ref. [18] proposed a frequency analysis based on cosine similarity and deep learning. ...

Cooperating Edge Cloud-Based Hybrid Online Learning for Accelerated Energy Data Stream Processing in Load Forecasting

IEEE Access

... Uber uses loss aversion, recognition, and intrinsic motivation to penalize drivers [12]. The Deep Reinforcement Learning algorithms (Deep RL) were used to tackle the power consumption based on the user's behaviours [13]. Some systems tried to understand student's learning styles to generate recommendations [14]. ...

An Accelerated Edge Cloud System for Energy Data Stream Processing Based on Adaptive Incremental Deep Learning Scheme

IEEE Access