Shiao-Li Tsao’s research while affiliated with National Yang Ming Chiao Tung University and other places


Publications (74)


Efficient and Portable Workgroup Size Selection
  • Article

August 2019 · 27 Reads · 4 Citations · IEEE Transactions on Parallel and Distributed Systems · Shiao-Li Tsao

The performance of an OpenCL program is strongly influenced by both hardware and software attributes. To achieve superior performance, developers may leverage automatic performance tuning techniques to determine the optimal parameters on the target device. Although existing approaches have shown promising tuning results in their target scenarios, other requirements such as efficiency, portability, and usability should also be considered because of the rapid growth of heterogeneous computing applications and platforms. In this paper, we re-examine the workgroup size tuning problem and propose a novel approach to meet the aforementioned requirements. We abstract the architectural details into a set of hardware parameters so that the proposed approach can be applied without the presence of target devices, which makes it more accessible to developers. The proposed approach is evaluated on 20 OpenCL kernels and six devices, including both CPUs and GPUs. Experimental results demonstrate that, with negligible overhead, our approach filters out 88.6% of the possible workgroup sizes on average. Among all the workgroup size candidates, the best- and worst-performing candidates can achieve average performance of 95.5% and 92.1%, respectively, compared with the optimal workgroup size.
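As a rough illustration of filtering workgroup size candidates from hardware parameters alone, the sketch below is not the paper's method. It assumes the OpenCL 1.x rule that the workgroup size must evenly divide the global size, a device limit such as `CL_DEVICE_MAX_WORK_GROUP_SIZE`, and a preferred multiple such as `CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE`. The function name and the example values are hypothetical.

```python
def candidate_workgroup_sizes(global_size, max_workgroup_size, preferred_multiple):
    """Keep 1-D workgroup sizes that pass three simple hardware-derived checks.

    Assumed rules (illustrative only, not the paper's filter):
      * the size does not exceed the device limit,
      * the size evenly divides the global size (OpenCL 1.x requirement),
      * the size is a multiple of the preferred workgroup-size multiple.
    """
    candidates = []
    for size in range(1, min(global_size, max_workgroup_size) + 1):
        if global_size % size != 0:
            continue  # would leave a partial workgroup
        if size % preferred_multiple != 0:
            continue  # would leave hardware lanes idle
        candidates.append(size)
    return candidates


# Hypothetical device: limit of 256 work-items, preferred multiple of 32.
kept = candidate_workgroup_sizes(global_size=1024,
                                 max_workgroup_size=256,
                                 preferred_multiple=32)
print(kept)  # [32, 64, 128, 256]
```

Because the checks use only device parameters, a filter of this kind can run without the target device present; the surviving candidates would still need to be ranked by some other means.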


Domain Specific Approximation for Object Detection

October 2018

There is growing interest in object detection in advanced driver assistance systems and autonomous robots and vehicles. To enable such innovative systems, we need faster object detection. In this work, we investigate the trade-off between accuracy and speed with domain-specific approximations, i.e., category-aware image size scaling and proposals scaling, for two state-of-the-art deep learning-based object detection meta-architectures. We study the effectiveness of applying approximation both statically and dynamically to understand their potential and applicability. By conducting experiments on the ImageNet VID dataset, we show that domain-specific approximation has great potential to improve the speed of the system without deteriorating the accuracy of object detectors, i.e., up to 7.5x speedup for dynamic domain-specific approximation. To this end, we present our insights toward harvesting domain-specific approximation and devise a proof-of-concept runtime, AutoFocus, that exploits dynamic domain-specific approximation.


Domain Specific Approximation for Object Detection
  • Article
  • Full-text available

January 2018 · 37 Reads · 13 Citations · IEEE Micro




High-level energy consumption model of embedded graphic processors

July 2015 · 14 Reads · 5 Citations

The embedded graphics processing unit (GPU) is an indispensable component for real-time rendering and graphics applications on mobile devices. However, an embedded GPU consumes considerable energy [1], which is critical for battery-operated devices. To understand the energy consumption of a graphics application, conventional approaches build energy models from hardware performance counters. However, those low-level models are mainly derived from desktop GPUs, cannot be applied directly to embedded GPUs, and are less intuitive from a programmer's point of view. In this study, we propose a high-level energy consumption model for embedded graphics processors that estimates the energy consumption of a graphics application from the graphics attributes of a scene. We conduct a number of experiments on real platforms to validate the proposed model. Our experimental results demonstrate that an average energy estimation error rate of 7.30% can be achieved.


Adaptive Lookup Protocol for Two-Tier VANET/P2P Information Retrieval Services

March 2015 · 48 Reads · 22 Citations · IEEE Transactions on Vehicular Technology

Intelligent transportation system (ITS) services have attracted significant attention in recent years. To support ITS services, an architecture is required to retrieve information and data from moving vehicles and roadside facilities efficiently. A two-tier system that integrates low-tier vehicular ad hoc networks (VANETs) and a high-tier infrastructure-based peer-to-peer (P2P) overlay, which can achieve a high lookup success rate and low lookup latency for information retrieval, has been developed. However, conventional information lookups in the two-tier VANET/P2P system may introduce extra lookup messages and latencies because the lookup queries are performed simultaneously over both the VANET and the P2P network. This paper proposes an adaptive lookup protocol for the two-tier VANET/P2P system to improve the efficiency of information retrieval. The proposed protocol uses a Bloom filter, which is a space-efficient data structure, to collect reachability information of road segments; therefore, queries can be adaptively routed between the low- and high-tier networks according to reachability probability. Simulations based on the SUMO traffic simulator and QualNet network simulator demonstrate that, compared with the conventional two-tier lookup mechanism, the adaptive lookup protocol can reduce the lookup latency by 12%, reduce the P2P lookup overhead by 20%–33%, and achieve a high success rate in information lookups.
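To make the Bloom filter idea concrete, the following minimal sketch stores road-segment identifiers in a Bloom filter and uses a membership test to pick a tier for a query. It is a simplification under stated assumptions: the class, the `choose_tier` helper, the segment identifiers, and the yes/no routing rule are all hypothetical, and the abstract's protocol routes by reachability probability rather than by a single membership test.

```python
import hashlib


class BloomFilter:
    """Space-efficient set summary: no false negatives, some false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        return all(self.bits[pos] for pos in self._positions(item))


def choose_tier(reachable_segments, segment_id):
    """Hypothetical rule: use the VANET only if the segment looks reachable."""
    return "VANET" if reachable_segments.might_contain(segment_id) else "P2P"


reachable = BloomFilter()
reachable.add("segment-17")
print(choose_tier(reachable, "segment-17"))
```

A false positive here only sends a query over the VANET when the P2P overlay would have been the better choice, which is why a compact probabilistic summary can be an acceptable trade-off for this kind of routing hint.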


Method and system for detecting an appliance based on users' feedback information

January 2015 · 9 Reads

A method and system for detecting an appliance based on users' feedback information are disclosed, in particular a nonintrusive load monitoring method and system based on users' feedback information and a joint strategic decision search algorithm. Users' feedback on an appliance, either entered directly or given by confirming an appliance search result, is used to generate a mapping between each appliance model and at least one load signature of that model. The feedback is recorded in a smart meter or a cloud computing system, and a mathematical analysis computes the occurrence of each appliance signature and the identification rate of each signature. The joint strategic decision search algorithm then automatically identifies the appliance models and analyzes the operating states of the electric appliances in homes or offices.


SEEDS: A solar-based energy-efficient distributed server farm

January 2015 · 30 Reads · 19 Citations · IEEE Transactions on Systems, Man, and Cybernetics: Systems

Distributed renewable energy has emerged as a promising resource because of its environmental friendliness and economic considerations. However, most renewable energy sources are unreliable and may require considerable effort to be efficiently utilized in a computing center for providing services. In this paper, we exploit distributed renewable energy (e.g., solar energy) and peer-to-peer (P2P) technologies to aggregate distributed computing resources to provide an infrastructure called solar-based energy-efficient distributed server (SEEDS) farm for distributed computing and distributed storage. Energy-efficient devices (e.g., embedded devices) powered by solar energy form a P2P computing system to provide their computing resources to end-users. Specifically, this paper uses a Web-based service as a case study. A group of solar-powered embedded devices acts as a front-end cooperative caching system for Web servers. Web objects may be accessed through the distributed caching system without going through servers, and thus we can reduce brown energy consumption of the servers. This paper also develops an analytical model to evaluate the total energy consumption of Web-based services with and without SEEDS. Theoretical and simulation results show that the SEEDS system can support services and achieve significant improvements in energy efficiency by aggregating distributed energy resources.


An end-to-end channel allocation scheme for a wireless mesh network

December 2014 · 14 Reads · 8 Citations · International Journal of Communication Systems

Co-channel interference seriously influences the throughput of a wireless mesh network. This study proposes an end-to-end channel allocation scheme (EECAS) that extends the radio-frequency-slot method to minimize co-channel interference. The EECAS first separates the transmission and reception of packets into two channels. This scheme can then classify the state of each radio-frequency-slot as transmitting, receiving, interfered, free, or parity. A node that initiates a communication session with a quality of service requirement can propagate a channel allocation request along the communication path to the destination. By checking the channel state, the EECAS can determine feasible radio-frequency-slot allocations for the end-to-end path. The simulation results in this study demonstrate that the proposed approach performs well in intra-mesh and inter-mesh communications, and it outperforms previous channel allocation schemes in end-to-end throughput. Copyright © 2013 John Wiley & Sons, Ltd.
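The idea of checking per-slot states hop by hop along a path can be sketched as follows. This is only an illustration, not the EECAS algorithm: it assumes each node keeps a table from slot index to one of the states named in the abstract, treats every state other than "free" as unavailable, and ignores interference at neighboring nodes that are not on the path. The function name and data layout are hypothetical.

```python
FREE = "free"


def allocate_path(path, slot_states):
    """Pick one slot per hop that is free at both the sender and the receiver.

    path        -- list of node ids from source to destination
    slot_states -- dict: node id -> dict: slot index -> state string
    Returns a list with the chosen slot for each hop, or None if some hop
    has no slot that is free at both of its endpoints.
    """
    # Work on a copy so a failed request leaves the caller's tables untouched.
    states = {node: dict(slot_states[node]) for node in path}
    allocation = []
    for sender, receiver in zip(path, path[1:]):
        chosen = None
        for slot in sorted(states[sender]):
            if states[sender][slot] == FREE and states[receiver].get(slot) == FREE:
                chosen = slot
                break
        if chosen is None:
            return None  # request cannot be satisfied end to end
        states[sender][chosen] = "transmitting"
        states[receiver][chosen] = "receiving"
        allocation.append(chosen)
    return allocation


tables = {
    "A": {0: "free", 1: "free", 2: "interfered"},
    "B": {0: "interfered", 1: "free", 2: "free"},
    "C": {0: "free", 1: "free", 2: "free"},
}
print(allocate_path(["A", "B", "C"], tables))  # [1, 2]
```

Marking a slot as transmitting or receiving after each hop is what prevents the relay node B from being asked to receive and forward in the same slot.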


A dynamic load-balancing scheme for heterogeneous wireless networks

November 2014 · 22 Reads · 7 Citations

Current heterogeneous wireless networks often overlap because of their complementary characteristics and the large deployment of various wireless access technologies. Mobile devices equipped with multiple wireless interfaces, called multi-radio-access-technology (multi-RAT) mobile stations (MSs), are also becoming increasingly popular. Therefore, common radio resource management (CRRM) has been proposed to coordinate heterogeneous radio resource allocations and improve the resource utilization of heterogeneous wireless networks. However, CRRM is an NP-hard problem, and low-complexity approaches for dynamic resource management are in high demand. In this paper, the resource request and allocation between multi-RAT MSs and heterogeneous wireless networks are modeled as a directed graph. Thus, the problem of searching for feasible radio resource allocations is simplified to finding trees in a directed graph. Based on the proposed model, a heuristic scheme can be used to find a feasible solution efficiently and to dynamically adjust the workload of heterogeneous base stations (BSs) to accommodate new requests. Experimental results demonstrate that the heuristic scheme can reduce the request reject rate by 10%–55% compared with conventional approaches.
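One way to picture how moving existing MSs between BSs can make room for a new request is the classic augmenting-path search from bipartite matching, shown below. This is a stand-in technique chosen for illustration, not the heuristic proposed in the paper, and it assumes each MS consumes one unit of capacity. The function name, the capacity model, and the example network are hypothetical.

```python
def try_admit(ms, reachable, capacity, assigned, visited=None):
    """Try to place `ms` on a base station, relocating other MSs if needed.

    reachable -- dict: MS id -> list of BS ids the MS can attach to
    capacity  -- dict: BS id -> maximum number of MSs it can serve
    assigned  -- dict: MS id -> BS id, updated in place only on success
    Returns True if `ms` was admitted, False if the request is rejected.
    """
    if visited is None:
        visited = set()
    for bs in reachable[ms]:
        if bs in visited:
            continue
        visited.add(bs)
        occupants = [m for m, b in assigned.items() if b == bs]
        if len(occupants) < capacity[bs]:
            assigned[ms] = bs  # spare capacity: admit directly
            return True
        for other in occupants:
            # Try to move an occupant elsewhere to free one unit on this BS.
            if try_admit(other, reachable, capacity, assigned, visited):
                assigned[ms] = bs
                return True
    return False


reachable = {"ms1": ["wifi", "lte"], "ms2": ["wifi"], "ms3": ["wifi"]}
capacity = {"wifi": 1, "lte": 1}
assigned = {}
print(try_admit("ms1", reachable, capacity, assigned))  # True
print(try_admit("ms2", reachable, capacity, assigned))  # True, ms1 moves to lte
print(try_admit("ms3", reachable, capacity, assigned))  # False
```

In the example, the second request succeeds only because ms1 has an alternative interface, which mirrors the abstract's point that adjusting the workload across BSs can lower the reject rate.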


Citations (53)


... HIP creates new opportunities for autotuning. For example, OpenCL on AMD was restricted to at most 256 threads per block [13,6,33,24], whereas HIP increases this limit to 1024. ...

Reference:

Bringing Auto-tuning to HIP: Analysis of Tuning Impact and Difficulty on AMD and Nvidia GPUs
Efficient and Portable Workgroup Size Selection
  • Citing Article
  • August 2019

IEEE Transactions on Parallel and Distributed Systems

... Zheng et al. studied the human sleep process and classified sleep stages and realized the extraction and classification of EEG features based on K-means clustering algorithm [5]. Chin et al. show great potential to improve system speed without degrading object detector accuracy by experimenting on the ImageNet VID dataset, i.e., the speedup of dynamic domain-specific approximation is up to 7.5 times [6]. Khalifa et al. use the method of detecting multiple objects by background subtraction to sequentially track the features of different surveillance videos. ...

Domain Specific Approximation for Object Detection

IEEE Micro

... The solutions for preamble collision detection include the use of tagged preambles [6], non-orthogonal pilots [7], access class barring (ACB), random back-off (RB) and power ramping (PR) [10]. The preamble collision probabilities considering the long term evolution (LTE) standard is described in [8]. An upper bound for the throughput and collision avoidance probability have been derived in [9]. ...

RACH collision probability for machine-type communications
  • Citing Article
  • January 2012

... The first method involves the use of primitive information on a 3D graphics pipeline [5], the collection of batch, vertex, and fragment data by each rendering path [6], and the analysis of power consumption by 3D graphics components in each pipeline stage [7] [8]. The second method entails the use of information on vertex-processing and pixel-processing loads [9], frequency scaling that accounts for user and application conditions [10], and a dynamic power predication scheme that accords with CPU and GPU usage [11]. ...

High-level energy consumption model of embedded graphic processors
  • Citing Conference Paper
  • July 2015

... Finally, the analyzed software often operates in modes, therefore a WCET result which is parametric on the mode and input would be highly desirable [GE07]. This is supported by the finding, that the separation of multiple scenarios which use a common code base can reduce the resulting WCETs by up to 70% [HKB+14]. ...

Static WCET analysis of the H.264/AVC decoder exploiting coding information
  • Citing Conference Paper
  • August 2014

... However, this is not a simple task. Modern microprocessors contain multiple cores and billions of transistors since 2006, and Moore's law predicts that the number of transistors contained in the most advanced silicon chips doubles every year, so evaluating heat dissipation in these modern processors by building an equivalent electrical circuit (a methodology adopted in several literature studies, such as Huang et al. [11] and Floros et al. [12]) is gradually becoming impracticable. Also, in 2021 alone, AMD ® and INTEL ® jointly launched 44 new models of desktop processors (CPU-World [13]). ...

An efficient thermal estimation scheme for microprocessors
  • Citing Conference Paper
  • August 2014

... Its close relationship with innovation and its association with a contribution to country development has generated great interest among various research areas, inspiring different investigations with a wide arrange of scopes. Many of these investigations focus on topics such as optimization, load balancing, quality of services, among others [7,10,13,15,23]. ...

A dynamic load-balancing scheme for heterogeneous wireless networks
  • Citing Article
  • November 2014

... Thus, burst should be placed on the subchannel with the best channel quality (called best subchannels), while considering the subchannel diversity. Various burst construction methods have been proposed to solve the subchannel diversity issue [3][4][5][6][7][8][9][10][11][12][13][14]. ...

A dynamic frequency allocation scheme for IEEE 802.16 OFDMA-based WMANs using Hungary algorithm
  • Citing Conference Paper
  • December 2007

... Also, many works in the field are based on an advantageous usage of geographical properties of the distribution of Road Side-Units (among others [5], [51], [118], [120]), whereas our work is only based on the already widespread mobile broadband infrastructure as well as data analysis capabilities already in place at car manufacturers' data centers. Query-answering mechanisms for Vehicular Networks in the literature also predominantly concentrate on using the architecture of the network (for instance using pre-existing P2P approaches, as in [116], [129], [197] or 2-tier architectures [39], [186]) to resolve the query. In this work, we do not presume any connections between vehicles; this positions our work in readily deployable technologies on modern vehicles. ...

Adaptive Lookup Protocol for Two-Tier VANET/P2P Information Retrieval Services
  • Citing Article
  • March 2015

IEEE Transactions on Vehicular Technology

... Long-term resilience, resource efficiency, and environmental impact reduction are given top priority in sustainable seed storage methods (De Boef, et al., 2010). This entails creating environmentally friendly seed packing materials, building energy-efficient storage facilities, and building the infrastructure for seed storage using renewable energy sources (Cheng et al., 2014). In addition, the use of naturally occurring substances possessing antimicrobial qualities as seed treatments promotes insect resistance without depending on artificial pesticides (Bonome et al., 2020). ...

SEEDS: A solar-based energy-efficient distributed server farm
  • Citing Article
  • January 2015

IEEE Transactions on Systems, Man, and Cybernetics: Systems