Zonghua Gu

Zhejiang University, Hang-hsien, Zhejiang Sheng, China

Publications (34) · 20.84 Total impact

  • ABSTRACT: A framework for an optical lab-on-a-chip using a porous silicon microcavity membrane is presented in this paper. With measured results of the membrane, reflectance spectra detection is designed and implemented on an FPGA. We detect the shift of the resonant dip to distinguish target molecule solutions of different concentrations. To evaluate the feasibility of the lab-on-a-chip, we further model the cost and performance of the on-chip optical sensors and data processing system. The novel optical lab-on-a-chip will enable detection of biological samples at a much higher sensitivity than classic electrochemical methods. The efficient detection algorithm will ensure the speed of pattern detection even when the lab-on-a-chip system has to deal with data from multiple channels.
    IEEE Sensors Journal 01/2012; 12(6):2121-2128. · 1.48 Impact Factor
  • ABSTRACT: Task graph scheduling on multiprocessor systems is a representative multiprocessor scheduling problem. A solution to this problem consists of the mapping of tasks to processors and the scheduling of tasks on each processor. An optimal solution can be obtained by exploring the entire design space of all possible mapping and scheduling choices. Since the problem is NP-hard, scalability becomes the main concern in solving the problem optimally. In this paper, a SAT-based optimization framework is proposed to address this problem, in which a SAT solver is enhanced by integrating it with a scheduling analysis tool in a branch-and-bound manner to prune the solution space efficiently. Performance evaluation results show that our technique achieves an average performance improvement of more than an order of magnitude compared to state-of-the-art techniques. We further build a cycle-accurate network-on-chip simulator based on SystemC to verify the effectiveness of the proposed technique on realistic multiprocessor systems.
    IEEE Transactions on Parallel and Distributed Systems 09/2011; · 1.80 Impact Factor
  • IEEE Trans. Parallel Distrib. Syst. 01/2011; 22:1382-1389.
  • ABSTRACT: Non-preemptive scheduling is usually considered inferior to preemptive scheduling for time-critical systems, because non-preemptive blocking leads to poor task responsiveness. Although this is true in single-processor scheduling, we found by empirical simulation experiments that it is not necessarily the case in multiprocessor scheduling. Additionally, non-preemptive scheduling enjoys other benefits such as lower implementation complexity and run-time overhead. Non-preemptive scheduling may therefore be a better alternative to preemptive scheduling for a considerable part of real-time applications on multiprocessor/multi-core platforms. As the technical contribution, we study the schedulability analysis problem of global non-preemptive fixed-priority scheduling (NP-FP) on multiprocessors. We propose schedulability test conditions for NP-FP, building upon the “problem window analysis” by Baruah [8] for preemptive scheduling. We first derive a linear-time general schedulability test condition that works not only for NP-FP, but also for any other work-conserving non-preemptive scheduling algorithm. Then we improve the analysis and present a test condition of quadratic time-complexity for NP-FP, which has a significant performance improvement compared to the first one. A notable advantage of our proposed test conditions is that, while the test in [8] needs to enumerate a large number of possible problem window sizes, our proposed test conditions only need to be conducted with a single problem window size, and are thereby significantly more efficient. Experiments with randomly generated task sets are conducted to evaluate the performance of the proposed test conditions.
    Journal of Systems Architecture - Embedded Systems Design. 01/2011; 57:536-546.
  • ABSTRACT: Many embedded systems have stringent real-time constraints. An effective technique for meeting real-time constraints is to keep the processor utilization on each node at or below the schedulable utilization bound, even though each task’s actual execution time may have large uncertainties and deviate a lot from its estimated value. Recently, researchers have proposed solutions based on Model Predictive Control (MPC) for the utilization control problem. Although these approaches can handle a limited range of execution time estimation errors, the system may suffer performance deterioration or even become unstable with large estimation errors. In this paper, we present two online adaptive optimal control techniques: one is based on Recursive Least Squares (RLS) model identification plus a Linear Quadratic (LQ) optimal controller; the other is based on Adaptive Critic Design (ACD). Simulation experiments demonstrate that both the LQ optimal controller and the ACD-based controller have better performance than the MPC-based controller, and that the ACD-based controller has the smallest aggregate tracking errors.
    Journal of Systems Architecture. 09/2010;
  • ABSTRACT: As power dissipation causes thermal issues in cooling costs, lifetime and reliability, thermal management has become an important issue in today's OS and processor design. Early OS-level thermal management schemes were proposed and evaluated mainly with simulators or analytical models. In this paper, we implement a thermal-aware round-robin scheduling algorithm in the Linux kernel, and compare its performance with the ‘Heat-and-Run’ algorithm and the default Linux baseline scheduler on an Intel Core 2 Duo processor using representative benchmarks from SPEC2000, MiBench and NetBench. Our results indicate that the current Linux scheduler can easily be enhanced with thermal-awareness to show improved performance in terms of both the on-chip temperature condition and application throughput.
    Comput. J. 01/2010; 53:895-903.
  • Xiaorui Wang, Xing Fu, Xue Liu, Zonghua Gu
    ABSTRACT: CPU utilization control has recently been demonstrated to be an effective way of meeting end-to-end deadlines for distributed real-time systems running in unpredictable environments. However, current research on utilization control focuses exclusively on task rate adaptation, which cannot effectively handle rate saturation and discrete task rates. Since the CPU utilization contributed by a real-time periodic task is determined by both its rate and execution time, CPU frequency scaling can be used to adapt task execution times for power-efficient utilization control. In this paper, we present PAUC, a two-layer coordinated CPU utilization control architecture. The primary control loop uses frequency scaling to locally control the CPU utilization of each processor, while the secondary control loop adopts rate adaptation to control the utilizations of all the processors at the cluster level on a finer timescale. Both control loops are designed and coordinated based on well-established control theory for theoretically guaranteed control accuracy and system stability. Empirical results on a physical testbed demonstrate that our control solution outperforms a state-of-the-art utilization control algorithm by having more accurate control and less power consumption. Extensive simulation results also show that our solution can significantly improve the feasibility of utilization control.
    IEEE Transactions on Industrial Informatics 01/2010; 6:302-315. · 8.79 Impact Factor
  • Xiuqiang He, Zonghua Gu, Yongxin Zhu
    ABSTRACT: We consider the task model of periodic tasks running on a network of processor nodes connected by a bus based on the time-triggered protocol, an industry-standard bus protocol designed for safety-critical automotive and avionics distributed embedded systems, and present an integrated optimization framework that jointly considers one or more of the following attributes: task-to-processor allocation, task priority assignment, task period assignment and bus access configuration. We adopt a hierarchical optimization framework, where each possible task allocation and priority assignment is treated as one top-level coarse-grained state, which may contain many lower-level fine-grained states defined by different task period assignments and bus access configurations. Simulated annealing is used to explore the top-level states, which calls a geometric programming solver as a subroutine to explore the lower-level states contained within a given top-level state. Performance evaluation shows that our framework has good performance in terms of solution quality and scalability.
    Comput. J. 01/2010; 53:1071-1091.
  • ABSTRACT: FPGAs are widely used in today's embedded systems design due to their low cost, high performance, and reconfigurability. Partially RunTime-Reconfigurable (PRTR) FPGAs, such as Virtex-2 Pro and Virtex-4 from Xilinx, allow part of the FPGA area to be reconfigured while the remainder continues to operate without interruption, so that HW tasks can be placed and removed dynamically at runtime. We address two problems related to HW task scheduling on PRTR FPGAs: (1) HW/SW partitioning. Given an application in the form of a task graph with known execution times on the HW (FPGA) and SW (CPU), and known area sizes on the FPGA, find a valid allocation of tasks to either HW or SW and a static schedule with the optimization objective of minimizing the total schedule length (makespan). (2) Pipelined scheduling. Given an input task graph, construct a pipelined schedule on a PRTR FPGA with the goal of maximizing system throughput while meeting a given end-to-end deadline. Both problems are NP-hard. Satisfiability Modulo Theories (SMT) is an extension to SAT that adds the ability to handle arithmetic and other decidable theories. We use the SMT solver Yices with Linear Integer Arithmetic (LIA) theory as the optimization engine for solving the two scheduling problems. In addition, we present an efficient heuristic algorithm based on kernel recognition for the pipelined scheduling problem, a technique borrowed from SW pipelining, to overcome the scalability problem of the SMT-based optimal solution technique.
    ACM Trans. Design Autom. Electr. Syst. 01/2010; 15.
  • Weichen Liu, Zonghua Gu, Jiang Xu
    ABSTRACT: Synchronous dataflow (SDF) is a widely-used model of computation for digital signal processing and multimedia applications. In this letter, we propose an automatic approach to synthesize efficient software from SDF models with improved runtime efficiency. Our synthesis technique is based on dynamic single-appearance scheduling (dynSAS), which generates software with minimized code size, the same as a traditional single-appearance schedule (SAS), while requiring much less buffer memory space. We enhance dynSAS systematically to reduce control flow overhead and increase memory utilization. Experimental results show that our approach can generate efficient software with enhanced runtime performance compared to related techniques.
    IEEE embedded systems letters 01/2009; 1:69-72.
  • ABSTRACT: Partial Runtime Reconfigurable (PRTR) FPGAs allow HW tasks to be placed and removed dynamically at runtime. We make two contributions in this paper. First, we present an efficient algorithm for finding the complete set of Maximal Empty Rectangles on a 2D PRTR FPGA. We also present a HW implementation of the algorithm with negligible runtime overhead. Second, we present an efficient online deadline-constrained task placement algorithm for minimizing area fragmentation on the FPGA by using an area fragmentation metric that takes into account the probability distribution of sizes of future task arrivals as well as the time axis. The techniques presented in this paper are useful in an operating system for runtime reconfigurable FPGAs to manage the HW resources on the FPGA when HW tasks arrive and finish dynamically at runtime.
    Microprocessors and Microsystems - Embedded Hardware Design. 01/2009; 33:374-387.
  • Nan Guan, Zonghua Gu, Wang Yi, Ge Yu
    ABSTRACT: Synchronous dataflow (SDF) is a well-known model of computation for dataflow-oriented applications such as embedded systems for signal processing and multimedia. It is important to minimize the buffer size requirements of applications generated from SDF graphs, since memory space is often a scarce resource in these systems due to cost or power consumption constraints. Some authors have proposed to use model-checking for finding the minimum buffer size requirements, but the scalability of model-checking is limited by state space explosion. In this paper, we present several techniques for reducing state space size and improving scalability of model-checking by exploiting problem-specific properties of SDF graphs.
    Proceedings of the 14th Asia South Pacific Design Automation Conference, ASP-DAC 2009, Yokohama, Japan, January 19-22, 2009; 01/2009
  • Xiaorui Wang, Xing Fu, Xue Liu, Zonghua Gu
    ABSTRACT: CPU utilization control has recently been demonstrated to be an effective way of meeting end-to-end deadlines for distributed real-time systems running in unpredictable environments. However, current research on utilization control focuses exclusively on task rate adaptation, which cannot effectively handle rate saturation and discrete task rates. Since the CPU utilization contributed by a real-time periodic task is determined by both its rate and execution time, CPU frequency scaling can be used to adapt task execution times for power-efficient utilization control. In this paper, we present a two-layer coordinated CPU utilization control architecture. The primary control loop uses frequency scaling to locally control the CPU utilization of each processor, while the secondary control loop adopts rate adaptation to control the utilizations of all the processors at the cluster level on a finer timescale. Both control loops are designed and coordinated based on well-established control theory for theoretically guaranteed control accuracy and system stability. Empirical results on a physical testbed demonstrate that our control solution outperforms a state-of-the-art utilization control algorithm by having more accurate control and less power consumption.
    15th IEEE Real-Time and Embedded Technology and Applications Symposium, RTAS 2009, San Francisco, CA, USA, 13-16 April 2009; 01/2009
  • ABSTRACT: Synchronous Dataflow (SDF) is a widely-used model of computation for digital signal processing and multimedia applications, which are typically implemented on memory-constrained hardware platforms. SDF can be statically analyzed and scheduled, and the memory requirement for correct execution can be predicted at compile time. In this paper, we present an efficient technique based on model-checking for exact analysis of the minimal buffer requirement of an SDF graph to guarantee deadlock-free execution. Performance evaluation shows that our approach can achieve significant performance improvements compared to related work.
    Proceedings of the 7th International Conference on Hardware/Software Codesign and System Synthesis, CODES+ISSS 2009, Grenoble, France, October 11-16, 2009; 01/2009
  • ABSTRACT: As Moore's law comes to an end, multi-processor (MP) systems are becoming increasingly important in embedded systems design, hence real-time schedulability analysis for MP systems has become an important research topic. In this paper, we present an exact method for schedulability analysis of global multiprocessor scheduling with either fixed-priority (FP) or earliest-deadline-first (EDF) algorithms using the model-checker NuSMV. Compared to safe but pessimistic schedulability tests based on processor utilization bounds, model-checking can provide an exact answer to the schedulability of a taskset, as well as quantitative information on each task's best-case and worst-case response times.
    Object Oriented Real-Time Distributed Computing (ISORC), 2008 11th IEEE International Symposium on; 06/2008
  • Mingxuan Yuan, Xiuqiang He, Zonghua Gu
    ABSTRACT: FPGAs are often used together with a CPU as hardware accelerators. A runtime reconfigurable FPGA allows part of the FPGA area to be reconfigured while the remainder continues to operate without interruption, so that hardware tasks can be placed and removed dynamically at runtime. In this paper, we formulate and solve the problem of optimal hardware/software partitioning and static task scheduling for a hybrid FPGA/CPU device, with the optimization objective of minimizing the total schedule length, in the framework of satisfiability modulo theories (SMT) with linear integer arithmetic.
    Real-Time and Embedded Technology and Applications Symposium, 2008. RTAS '08. IEEE; 05/2008
  • ABSTRACT: In this paper, we address the problem of static scheduling and software synthesis for dataflow graphs with the symbolic model-checker NuSMV using a two-step process: first use model-checking to obtain a static schedule with the objective of minimizing the data buffer size, then synthesize efficient code from the static schedule with the objective of minimizing code size and performance overheads due to runtime dynamic decisions. We show the effectiveness of these techniques using a number of digital signal processing examples.
    Real-Time Systems Symposium, 2007. RTSS 2007. 28th IEEE International; 01/2008
  • ABSTRACT: We study the schedulability analysis problem for non-preemptive scheduling algorithms on multiprocessors. To the best of our knowledge, the only known work on this problem is the test condition proposed by Baruah (1) (referred to as (BAR-EDFnp)) for non-preemptive EDF scheduling, which will reject a task set with arbitrarily low utilization if it contains a task whose execution time is equal to or greater than the minimal relative deadline among all tasks. In this paper, we first derive a linear-time test condition which avoids the problem mentioned above, by building upon the work in (2) for preemptive multiprocessor scheduling. This test condition works not only for non-preemptive EDF, but also for any other work-conserving non-preemptive scheduling algorithm. Then we improve the analysis and present test conditions of pseudo-polynomial time-complexity for Non-preemptive Earliest Deadline First scheduling (EDFnp) and Non-preemptive Fixed Priority scheduling (FPnp) respectively. Experiments with randomly generated task sets show that our proposed test conditions, especially the improved test conditions, have significant performance improvements compared with (BAR-EDFnp).
    Proceedings of the 29th IEEE Real-Time Systems Symposium, RTSS 2008, Barcelona, Spain, 30 November - 3 December 2008; 01/2008
  • ABSTRACT: Static path analysis is a key process of Worst Case Execution Time (WCET) estimation, the objective of which is to find the execution path that has the largest execution time. Currently, there is an argument in the research community about whether model checking is another good solution for WCET analysis, besides ILP. To our knowledge, no paper so far has addressed this argument with real performance data. In this paper, we implement both ILP and model checking for static path analysis of WCET. The experimental results show that ILP yields very good performance, while model checking only works well for simple programs and is prone to scalability problems when dealing with programs that have complex structures and large loop counts.
    2008 IEEE/IFIP International Conference on Embedded and Ubiquitous Computing (EUC 2008), Shanghai, China, December 17-20, 2008, Volume I; 01/2008
  • ABSTRACT: Many embedded systems have stringent real-time constraints. An effective technique for meeting real-time constraints is to keep the processor utilization on each node at or below the schedulable utilization bound, even though each task’s actual execution time may have large uncertainties and deviate a lot from its estimated value. Recently, researchers have proposed solutions based on Model Predictive Control (MPC) for the utilization control problem. Although these approaches can handle a limited range of execution time estimation errors, the system may suffer performance deterioration or even become unstable with large estimation errors. In this paper, we present two online adaptive optimal control techniques: one is based on Recursive Least Squares (RLS) model identification plus a Linear Quadratic (LQ) optimal controller; the other is based on Adaptive Critic Design (ACD). Simulation experiments demonstrate that both the LQ optimal controller and the ACD-based controller have better performance than the MPC-based controller, and that the ACD-based controller has the smallest aggregate tracking errors.
    Proceedings of the 6th International Conference on Hardware/Software Codesign and System Synthesis, CODES+ISSS 2008, Atlanta, GA, USA, October 19-24, 2008; 01/2008

Publication Stats

203 Citations
20.84 Total Impact Points

Institutions

  • 2009–2011
    • Zhejiang University
      Hang-hsien, Zhejiang Sheng, China
  • 2010
    • The University of Tennessee Medical Center at Knoxville
      Knoxville, Tennessee, United States
  • 2007–2010
    • The Hong Kong University of Science and Technology
      • Department of Computer Science and Engineering
      Kowloon, Hong Kong
    • Northeastern University
      Boston, Massachusetts, United States
  • 2007–2008
    • Northeastern University (Shenyang, China)
      • College of Information Science and Engineering
      Feng-t’ien, Liaoning, China