Nobuyuki Yamasaki

Keio University, Edo, Tōkyō, Japan


Publications (72) · 1.96 Total impact

  • K. Mizotani · Y. Hatori · Y. Kumura · M. Takasu · H. Chishiro · N. Yamasaki
    ABSTRACT: As microprocessor performance grows, high throughput and power management are becoming more important in embedded real-time systems. Real-Time Voltage and Frequency Scaling (RT-VFS) has been proposed to reduce power consumption while ensuring real-time constraints. The imprecise computation model adds an optional part to Liu and Layland's model to improve the quality of computations. This paper proposes a scheme that integrates the imprecise computation model with RT-VFS to reduce power consumption and improve the quality of computations within real-time constraints. We implement this scheme on the Dependable Responsive Multithreaded Processor (D-RMTP), a prioritized simultaneous multithreaded processor for embedded real-time systems; the D-RMTP system-in-a-package supports RT-VFS. The implementation uses features unique to the D-RMTP to improve the quality of computations and reduce energy consumption. Experimental evaluation shows that the proposed scheme achieves both lower energy consumption and higher performance in real environments, with a maximum improvement of 135% in the quality of computations per unit of energy consumed. Copyright © 2015 by The International Society for Computers and Their Applications (ISCA).
    Article · Jan 2015
  • M. Takasu · K. Mizotani · Y. Kumura · H. Chishiro · N. Yamasaki
    ABSTRACT: Many embedded real-time systems must process data at high throughput while meeting real-time constraints. To improve system performance, multiprocessors have been widely used. Reducing energy consumption is one of the most important issues in such systems, which operate with limited resources. From the point of view of real-time scheduling, many energy-saving approaches have been proposed. Most previous work adopts energy models without leakage energy, on the assumption that switching energy dominates. However, with CMOS technology scaling, leakage energy has become a significant factor in overall energy consumption, and previous work that targets only the reduction of switching energy is not always effective. In this paper, we propose two leakage-aware, energy-efficient partitioning techniques for multiprocessors, named Suboptimal and Leakage-Aware Load Balancing (LALB). Suboptimal first determines the number of processors that minimizes energy consumption and then assigns tasks uniformly. LALB, on the other hand, first assigns tasks uniformly to all processors and then decreases the number of processors. We discuss the time complexity of the proposed techniques and their feasibility. Simulation results show that the proposed techniques reduce energy consumption by an average of about 22% compared to existing techniques when leakage energy is dominant. Copyright © 2015 by The International Society for Computers and Their Applications (ISCA).
    Article · Jan 2015
  • H. Chishiro · N. Yamasaki
    ABSTRACT: Real-time systems such as humanoid robots require low jitter and high Quality of Service (QoS). Imprecise computation is one solution for improving QoS, but dynamic-priority imprecise real-time scheduling has high jitter. Semi-fixed-priority scheduling was presented to achieve low jitter and high QoS. Unfortunately, a semi-fixed-priority scheduling algorithm called Rate Monotonic with Wind-up Part (RMWP) usually has high jitter if the actual case execution time (ACET) of each task is shorter than its worst case execution time (WCET). We propose a new semi-fixed-priority scheduling algorithm, called Rate Monotonic with Wind-up Part++ (RMWP++), that achieves zero jitter for each task with harmonic periodic task sets. The zero-jitter technique adds previous and post optional parts to the extended imprecise computation model, which has a second mandatory (wind-up) part. We prove that the jitter of each task in RMWP++ is always zero and that the least upper bound of RMWP++ is one with harmonic periodic task sets on uniprocessors. Simulation results show that RMWP++ achieves zero jitter and incurs fewer context switches than RMWP if the ACET of each task is shorter than its WCET. Copyright © 2015 by The International Society for Computers and Their Applications (ISCA).
    Article · Jan 2015
  • Y. Kumura · K. Mizotani · M. Takasu · H. Chishiro · N. Yamasaki
    ABSTRACT: Multicore and simultaneous multithreaded (SMT) processors are now widespread in real-time systems. Given the complexity of today's architectures, predicting the behavior of task execution is challenging. This has become a severe problem in real-time systems because estimating the worst-case execution time (WCET) of tasks is important for constructing dependable and accurate real-time systems. In this research, we provide overhead-aware schedulability analysis on an SMT processor. We take a measurement-based approach to estimating the WCET of tasks, measuring the various runtime overheads that a real-time OS imposes, because runtime overheads have a considerable impact on systems. These measured overheads should be incorporated into schedulability analysis to evaluate the system more accurately. We target the Dependable Responsive Multithreaded Processor (D-RMTP), an 8-way SMT processor, to estimate runtime overheads, and conduct a series of schedulability analyses that take the measured runtime overheads into account. Our evaluation shows that the context cache mechanism, a functionality unique to the D-RMTP, can improve schedulability by up to 15.9%. Copyright © 2015 by The International Society for Computers and Their Applications (ISCA).
    Article · Jan 2015
  • Hiroyuki Chishiro · Nobuyuki Yamasaki

    Conference Paper · Jun 2014
  • Hiroyuki Chishiro · James H. Anderson · Nobuyuki Yamasaki
    ABSTRACT: Existing multiprocessor real-time scheduling algorithms follow partitioning/global scheduling approaches or some hybrid approaches of the two. Under partitioning, all tasks are assigned to specific processors. Under global scheduling, tasks may migrate among processors. Global scheduling has the advantage of better schedulability compared to partitioning. However, optimal algorithms based on global scheduling such as PD2 [5] and LLREF [3] incur significant overhead.
    Article · Jul 2013 · ACM SIGBED Review
  • Hiroyuki Chishiro · Nobuyuki Yamasaki
    ABSTRACT: The imprecise computation model has the advantage of supporting overloaded conditions in dynamic real-time environments, compared to Liu and Layland's model. However, the imprecise computation model is not practical because the termination of each optional part cannot guarantee schedulability. To guarantee schedulability when an optional part is terminated, a practical imprecise computation model has been presented. In the practical imprecise computation model, each task has multiple mandatory parts and optional parts to support many imprecise real-time applications. The practical imprecise computation model is supported by dynamic-priority scheduling on uniprocessors. Unfortunately, dynamic-priority scheduling is difficult to support on multiprocessors. In contrast, semi-fixed-priority scheduling, which is part-level fixed-priority scheduling, supports only two mandatory parts, so the imprecise real-time applications it can support are restricted. This paper presents semi-fixed-priority scheduling with multiple mandatory parts on uniprocessors and on multiprocessors. In addition, this paper explains how to calculate the optional deadline of each task, which is the termination time of its optional part. The schedulability analysis shows that semi-fixed-priority scheduling strictly dominates fixed-priority scheduling. Thanks to semi-fixed-priority scheduling with multiple mandatory parts, many imprecise real-time applications can be supported.
    Conference Paper · Jun 2013
  • O. Yoshizumi · H. Matsutani · N. Yamasaki
    ABSTRACT: Since distributed real-time systems impose strict time constraints on communication, packets must be routed so as to meet the real-time constraints. To reduce communication latency and satisfy the real-time constraints, a minimal path is selected for each source-destination pair, and the selected path is then checked for schedulability against the constraints. Since such routing schemes may form cyclic dependencies across multiple packets, routing schemes must be designed to guarantee deadlock-freedom. In this paper, we propose routing schemes for a real-time communication link that guarantee both real-time capability and deadlock-freedom for distributed real-time systems. Simulation results show that the proposed routing schemes improve the schedulability of communications by up to 30%. Copyright © (2013) by the International Society for Computers and Their Applications.
    Article · Jan 2013
  • Y. Kumura · K. Suito · H. Matsutani · N. Yamasaki
    ABSTRACT: Distributed real-time systems that consist of multiple tasks with time constraints implemented on multiple processors have been used in various embedded systems, such as humanoid robots. These processors are interconnected with a real-time network, such as Responsive Link, in which the data rate of each channel can be changed individually and dynamically. In this paper, we propose a low-power communication technique for distributed real-time systems. Power consumption is reduced by slowing down the data rate of the communication link while still satisfying the time constraints by exploiting the slack time, which is the difference between the arrival time of data and the deadline. We first measured the power consumption of Responsive Link at various data rates and then implemented a real-time network simulator using the measured power values. We also implemented a run-time communication mode change mechanism on Responsive Link and discuss its feasibility. Simulation results show that the proposed low-power packet transfer technique reduces power consumption by up to 53.46% with negligible degradation of schedulability.
    Conference Paper · Jan 2013
  • K. Suito · M. Takasu · R. Ueda · K. Fujii · H. Matsutani · N. Yamasaki
    ABSTRACT: This paper presents experimental evaluations of low-power techniques on the D-RMTP (Dependable Responsive Multithreaded Processor) for distributed real-time systems. The D-RMTP has fine-grained, module-level low-power techniques including DVFS (Dynamic Voltage and Frequency Scaling) and clock gating. In real-time systems, the gain and overheads (voltage and clock transition latencies) must be known beforehand in order to meet time constraints. Thus, detailed analysis using actual equipment is required, since these parameters are strongly dependent on the platform. In this paper we analyze power consumption, chip temperature, and overhead when DVFS and clock gating are applied. The experimental results show that the maximum overheads of changing voltage, changing frequency, and clock gating are 75 μs, 12 μs, and 4 μs, respectively. When DVFS is applied, the power consumption and temperature of the D-RMTP are reduced by 88% and 23%, respectively. The power consumption of each hardware module is reduced by approximately 41–98%. The experimental results show a favorable trade-off between the power and temperature reduction and the transition overhead. Copyright © (2013) by the International Society for Computers and Their Applications.
    Article · Jan 2013
  • Kazutoshi Suito · Kei Fujii · Hiroki Matsutani · Nobuyuki Yamasaki
    ABSTRACT: In this paper, we describe the design and implementation of the Dependable Responsive Multithreaded Processor (D-RMTP) SoC (System-on-a-Chip) and SiP (System-in-a-Package). The D-RMTP SoC provides almost all functions required for humanoid robots, including a real-time processing unit, a real-time inter-node communication link with error correction, and various I/O peripherals. The D-RMTP SoC is implemented in a 10 mm × 10 mm chip with TSMC 130 nm process technology. The D-RMTP SiP, implemented on a 30 mm × 30 mm board, integrates the D-RMTP SoC, DDR-SDRAM, flash memory, a power supply circuit, and temperature and voltage sensors for reliable DVFS.
    Article · Nov 2012 · IEEE Micro
  • Hiroyuki Chishiro · Nobuyuki Yamasaki
    ABSTRACT: Multicore systems are now used in real-time applications such as robots. In robots, imprecise tasks such as image processing tasks are required to detect and avoid objects. However, existing real-time operating systems have evaluated multiprocessor real-time scheduling algorithms only in Liu and Layland's model and not in the imprecise computation model. This paper performs experimental evaluations of global and partitioned semi-fixed-priority scheduling algorithms in the extended imprecise computation model on multicore systems. Experimental results show that semi-fixed-priority scheduling has overhead comparable to that of fixed-priority scheduling. In addition, global semi-fixed-priority scheduling has lower overhead than partitioned semi-fixed-priority scheduling.
    Article · Jan 2012
  • K. Matsumoto · H. Umeo · N. Yamasaki
    ABSTRACT: Real-time execution of applications is one of the key requirements for Cyber-Physical Systems (CPS), which integrate computational and physical elements for our social infrastructure, such as robotics, transportation, and consumer appliances. In such real-time systems, a task must be executed so as not to violate given time constraints. Moreover, it is desirable that the execution time of the task be precisely predictable. When Out-of-Order (OoO) execution is adopted in real-time systems to enhance performance, execution time becomes much more difficult to predict because of the nature of OoO execution. To deal with this problem, various schemes have been proposed, such as the IPC control mechanism of the Responsive Multithreaded (RMT) Processor. RMT Processor is a real-time microprocessor that adopts a simultaneous multithreading (SMT) architecture with OoO execution. Its IPC control mechanism tries to adjust the number of instruction commits to meet a given target IPC. The IPC control scheme can be implemented not only on RMT Processor but also on various other processors, and it can improve the predictability of execution time. However, if an error between the target and actual IPCs is observed in one control window, the mechanism cannot cancel that error in the next control window. Since such uncorrected errors accumulate over successive control windows, the predictability of the execution time gradually degrades. To overcome this problem, in this paper we propose a thread speed control scheme for real-time microprocessors. This scheme is based on the IPC control mechanism of RMT Processor. Our proposed thread speed control scheme calculates the error between the reference and actual IPCs, then dynamically updates the reference IPC of the next control window in order to cancel the past errors. Our proposed scheme is designed and implemented on RMT Processor. The simulation results show that the error is reduced to 2.60 × 10⁻⁵ % when four threads are executed simultaneously.
    Conference Paper · Oct 2011
  • Kei Fujii · Hiroyuki Chishiro · Hiroki Matsutani · Nobuyuki Yamasaki
    ABSTRACT: Cyber-Physical Systems are composed of many embedded systems that monitor and control physical processes, tightly integrating computation with those processes. Such embedded systems require not only real-time capabilities but also high throughput and low power consumption. High throughput is mainly achieved by parallel architectures such as Simultaneous Multithreading (SMT) and Chip Multiprocessor (CMP), and low power consumption is mainly achieved by Real-Time Dynamic Voltage and Frequency Scaling (RT-DVFS) under the real-time constraint. In this paper, we present an RT-DVFS algorithm called Hetero Efficiency to Logical Processor (HeLP), which can reduce power consumption easily and effectively in prioritized SMT processors. We also present Hetero Efficiency to Logical Processor with Temporal Migration (HeLP-TM), which applies the temporal migration technique to HeLP. Simulation results show that HeLP can reduce power consumption effectively and that HeLP-TM is more effective than HeLP.
    Article · Aug 2011
  • Shinpei Kato · Nobuyuki Yamasaki
    ABSTRACT: This paper presents an algorithm, called Earliest Deadline Critical Laxity (EDCL), for scheduling sporadic task systems on multiprocessors. EDCL is a derivative of the Earliest Deadline Zero Laxity (EDZL) algorithm. Each job is assigned a priority based on the well-known Global Earliest Deadline First (G-EDF) algorithm, as long as its laxity – the amount of time from the earliest possible completion time of the job to its deadline – is above a certain value. However, once the laxity falls below this value, the priority is promoted to the highest level in order to meet the deadline. Under EDCL, priority promotions are aligned with arrivals and completions of jobs to avoid additional scheduler invocations, whereas EDZL can promote priorities at arbitrary times. Compared with EDZL, EDCL reduces runtime overhead and implementation cost but still strictly dominates G-EDF in schedulability. Schedulability tests for EDCL are derived through theoretical analysis, and sustainability properties are also considered. Our simulation results demonstrate that EDCL is competitive with EDZL in schedulability while requiring fewer scheduler invocations, and that it also outperforms traditional EDF-based algorithms.
    Article · May 2011 · Journal of Systems Architecture
  • Hiroyuki Chishiro · Nobuyuki Yamasaki
    ABSTRACT: Responsive Multithreaded Processor (RMTP) has a Simultaneous Multithreading (SMT) architecture with priority for distributed real-time processing, called the prioritized SMT architecture. In RMTP, the execution efficiencies of tasks running in threads other than the highest-priority thread fluctuate depending on the combination of tasks executing simultaneously. Therefore, it is difficult to guarantee the schedulability of tasks. Many real-time scheduling algorithms that have only a real-time part are not well suited to the prioritized SMT architecture, because they cannot make use of the remaining time of threads other than the highest-priority thread. In contrast, semi-fixed-priority scheduling has an optional part, which is a non-real-time part that improves the quality of the result, so semi-fixed-priority scheduling is well suited to the prioritized SMT architecture. This paper evaluates the performance of semi-fixed-priority scheduling on prioritized SMT processors. Experimental evaluations show that semi-fixed-priority scheduling is well suited to prioritized SMT processors.
    Article · Apr 2011
  • Masakazu Taniguchi · Hiroki Matsutani · Nobuyuki Yamasaki
    ABSTRACT: The many-core processor is an attractive solution for Cyber-Physical Systems (CPS) that demand high computational power, since it can enclose many computational elements in a single physical chip. The Network-on-Chip (NoC) that connects the processing cores is key to the cost, performance, and power of such systems. Although NoCs typically employ simple deterministic routing algorithms in order to reduce the complexity of on-chip routers, such deterministic algorithms do not avoid traffic congestion, and thus network throughput is degraded when the traffic pattern has localities. On the other hand, complex algorithms require a large hardware cost, which is a problem for CPS whose hardware cost is limited. In this paper, we propose an adaptive on-chip router with a Predictor for Regional Congestion (PRC) in order to improve network throughput with modest hardware overhead. The proposed PRC routers exchange their past and predicted future congestion information with each other. Each router then synthesizes its regional congestion information from the local and received information in order to route packets without congestion. The simulation results show that the proposed routers improve the average throughput by 17.2% compared to a congestion-aware router that employs local information only. The RTL design of the proposed router shows that the area overhead is only 2.6% and that only three additional wires are required per router port.
    Article · Jan 2011
  • Hiroyuki Chishiro · Nobuyuki Yamasaki
    ABSTRACT: This paper presents RT-Est, a real-time operating system for semi-fixed-priority scheduling algorithms. RT-Est implements the following mechanisms: (i) the hybrid O(1) scheduler, an extension of the O(1) scheduler in Linux kernel 2.6, which achieves semi-fixed-priority scheduling with low overhead; (ii) the high-resolution timer, which terminates optional parts at their optional deadlines; and (iii) SIM, an architecture for simulating real-time scheduling. Experimental evaluations show that semi-fixed-priority scheduling is well suited to autonomous mobile robots.
    Conference Paper · Jan 2011
  • Hiroyuki Chishiro · Nobuyuki Yamasaki
    ABSTRACT: Current real-time systems such as robots have multiprocessors, and the number of processors tends to increase. To realize these real-time systems, global real-time scheduling is required. Many real-time scheduling algorithms are based on Liu and Layland's model. Compared to Liu and Layland's model, the imprecise computation model is one of the techniques for overcoming the gap between theory and practice. Semi-fixed-priority scheduling is part-level fixed-priority scheduling in the extended imprecise computation model, which has a second mandatory part to terminate an optional part. Unfortunately, current semi-fixed-priority scheduling is adapted only to uniprocessors. This paper presents a global semi-fixed-priority scheduling algorithm, called Global Rate Monotonic with Wind-up Part (G-RMWP). G-RMWP calculates the optional deadline, the termination time of each optional part, by Response Time Analysis for Global Rate Monotonic (G-RM). The schedulability analysis shows that a task set is schedulable by G-RMWP if it is schedulable by G-RM. Simulation results show that G-RMWP has higher schedulability than G-RM.
    Article · Jan 2011
  • Hiroyuki Chishiro · Akira Takeda · Kenji Funaoka · Nobuyuki Yamasaki
    ABSTRACT: This paper proposes semi-fixed-priority scheduling to achieve both low jitter and high schedulability. Semi-fixed-priority scheduling targets the extended imprecise computation model, which has a wind-up part as a second mandatory part, and schedules each part of an extended imprecise task with a fixed priority. This paper also proposes a novel semi-fixed-priority scheduling algorithm based on Rate Monotonic (RM), called Rate Monotonic with Wind-up Part (RMWP). RMWP limits the executable ranges of wind-up parts to minimize jitter. The schedulability analysis proves that a task set is feasible by RMWP if it is feasible by RM. Simulation results show that RMWP has both lower jitter and higher schedulability than RM.
    Conference Paper · Jan 2010