Article

DyUnS: Dynamic and uncertainty-aware task scheduling for multiprocessor embedded systems

... One of the earliest related research works on task scheduling was proposed by Liu and Layland [7], whose periodic task model serves as the basis for many real-time task models and has been extended to multi-processor environments for task scheduling feasibility analysis and algorithm design. Subsequently, researchers have conducted a series of studies on multiprocessor task scheduling [8] [9] [10] [11]. Multiprocessor scheduling is typically divided into partition scheduling and global scheduling [12]. ...
... Each partitioned task set is scheduled with EDF on its assigned core: once tasks have been allocated according to the reinforcement-learning decisions, whenever one task completes and the next starts, EDF orders the tasks on that core. The performance requirement of each core is then determined by running the Cycle-conserving algorithm separately on that core, and its frequency is adjusted according to Eq. (11). ...
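A minimal sketch of the per-core policy described in this snippet (partitioned EDF ordering plus cycle-conserving frequency selection). The task fields and the linear utilization-to-frequency mapping are illustrative assumptions; Eq. (11) of the cited paper is not reproduced here.

```python
def edf_order(ready_tasks):
    # EDF: run the task with the earliest absolute deadline first
    return sorted(ready_tasks, key=lambda t: t["deadline"])

def cycle_conserving_freq(tasks, f_max):
    # Cycle-conserving DVFS (in the spirit of Pillai & Shin): compute the
    # core's utilization using actual cycles for completed jobs and
    # worst-case cycles for pending ones, then scale frequency to match.
    util = sum(t["cycles"] / t["period"] for t in tasks)
    return min(f_max, util * f_max)

# Illustrative task set for one core (field names are assumptions)
core_tasks = [
    {"name": "T1", "deadline": 10, "cycles": 2, "period": 10},
    {"name": "T2", "deadline": 5,  "cycles": 1, "period": 5},
]
order = [t["name"] for t in edf_order(core_tasks)]   # T2 runs before T1
freq = cycle_conserving_freq(core_tasks, f_max=1.0)  # utilization is 0.4
```

Here a utilization of 0.4 lets the core run at 40% of its maximum frequency while both deadlines remain met, which is the energy-saving lever the snippet's frequency-adjustment step exploits.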
Article
Full-text available
In recent years, with the increasing demand for computing power from various intelligent applications on mobile devices, heterogeneous multi-core architectures have received more attention. The big.LITTLE architecture, a new type of energy-saving architecture, consists of high-performance cores and energy-efficient cores, which can better meet the real-time task requirements of various mobile devices. One of the challenges faced by scheduling algorithms on such platforms is how to fully utilize the advantages of big.LITTLE architectures to achieve a balance between system energy consumption and performance. To address this issue, we designed an initial task allocation strategy based on the characteristics of the big.LITTLE architecture, introduced reinforcement learning algorithms for decision-making, and assigned different types of tasks to their appropriate processors. To balance task load and reduce total scheduling time, we proposed a dynamic migration algorithm, which adjusts the load by migrating tasks when core utilization becomes imbalanced. Experimental results demonstrate that the proposed algorithm effectively balances energy consumption and performance, improves resource utilization, and reduces both scheduling time and energy consumption.
... Embedded systems based on computer technology can adapt to specific application requirements by flexibly configuring software and hardware [1]. The progress of microprocessor technology, especially the emergence of high-performance, low-power embedded microprocessors such as ARM and MIPS, has promoted the rapid development of embedded systems [2]. ...
... Such systems are widely used in the Internet of Things, industrial control, and medical equipment, supporting both real-time and non-real-time task execution. The data input and network structure of the embedded system are shown in Equation (1), where x_t denotes the input feature vector, x_t,i its i-th feature value, and n the input dimension. ...
Article
Full-text available
With the popularization of smart and networked devices, security issues for edge devices have become increasingly prominent. Traditional security monitoring systems often rely on a centralized data-processing mode, which struggles to meet today's real-time analysis requirements for massive data. To solve this problem, this paper designs a security monitoring system for network edge devices that fuses an embedded system with a bidirectional long short-term memory (Bi-LSTM) network. By deploying the Bi-LSTM model on an embedded processor, the system can detect abnormal behavior of edge devices in real time, thereby improving the response speed and accuracy of security monitoring. Experiments are conducted on real network traffic datasets, collecting security data from different types of edge devices, including smart routers and IoT sensors, and processing more than 100 GB of network traffic in total. The results show that the Bi-LSTM model reaches a detection accuracy of 96.8% on network attack behavior, about 4.2% and 5.5% higher than traditional random forest and support vector machine models, respectively. In addition, real-time analysis shows that the average processing latency of the embedded system is under 200 ms, which meets the low-latency requirements of edge computing environments.
Article
Deep learning methods have shown outstanding results in various applications. Still, they suffer from time-consuming training and inference due to multiple cascaded layers, along with the need for a complete retraining process when encountering new data. Broad Learning Systems (BLS) are novel lightweight learning methods that have been successfully applied to different problems and are regarded as a paradigm shift. In this paper, a correlated fuzzy version of BLS (CorFBLS) is proposed that considers the local correlations among input variables in defining each fuzzy rule in each neuro-fuzzy subsystem. CorFBLS transfers the input variables to a new space with orthogonal features to define each fuzzy rule and fuzzy set in this extracted feature space. Random subsets of the training instances are used to construct each neuro-fuzzy subsystem. To decrease the number of calculations along with the number of required fuzzy rules, a random subset of input variables is selected to build each neuro-fuzzy subsystem, making it suitable for high-dimensional problems. Finally, to obtain a self-organizing system able to accommodate new training instances, a new growing algorithm is proposed to add new fuzzy subsystems. The performance of the proposed method is studied based on different metrics, including precision, architecture size, memory usage, and processing time, and compared with relevant methods on high-dimensional and large-scale problems along with some small-scale real-world problems. Based on these comparisons, the proposed method outperforms previous studies with a more parsimonious structure. For high-dimensional datasets, it achieves total accuracy of 95.32% on MNIST, 91.00% on FMNIST, 70.00% on CIFAR-10, 91.06% on NORB, 85.4% on NoduleMedMNIST3D, 81.27% on AdrenalMedMNIST3D, and 91.6% on VesselMedMNIST3D.
Article
Full-text available
In this paper, an interpretable classifier for Parkinson’s Disease (PD) diagnosis based on analyzing the gait cycle is presented. The proposed method utilizes clinical features extracted from the vertical Ground Reaction Force (vGRF) measured by wearable sensors, so experts can verify the decisions made by the method. Type-2 fuzzy logic is applied to increase robustness against noisy sensor data. First, the initial fuzzy rules are extracted using a K-Nearest-Neighbor-based clustering approach. Next, a novel quasi-Levenberg-Marquardt (qLM) learning approach is proposed and applied to fine-tune the initial rules by minimizing the cross-entropy loss function with a trust-region optimization method. Finally, complementary online learning is proposed to improve the rules as new labeled samples are encountered. The performance of the method is evaluated on classifying patients and healthy subjects under different conditions, including the presence of noise and the observation of new samples. Moreover, the performance of the model is compared to some previous supervised and unsupervised machine learning approaches. The final Accuracy, Precision, Recall, and F1 Score of the proposed method are 97.61%, 97.58%, 99.02%, and 98.30%, respectively.
Article
Full-text available
Increasing manufacturing process variations due to aggressive technology scaling, in addition to heterogeneity in design components, are expected to cause serious challenges for future embedded system design steps, including task scheduling. Process variation effects, along with the increased complexity of embedded applications, result in design uncertainties, which in turn reduce the accuracy and efficiency of traditional design approaches that use deterministic values for design component parameters. In this paper, a multi-objective task scheduling framework, UMOTS, is proposed for embedded systems considering uncertainties in both hardware and software component parameters. The tasks, modeled as a task graph, are scheduled on a specific hardware platform consisting of processors and communication parts. Uncertainty is considered in both the software (task parameters) and the hardware (processor and communication parameters) of the embedded system. UMOTS takes advantage of a Monte-Carlo-based approach within a multi-objective genetic algorithm to handle the uncertainties in model parameters. The proposed approach finds the Pareto frontier, robust against uncertainties, in the objective space formed by performance, energy consumption, and reliability. The efficiency of UMOTS is investigated experimentally using real-application task graphs. In terms of Scheduling Length Ratio (SLR) and speedup, UMOTS provides 27.8% and 28.6% performance improvements over HSHD, a state-of-the-art task scheduling algorithm. Additionally, UMOTS, being based on a multi-objective genetic optimization algorithm, finds robust Pareto frontiers under 1%, 5%, and 10% uncertainty in the design indicators with respect to the design limitations.
Article
Full-text available
Task-schedule optimization makes it possible to attain high performance in both homogeneous and heterogeneous computing environments. The primary objective of task scheduling is to minimize the execution time of an application graph; however, this is an NP-complete (non-deterministic polynomial) undertaking. Task scheduling is further complicated by the heterogeneity of modern computing systems in terms of both computation and communication costs. An application can be considered a task graph represented as a Directed Acyclic Graph (DAG). In a heterogeneous system, each task has a different execution time on different processors. The primary concern in this problem domain is to reduce the schedule length with minimal complexity of the scheduling procedure. This work presents a pair of hybrid heuristics, based on list and guided random search, to address this concern. The first, the Hybrid Heuristic and Genetic-based Task Scheduling Algorithm for Heterogeneous Computing (HHG), uses a Genetic Algorithm and a list-based approach. This work also presents another heuristic, the Hybrid Task Duplication and Genetic-based Task Scheduling Algorithm for Heterogeneous Computing (HTDG). The present work improves the quality of the initial GA population by injecting two diverse guided chromosomes. The proposal is compared with four state-of-the-art methods, including two evolutionary algorithms, i.e., the New Genetic Algorithm (NGA) and the Enhanced Genetic Algorithm for Task Scheduling (EGA-TS), and two list-based algorithms, i.e., Heterogeneous Earliest Finish Time (HEFT) and Predict Earliest Finish Time (PEFT). Results show that the proposed solution performs better than its counterparts based on occurrences of the best result, average makespan, average schedule length ratio, average speedup, and average running time.
HTDG yields 89% better results and HHG demonstrates 56% better results in comparison to the four state-of-the-art task scheduling algorithms.
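As a concrete illustration of the list-based half of such hybrids, here is a minimal upward-rank computation in the spirit of HEFT; the toy DAG, execution costs, and communication costs below are assumptions for illustration, not data from the paper.

```python
def upward_rank(task, succ, comp, comm, memo):
    # Upward rank (as in HEFT): the task's average execution cost across
    # processors, plus the most expensive (communication + rank) path
    # over its successors in the DAG.
    if task not in memo:
        avg = sum(comp[task]) / len(comp[task])
        memo[task] = avg + max(
            (comm.get((task, s), 0) + upward_rank(s, succ, comp, comm, memo)
             for s in succ.get(task, [])),
            default=0,
        )
    return memo[task]

def list_order(tasks, succ, comp, comm):
    # Scheduling priority: decreasing upward rank, which always yields
    # a valid topological order of the DAG.
    memo = {}
    return sorted(tasks, key=lambda t: upward_rank(t, succ, comp, comm, memo),
                  reverse=True)

# Toy DAG: A -> B, A -> C; comp[t] lists execution times per processor
succ = {"A": ["B", "C"]}
comp = {"A": [2, 4], "B": [3, 3], "C": [6, 2]}
comm = {("A", "B"): 1, ("A", "C"): 1}
order = list_order(["A", "B", "C"], succ, comp, comm)  # A, then C, then B
```

In the hybrid heuristics above, such a rank-based ordering would seed or guide the GA population rather than serve as the final schedule.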
Article
Full-text available
Embedded systems used in critical domains, such as aeronautics, have undergone continuous evolution in recent years. In this evolution, many of the functionalities offered by these systems have been adapted through the introduction of network services that achieve high levels of interconnectivity. The high availability of access to communication networks has enabled the development of new applications that introduce control functions with higher levels of intelligence and adaptation. In these applications, it is necessary to manage the different components of an application according to their criticality levels. The concept of "Industry 4.0" has recently emerged to describe high levels of automation and flexibility in production. Digitization and the extensive use of information technologies have become key to industrial systems. Due to their growing importance and social impact, industrial systems have come to be considered critical systems. This evolution of industrial systems drives new technical requirements for software architectures that enable the consolidation of multiple applications, including those of different criticality levels, on common hardware platforms. These enabling technologies, together with the use of reference models and standardization, facilitate an effective transition to this approach. This article analyses the structure of Industry 4.0 systems, providing a comprehensive review of existing techniques. The levels and mechanisms of interaction between components are analyzed, considering the impact that handling multiple criticality levels has on the architecture itself and on the functionalities of the supporting middleware. Finally, the article outlines some of the challenges, from a technological and research point of view, that the authors identify as crucial for the successful development of these technologies.
Article
Full-text available
The emergent Multi-Processor System-on-Chip (MPSoC) technology, which combines heterogeneous computing with the high performance of Field Programmable Gate Arrays (FPGAs), is a very interesting platform for a huge number of applications, ranging from medical imaging and augmented reality to high-performance computing in space. In this paper, we focus on the Xilinx Zynq UltraScale+ EG heterogeneous MPSoC, which is composed of four different processing elements (PEs): a dual-core ARM Cortex-R5, a quad-core ARM Cortex-A53, a graphics processing unit (GPU), and a high-end FPGA. Making proper use of the heterogeneity and the different levels of parallelism of this platform is a challenging task. This paper evaluates the platform and each of its PEs on fundamental operations in terms of computational performance. To this end, we evaluate image-based applications and a matrix multiplication kernel. For the former, the image-based applications leverage the heterogeneity of the MPSoC and strategically distribute their tasks among both kinds of CPU cores and the FPGA. For the latter, we analyze each PE separately using different matrix multiplication benchmarks in order to assess and compare their performance in terms of MFLOPS. Such operations are carried out, for example, in a large number of space-related applications, where MPSoCs are currently gaining momentum. The results highlight that different PEs can collaborate efficiently to accelerate the computationally demanding tasks of an application. Another important finding is that, by leveraging the parallel OpenBLAS library, we achieve up to 12 GFLOPS with the four Cortex-A53 cores of the platform, which is considerable performance for this kind of device.
Article
Full-text available
Energy consumption and elevated chip temperature have become serious challenges in designing embedded systems, mainly due to transistor scaling and the integration of more components into a single chip. Chip temperature has significant effects on leakage current, energy consumption, and chip reliability; hence, a mechanism that reduces both the energy consumption and the temperature of the chip is of utmost significance. Effective task scheduling in a real-time multiprocessor system-on-chip has a direct impact on the energy consumption and temperature of the chip. Several task scheduling and task assignment techniques have been proposed to achieve this goal. Most of those works consider only processor utilization when distributing tasks among processors to reduce energy consumption, and attempt to reduce temperature only at the task-scheduling step, separately on each processor. This paper proposes a fuzzy-based technique to distribute real-time tasks among processors in order to reduce both temperature and energy consumption simultaneously. Simulation results show that our proposed technique is more efficient at reducing energy consumption than a well-known state-of-the-art method (by up to 9%), while offering more balanced and moderate processor temperatures and hindering hot spots.
Article
Full-text available
In this paper, a hybrid scheduling and mapping approach, called "HYSTERY," is proposed to jointly optimize the performance, lifetime reliability, energy consumption, and temperature of heterogeneous multiprocessor systems-on-chip (MPSoCs). Due to the growing dynamic behavior of modern embedded applications, together with the complexity of jointly optimizing the main design challenges of MPSoCs, we propose a hybrid scheduling approach. At design time, the approach optimizes the mentioned design challenges by solving an optimization problem and considering load balancing in task assignment. At runtime, the derived static solution is applied to the system; the design metrics are monitored periodically and, if required, the static scheduling decisions are adapted. Several experiments with synthetic and real-life applications demonstrate that the proposed approach can effectively optimize the design challenges and manage the dynamism of the execution environment. In comparison with an uncontrolled runtime scheduling approach, HYSTERY shows a 20% average improvement in temperature, which in turn enhances lifetime reliability and power consumption. Furthermore, HYSTERY improves the main design parameters of MPSoCs by about 21% on average compared to existing scheduling approaches.
Article
Full-text available
Type-reduction of type-2 fuzzy sets is considered a defuzzification bottleneck because of the computational complexity involved in the type-reduction process. In this research, we prove that the closed-form Nie-Tan operator, which outputs the average of the upper and lower bounds of the footprint of uncertainty, is actually an accurate method for defuzzifying interval type-2 fuzzy sets.
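The closed-form operator discussed here can be written directly: the defuzzified value is the centroid of the pointwise average of the lower and upper membership functions over a sampled domain. The sample points and membership values below are illustrative.

```python
def nie_tan(xs, lower_mu, upper_mu):
    # Nie-Tan closed-form type reduction: average the lower and upper
    # membership functions of the footprint of uncertainty pointwise,
    # then defuzzify as an ordinary centroid -- no iterative
    # Karnik-Mendel procedure is needed.
    avg = [(lo + up) / 2 for lo, up in zip(lower_mu, upper_mu)]
    return sum(x * m for x, m in zip(xs, avg)) / sum(avg)

# Symmetric interval type-2 set sampled at three points; by symmetry
# the crisp output is the center of the domain.
crisp = nie_tan(xs=[0.0, 1.0, 2.0],
                lower_mu=[0.2, 0.4, 0.2],
                upper_mu=[0.4, 0.8, 0.4])
```

Avoiding the iterative Karnik-Mendel switch-point search is exactly what makes this operator attractive on resource-constrained embedded targets.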
Article
Full-text available
In this paper, we describe a graphic editing tool called QUILT (Quick Utility for Integrated circuit Layout and Temperature modeling). QUILT permits users to rapidly build floorplans of integrated circuits, providing both a visual aid and an input to the HotSpot simulator. The tool provides numerous features for estimating circuit performance, such as interconnect delay, and for generating graphical images for publications. As a graphical and easy-to-use tool, QUILT is well suited for both research and coursework purposes.
Article
Full-text available
This paper presents HotSpot, a modeling methodology for developing compact thermal models based on the popular stacked-layer packaging scheme in modern very-large-scale integration systems. In addition to modeling silicon and packaging layers, HotSpot includes a high-level on-chip interconnect self-heating power and thermal model so that the thermal impact on interconnects can also be considered during early design stages. The HotSpot compact thermal modeling approach is especially well suited for pre-register-transfer-level (RTL) and pre-synthesis thermal analysis and is able to provide detailed static and transient temperature information across the die and the package, while remaining computationally efficient.
Conference Paper
Full-text available
For autonomous critical real-time embedded systems (e.g., satellites), guaranteeing a very high level of reliability is as important as keeping the power consumption as low as possible. We propose an off-line scheduling heuristic which, from a given software application graph and a given multiprocessor architecture (homogeneous and fully connected), produces a static multiprocessor schedule that optimizes three criteria: its length (crucial for real-time systems), its reliability (crucial for dependable systems), and its power consumption (crucial for autonomous systems). Our tricriteria scheduling heuristic, called TSH, uses active replication of the operations and data dependencies to increase reliability, and uses dynamic voltage and frequency scaling to lower power consumption. We demonstrate the soundness of TSH. We also provide extensive simulation results to show how TSH behaves in practice: first, we run TSH on a single instance to provide the whole Pareto front in 3D; second, we compare TSH against the ECS (Energy-Conscious Scheduling) heuristic from the literature; and third, we compare TSH against an optimal Mixed Integer Linear Program.
Conference Paper
Full-text available
This paper introduces McPAT, an integrated power, area, and timing modeling framework that supports comprehensive design space exploration for multicore and manycore processor configurations ranging from 90nm to 22nm and beyond. At the microarchitectural level, McPAT includes models for the fundamental components of a chip multiprocessor, including in-order and out-of-order processor cores, networks-on-chip, shared caches, integrated memory controllers, and multiple-domain clocking. At the circuit and technology levels, McPAT supports critical-path timing modeling, area modeling, and dynamic, short-circuit, and leakage power modeling for each of the device types forecast in the ITRS roadmap, including bulk CMOS, SOI, and double-gate transistors. McPAT has a flexible XML interface to facilitate its use with many performance simulators. Combined with a performance simulator, McPAT enables architects to consistently quantify the cost of new ideas and assess tradeoffs of different architectures using new metrics like the energy-delay-area² product (EDA²P) and the energy-delay-area product (EDAP). This paper explores the interconnect options of future manycore processors by varying the degree of clustering over generations of process technologies. Clustering will bring interesting tradeoffs between area and performance because the interconnects needed to group cores into clusters incur area overhead, but many applications can make good use of them due to synergies of cache sharing. Combining power, area, and timing results of McPAT with performance simulation of PARSEC benchmarks at the 22nm technology node for both common in-order and out-of-order manycore designs shows that, when die cost is not taken into account, clustering 8 cores together gives the best energy-delay product, whereas when cost is taken into account, configuring clusters with 4 cores gives the best EDA²P and EDAP.
Conference Paper
Full-text available
This paper presents and characterizes the Princeton Application Repository for Shared-Memory Computers (PARSEC), a benchmark suite for studies of Chip-Multiprocessors (CMPs). Previously available benchmarks for multiprocessors have focused on high-performance computing applications and used a limited number of synchronization methods. PARSEC includes emerging applications in recognition, mining and synthesis (RMS) as well as systems applications which mimic large-scale multi-threaded commercial programs. Our characterization shows that the benchmark suite is diverse in working set, locality, data sharing, synchronization, and off-chip traffic. The benchmark suite has been made available to the public.
Article
Full-text available
The gem5 simulation infrastructure is the merger of the best aspects of the M5 [4] and GEMS [9] simulators. M5 provides a highly configurable simulation framework, multiple ISAs, and diverse CPU models. GEMS complements these features with a detailed and flexible memory system, including support for multiple cache coherence protocols and interconnect models. Currently, gem5 supports most commercial ISAs (ARM, ALPHA, MIPS, Power, SPARC, and x86), including booting Linux on three of them (ARM, ALPHA, and x86). The project is the result of the combined efforts of many academic and industrial institutions, including AMD, ARM, HP, MIPS, Princeton, MIT, and the Universities of Michigan, Texas, and Wisconsin. Over the past ten years, M5 and GEMS have been used in hundreds of publications and have been downloaded tens of thousands of times. The high level of collaboration on the gem5 project, combined with the previous success of the component parts and a liberal BSD-like license, make gem5 a valuable full-system simulation tool.
Conference Paper
Full-text available
This paper examines a set of commercially representative embedded programs and compares them to an existing benchmark suite, SPEC2000. A new version of SimpleScalar that has been adapted to the ARM instruction set is used to characterize the performance of the benchmarks using configurations similar to current and next generation embedded processors. Several characteristics distinguish the representative embedded programs from the existing SPEC benchmarks including instruction distribution, memory behavior, and available parallelism. The embedded benchmarks, called MiBench, are freely available to all researchers.
Article
Domain-specific systems-on-chip (DSSoCs) combine general-purpose processors and specialized hardware accelerators to improve performance and energy efficiency for a specific domain. The optimal allocation of tasks to processing elements (PEs) with minimal runtime overheads is crucial to achieving this potential. However, this problem remains challenging, as prior approaches suffer from non-optimal scheduling decisions or significant runtime overheads. Moreover, existing techniques focus on a single optimization objective, such as maximizing performance. This work proposes DTRL, a decision-tree-based multi-objective reinforcement learning technique for runtime task scheduling in DSSoCs. DTRL trains a single global differentiable decision tree (DDT) policy that covers the entire objective space quantified by a preference vector. Our extensive experimental evaluations using our novel reinforcement learning environment demonstrate that DTRL captures the trade-off between execution time and power consumption, thereby generating a Pareto set of solutions using a single policy. Furthermore, comparison with state-of-the-art heuristic-, optimization-, and machine-learning-based schedulers shows that DTRL achieves up to 9× higher performance and up to 3.08× reduction in energy consumption. The trained DDT policy achieves 120 ns inference latency on a Xilinx Zynq ZCU102 FPGA at 1.2 GHz, resulting in negligible runtime overheads. Evaluation on the same hardware shows that DTRL achieves up to 16% higher performance than a state-of-the-art heuristic scheduler.
Article
In this paper, an evolutionary-neuro-fuzzy-based task scheduling approach (ENF-S) to jointly optimize the main critical parameters of heterogeneous multi-core systems is proposed. This approach has two phases: first, a fuzzy neural network (FNN) is trained using a non-dominated sorting genetic algorithm (NSGA-II), considering the critical parameters of heterogeneous multi-core systems on a training data set consisting of different application graphs. These critical parameters are execution time, temperature, failure rate, and power consumption. The output of the trained FNN determines the criticality degree of the various processing cores based on the system's current state. Next, the trained FNN is employed as an online scheduler to jointly optimize the critical objectives of multi-core systems at runtime. Due to the uncertainty in sensor measurements and the difference between computational models and reality, applying a fuzzy neural network is advantageous. The efficiency of ENF-S is investigated in various respects, including its joint optimization capability, the appropriateness of the generated fuzzy rules, comparison with related research, and overhead analysis, through several experiments on real-world and synthetic application graphs. Based on these experiments, ENF-S outperforms the related studies in optimizing all design criteria. Its improvements over related methods are estimated at 19.21% in execution time, 13.07% in temperature, 25.09% in failure rate, and 13.16% in power consumption, on average.
Article
Vehicular cloud computing is a promising solution for utilizing underused vehicle resources such as processing power, storage space, and Internet connectivity. These resources can be shared among vehicles or rented out by their owners for multiple purposes, such as meeting the hardware needs of vehicular network services and applications, making it possible to meet the growing need for resources in the vehicular network. Although this plan seems feasible, its implementation has some problems. Several scholars have concentrated on architectural design to solve various difficulties and provide users with trustworthy service. This paper presents a fuzzy-based method to allocate resources in vehicular cloud computing using a nature-inspired cuckoo search algorithm. The suggested algorithm is compared to several state-of-the-art algorithms. The results illustrate that the recommended method outperforms the other algorithms in terms of execution time, delay, and makespan.
Article
In addition to meeting real-time constraints, power/energy efficiency and high reliability are two vital objectives for real-time embedded systems. Recently, heterogeneous multicore systems have been considered an appropriate solution for achieving joint power/energy efficiency and high reliability. However, power/energy and reliability are two conflicting requirements due to the inherent redundancy of fault-tolerance techniques. Also, because of the heterogeneity of the system, executing tasks, especially real-time tasks, is more complicated in a heterogeneous system than in a homogeneous one. The method proposed in this paper employs a passive primary/backup technique to preserve the reliability requirement of the system at a satisfactory level and reduces power/energy consumption in heterogeneous multicore systems under real-time and peak-power constraints. The proposed method maps the primary and backup tasks in a mixed manner to benefit from executing tasks on different core types, and schedules the backup tasks after the primary tasks finish to remove the overlap between their executions. Compared to existing state-of-the-art methods, experimental results demonstrate our proposed method's power efficiency and effectiveness in terms of schedulability.
Article
In this paper, a new interval type-2 fuzzy neural network able to construct non-separable fuzzy rules with various shapes is introduced for function approximation problems. To reflect uncertainty, the shapes of the fuzzy sets are considered uncertain. Therefore, a new form of shapeable interval type-2 fuzzy sets based on a general Gaussian model, able to construct different shapes (including triangular, bell-shaped, and trapezoidal), is proposed. To consider the interactions among input variables, input vectors are transformed to new feature spaces with uncorrelated variables proper for defining each fuzzy rule. Next, the new features are fed to a fuzzification layer using the proposed interval type-2 fuzzy sets with adaptive shapes. Consequently, interval type-2 non-separable fuzzy rules with proper shapes are formed, considering the local interactions of variables and the uncertainty. For type reduction, the contributions of the upper and lower firing strengths of each fuzzy rule are selected adaptively and separately. To train the different parameters of the network, the Levenberg-Marquardt optimization method is utilized. The performance of the proposed method is investigated on clean and noisy datasets to show its ability to handle uncertainty. Moreover, the proposed paradigm is successfully applied to real-world time-series prediction, regression problems, and nonlinear system identification. According to the experimental results, our proposed model outperforms other methods with a more parsimonious structure. Based on several experiments, the test RMSE of the proposed method is 0.0243 for noisy Mackey-Glass time-series prediction, 1.92 for Santa Fe laser prediction, 0.0301 for Box-Jenkins system identification, 0.0569 for Poland electricity load forecasting, 4.22 for Google stock price tracking, and 13.22 for Sydney stock price tracking.
Article
Internet-of-Things (IoT) is an appealing service for revolutionising Smart City (SC) initiatives across the globe. IoT interconnects a plethora of digital devices known as Sensor Nodes (SNs) to the Internet. Due to their high performance and exceptional Quality-of-Service (QoS), Multiprocessor System-on-Chip (MPSoC) computing architectures are gaining increasing popularity for the computationally intensive workloads in both IoT and consumer electronics. In this survey, we explore the balance between the IoT paradigm and its applications in SC while introducing Wireless Sensor Networks (WSNs), including the structure of the SN. We consider MPSoC systems in relation to characteristics such as architecture and the communication technology involved. This provides an insight into the benefits of coupling MPSoCs with IoT. This paper also investigates prevalent software-level energy optimisation techniques and extensively reviews workload mapping and scheduling approaches from 2001 to the present for energy savings using (1) Dynamic Voltage and Frequency Scaling (DVFS) and/or Dynamic Power Management (DPM), (2) inter-processor communication reduction, and (3) coarse-grained software pipelining integrated with DVFS. This paper constructively summarises the findings of these approaches and algorithms, identifying insightful directions for future research avenues.
Article
The Industry 4.0 literature has grown exponentially in the past decade. We aim to understand how this literature has evolved and propose future research opportunities. We focus on four smart dimensions of Industry 4.0: Smart Manufacturing, Smart Products and Services, Smart Supply Chain, and Smart Working. We perform a machine learning-based systematic literature review. Our analysis included 4,973 papers published from 2011 to 2020. We conducted a chronological network analysis considering the growth of these four dimensions and the connections between them. We also analyzed keywords and the main journals publishing on these four smart dimensions. We show that the literature has mainly been devoted to the study of Smart Manufacturing, although attention to the other smart dimensions has been growing in recent years. Smart Working is the least explored dimension, with many opportunities for future research. We show that research opportunities are concentrated in the interfaces between the different smart dimensions. Our findings support the vision of Industry 4.0 as a concept transcending the Smart Manufacturing field, thus creating opportunities for synergies with other related fields. Scholars can use our findings to understand the orientation of journals and gaps that can be filled by future research.
Article
For real-time embedded systems, energy management and fault tolerance are both critical. However, these two objectives are often at odds, because the extra resources needed to tolerate faults significantly increase energy consumption. In this paper, we consider energy-aware and fault-tolerant scheduling of periodic real-time tasks. Our target platform is a heterogeneous multicore system. We reduce energy consumption both by applying DVFS to scale the primary tasks and by maximizing the opportunities to cancel the backup tasks in fault-free execution scenarios. To tolerate both transient and permanent faults, primary and backup copies of tasks are scheduled on different cores. Our framework consists of offline and online phases to manage energy and fault-tolerant scheduling of periodic tasks in tandem. The latter objective is achieved through an explicit task priority assignment phase, coupled with a dual-queue-based backup delaying algorithm. In particular, we propose a scheme called Reverse Preference-Oriented Priority Assignment (RPPA), which is experimentally shown to be very effective in reducing energy consumption. RPPA, when coupled with the dual-queue-based delaying algorithm, outperforms other schemes and approaches the energy performance of a theoretical lower bound. All the proposed schemes satisfy the stringent timing and fault-tolerance requirements of periodic real-time tasks while managing energy consumption dynamically.
Article
Heterogeneous multiprocessor system-on-chips (MPSoCs) are suitable platforms for real-time embedded applications that require powerful parallel processing capability as well as low power consumption. For such applications, soft-error reliability (SER) due to transient faults and lifetime reliability (LTR) due to permanent faults are both key concerns. There have been several efforts in the literature oriented toward related reliability problems. However, most existing techniques concentrate on improving only one of the two reliability metrics, which is not suitable for embedded systems deployed in critical applications that need a long lifetime as well as reliable execution. This article develops a novel heterogeneous earliest-finish-time (HEFT)-based algorithm to maximize SER and LTR simultaneously under the real-time constraint for dependent tasks executing on heterogeneous MPSoC systems. More specifically, a new deadline-constrained reliability-aware HEFT algorithm, namely DRHEFT, is proposed, which seeks the best SER–LTR tradeoff solutions by using fuzzy dominance to evaluate the relative fitness values of candidate solutions. Extensive experiments on real-life benchmarks as well as synthetic applications demonstrate that DRHEFT achieves better SER–LTR tradeoff solutions with higher hypervolume and lower computation cost when compared with state-of-the-art approaches.
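The upward-rank computation at the heart of HEFT-style list schedulers like the one above can be sketched in a few lines. The task graph, average execution costs, and average communication costs below are hypothetical; the recurrence itself is the standard one: rank_u(t) = avg_cost(t) + max over successors s of (avg_comm(t, s) + rank_u(s)), and tasks are considered in decreasing rank order.

```python
# Sketch of HEFT's upward-rank phase (hypothetical DAG; costs are averages
# over the heterogeneous cores, as in the standard HEFT formulation).

def upward_ranks(succ, avg_cost, avg_comm):
    ranks = {}
    def rank(t):
        if t not in ranks:
            ranks[t] = avg_cost[t] + max(
                (avg_comm[(t, s)] + rank(s) for s in succ[t]), default=0.0)
        return ranks[t]
    for t in succ:
        rank(t)
    return ranks

# Diamond-shaped DAG: T1 -> {T2, T3} -> T4, all edges with avg. comm cost 2.
succ = {"T1": ["T2", "T3"], "T2": ["T4"], "T3": ["T4"], "T4": []}
avg_cost = {"T1": 10, "T2": 5, "T3": 8, "T4": 4}
avg_comm = {("T1", "T2"): 2, ("T1", "T3"): 2, ("T2", "T4"): 2, ("T3", "T4"): 2}

ranks = upward_ranks(succ, avg_cost, avg_comm)
order = sorted(ranks, key=ranks.get, reverse=True)
print(order)  # ['T1', 'T3', 'T2', 'T4']
```

Here rank(T4) = 4, rank(T2) = 5 + 2 + 4 = 11, rank(T3) = 8 + 2 + 4 = 14, and rank(T1) = 10 + 2 + 14 = 26, so the heavier T3 branch is prioritized over T2.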
Article
Most of the scheduling algorithms proposed for real-time embedded systems with energy constraints try to reduce power consumption. However, reducing the power consumption may decrease the computation speed and impact the makespan. Therefore, for real-time embedded systems, makespan and power consumption need to be considered simultaneously. Since task scheduling is an NP-hard problem, most of the proposed scheduling algorithms are not able to find the multi-objective optimal solution. In this paper, we propose a two-phase hybrid task scheduling algorithm based on decomposition of the input task graph by applying spectral partitioning. The proposed algorithm, called G-SP, assigns each part of the task graph to a low-power processor in order to minimize power consumption. Through experiments, we compare the makespan and power consumption of G-SP against well-known algorithms in this area for a large set of randomly generated and real-world task graphs with different characteristics. The obtained results show that G-SP outperforms other algorithms in both metrics, under various conditions, involving different numbers of processors and considering several system configurations.
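The spectral partitioning step that G-SP relies on can be sketched without any linear-algebra library. The graph below and the power-iteration details are illustrative assumptions, not G-SP itself: the sign pattern of the Fiedler vector (the eigenvector of the Laplacian's second-smallest eigenvalue) splits a graph into two weakly coupled parts, and we obtain it by power iteration on c*I - L after deflating the constant eigenvector.

```python
# Sketch of spectral bisection (pure Python, illustrative only): partition a
# graph by the sign of the Fiedler vector, computed via power iteration on
# c*I - L with the all-ones eigenvector projected out each step.

def spectral_bisect(n, edges, iters=2000):
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1; deg[v] += 1
    c = 2 * max(deg) + 1                          # makes c*I - L positive definite
    x = [(-1) ** i + 0.01 * i for i in range(n)]  # arbitrary start vector
    for _ in range(iters):
        # y = (c*I - L) x, using (L x)_u = deg[u]*x[u] - sum over neighbours
        y = [(c - deg[i]) * x[i] for i in range(n)]
        for u, v in edges:
            y[u] += x[v]; y[v] += x[u]
        mean = sum(y) / n                         # deflate the all-ones vector
        y = [yi - mean for yi in y]
        norm = sum(yi * yi for yi in y) ** 0.5
        x = [yi / norm for yi in y]
    return [{i for i in range(n) if x[i] >= 0},
            {i for i in range(n) if x[i] < 0}]

# Two triangles joined by the single bridge edge (2, 3): the natural cut.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
parts = spectral_bisect(6, edges)
print(sorted(map(sorted, parts)))  # [[0, 1, 2], [3, 4, 5]]
```

In a scheduler, each part would then be mapped to one processor so that most communication stays processor-local and only the bridge edge crosses the cut.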
Article
Industrial systems usually draw huge amounts of energy to run various machines. The energy requirement has increased further due to the automation of industrial plants to make them Industry 4.0 compliant. As a result, the demand for energy is on the rise in almost all manufacturing and industrial plants. The necessity of critical and smart manufacturing processes in Industry 4.0 and their increased energy requirements force us to look for energy-efficient techniques for running the deployed computing systems, which are often embedded and integrated within larger machines and have to function under time constraints. The computational efficiency of these real-time embedded systems (RTESs) depends solely on the timely completion of tasks. Task execution with less energy consumption within critical timing constraints is a challenging issue for the designers of RTESs. Thus, task scheduling in these systems requires sophisticated energy-efficient mechanisms. However, energy efficiency and timeliness are two mutually contradictory objectives, since the former is achieved only with a significant compromise of the latter. In this paper, we propose a novel approach, based on the popular multi-objective evolutionary algorithm Non-dominated Sorting Genetic Algorithm-II, to solve this problem. Moreover, in RTESs, precise prediction of timing constraints is difficult before runtime, which introduces a form of imprecision or uncertainty into the system. Therefore, we use type-2 fuzzy sets (T2 FSs) to model the timing constraints in RTESs and introduce novel algorithms for membership function generation and calculation of fuzzy earliness. Numerical as well as real-life examples are included to demonstrate our proposed technique.
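The non-dominated filtering at the core of NSGA-II-style schedulers like the one above can be sketched directly. The sample points are hypothetical (think of each as a schedule scored on two minimized objectives, e.g. tardiness and energy): a point dominates another if it is no worse in every objective and strictly better in at least one, and the Pareto front is the set of points nobody dominates.

```python
# Sketch of Pareto non-dominated filtering (both objectives minimized), the
# building block of NSGA-II's non-dominated sorting. Points are hypothetical
# (objective-1, objective-2) scores of candidate schedules.

def dominates(a, b):
    """True if a is no worse than b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]

points = [(3, 9), (5, 5), (9, 2), (6, 6), (4, 8)]
print(sorted(pareto_front(points)))  # [(3, 9), (4, 8), (5, 5), (9, 2)]
```

Only (6, 6) is removed, since (5, 5) beats it in both objectives; the survivors form the trade-off curve a designer chooses from.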
Article
The task-to-core scheduling problem for allocating task graphs using Dynamic Voltage and Frequency Scaling (DVFS) to achieve the three objectives of performance, energy, and temperature (PET) poses algorithmic challenges, as it involves conflicting goals and trade-offs. A myriad of static algorithms have been proposed for solving the problem. These algorithms can be roughly categorized into three groups: techniques that generate optimal solutions (for smaller problem sizes) for one objective, techniques that generate Pareto-optimal solutions, and fast heuristics. All of these techniques generate multi-dimensional results that are hardly intelligible. The assessment of these results requires new comparison methods and concise evaluation measures. This paper proposes a framework, evaluation procedures, and a set of benchmarks for methodical comparison of various algorithms for solving the PET-aware task-to-core scheduling problem. The proposed performance measures assist in judiciously comparing these different algorithms and analyzing their results on a unified basis. The goal is also to seek answers as to how good the Pareto-optimal algorithms are compared to fast heuristics in tackling the same problem with the same assumptions. At the same time, we are interested in knowing how good Pareto-optimal algorithms and heuristics are when compared to the absolute optimum (at least for small problem sets). The algorithms are multi-objective in nature and the problem and system parameters are numerous, which makes the comparison hard. Thus, the proposed framework is highly useful in evaluating trade-offs and determining which application and target parameters affect the results (performance, energy consumed, and temperature achieved) of these algorithms. Extensive experimentation facilitates a comprehensive comparison of different kinds of algorithms among themselves as well as with optimal solutions obtained through Integer Linear Programming as a reference.
Article
We investigate multi-criteria optimization and Pareto front generation. Given an application modeled as a Directed Acyclic Graph (DAG) of tasks and a multicore architecture, we produce a set of non-dominated (in the Pareto sense) static schedules of this DAG onto this multicore. The criteria we address are the execution time, reliability, power consumption, and peak temperature. These criteria exhibit complex antagonistic relations, which make the problem challenging. For instance, improving the reliability requires adding some redundancy in the schedule, which penalizes the execution time. To produce Pareto fronts in this 4-dimensional space, we transform three of the four criteria into constraints (the reliability, the power consumption, and the peak temperature), and we minimize the fourth one (the execution time of the schedule) under these three constraints. By varying the thresholds used for the three constraints, we are able to produce a Pareto front of non-dominated solutions. Each Pareto optimum is a static schedule of the DAG onto the multicore. We propose two algorithms to compute static schedules. The first is a ready-list scheduling heuristic called ERPOT (Execution time, Reliability, POwer consumption and Temperature). ERPOT actively replicates the tasks to increase the reliability, uses Dynamic Voltage and Frequency Scaling to decrease the power consumption, and inserts cooling times to control the peak temperature. The second algorithm uses an Integer Linear Programming (ILP) program to compute an optimal schedule. However, because our multi-criteria scheduling problem is NP-complete, the ILP algorithm is limited to very small problem instances. Comparisons showed that the schedules produced by ERPOT are on average only 10% worse than the optimal schedules computed by the ILP program, and that ERPOT outperforms the PowerPerf-PET heuristic from the literature by 33% on average.
Article
In this paper, we propose a metaheuristic-based task scheduling method to optimize the lifetime reliability, performance, and power consumption of heterogeneous MPSoCs. Lifetime reliability is affected by several failure mechanisms with different behaviors, which mainly depend on temperature and its variation pattern. To improve the lifetime reliability of multiprocessor systems, it is necessary to consider the effect of all potential failures and their distinct impact during the optimization process. Moreover, improving power consumption and execution time makes the optimization process more complicated due to the existing trade-offs among these parameters. Our proposed task scheduling method optimizes lifetime reliability by considering the effect of all failure mechanisms, together with the power consumption and execution time of heterogeneous MPSoCs. It employs a design space exploration engine based on the Non-dominated Sorting Genetic Algorithm (NSGA-II) to make the exploration process more efficient. To demonstrate the effectiveness of our proposed task scheduling and mapping method and compare it to related studies, several experiments are performed. Moreover, the importance of thermal cycling (TC), as an emerging thermal concern in computing the lifetime reliability of MPSoCs, and the capability of our proposed method to control it are studied and compared to related research. Experimental results show that employing our proposed scheduling method improves performance, lifetime reliability, and power consumption by about 24%, 30%, and 3.6% on average, respectively, compared to two selected related studies. Furthermore, our proposed approach decreases the occurrence rate of all failure mechanisms compared to related studies and outperforms them in terms of thermal cycling rate by about 48% on average.
Article
The integration of information and communication technologies in the traditional grid brings about a smart grid. Energy management plays a vital role in maintaining the sustainability and reliability of a smart grid, which in turn helps to prevent blackouts. Energy management at the consumer side is a complex task; it requires efficient scheduling of appliances with minimum delay to reduce the peak-to-average ratio (PAR) and energy consumption cost. In this paper, a classification of appliances is introduced based on their energy consumption pattern. An energy management controller is developed for demand side management. We have used fuzzy logic and heuristic optimization techniques for cost, energy consumption and PAR reduction. Fuzzy logic is used to control the throttleable and interruptible appliances. On the other hand, the heuristic optimization algorithms, BAT inspired and flower pollination, are employed for scheduling of shiftable appliances. We have also proposed a hybrid optimization algorithm for the scheduling of home appliances, named the hybrid BAT pollination optimization algorithm. Simulation results show a significant reduction in energy consumption, cost and PAR.
Article
Heterogeneous MPSoCs consisting of cores with different performance/power characteristics are widely used in many real-time embedded systems where both soft-error reliability and lifetime reliability are key concerns. Although existing efforts have investigated related problems, they either focus on one of the two reliability concerns or propose time-consuming scheduling algorithms that cannot adequately address runtime workload and environmental variations. This paper introduces an on-line framework which is adaptive to runtime variations and maximizes soft-error reliability while satisfying the lifetime reliability constraint for soft real-time systems executing on MPSoCs that are composed of high-performance cores and low-power cores. Based on each core’s executing frequency and utilization, the framework performs workload migration between high-performance cores and low-power cores to reduce power consumption and improve soft-error reliability. Experimental results based on different hardware platforms show that the proposed approach reduces the probability of failures due to soft errors by at least 17% and 50% on average compared to a number of representative existing approaches that satisfy the same lifetime reliability constraints.
Conference Paper
Due to technology downscaling, embedded systems have increased in complexity and heterogeneity. Increasingly large process, voltage, and temperature variations negatively affect the design and optimization process of these systems. These factors contribute to increased uncertainties that in turn undermine the accuracy and effectiveness of traditional design approaches. In this paper, we formulate the problem of uncertainty aware mapping for multicore embedded systems as a multi-objective optimization problem. We present a solution to this problem that integrates uncertainty models as a new design methodology constructed with Monte Carlo and evolutionary algorithms. The methodology is uncertainty aware because it is able to model uncertainties in design parameters and to identify robust design solutions that limit the influence of these uncertainties onto the objective functions. The proposed design methodology is implemented as a tool that can generate the robust Pareto frontier in the objective space formed by reliability, performance, and energy consumption.
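The Monte Carlo side of such an uncertainty-aware methodology can be sketched as follows. The mapping, nominal execution times, and the uniform +/-20% noise model below are all illustrative assumptions: a fixed task-to-core mapping is re-evaluated under sampled execution times, and a robust design is judged not only by its nominal makespan but by the mean and tail (95th percentile) of the resulting distribution.

```python
import random

# Sketch of Monte Carlo evaluation of one candidate mapping under
# execution-time uncertainty (illustrative mapping and noise model, not the
# paper's methodology). Independent tasks; a core's load is the sum of its
# tasks, and the makespan is the most loaded core.

def makespan(mapping, exec_times):
    loads = {}
    for task, core in mapping.items():
        loads[core] = loads.get(core, 0.0) + exec_times[task]
    return max(loads.values())

def mc_makespan(mapping, nominal, rel_noise=0.2, samples=2000, seed=1):
    rng = random.Random(seed)
    results = []
    for _ in range(samples):
        noisy = {t: c * rng.uniform(1 - rel_noise, 1 + rel_noise)
                 for t, c in nominal.items()}
        results.append(makespan(mapping, noisy))
    results.sort()
    return sum(results) / samples, results[int(0.95 * samples)]

mapping = {"A": 0, "B": 0, "C": 1, "D": 1}
nominal = {"A": 4.0, "B": 6.0, "C": 5.0, "D": 5.0}
mean, p95 = mc_makespan(mapping, nominal)
# Nominal makespan is max(4+6, 5+5) = 10; because max() is convex, the mean
# under symmetric noise sits at or above 10, and p95 strictly above it.
```

Comparing candidate mappings on (mean, p95) rather than on the nominal makespan alone is what makes the resulting Pareto frontier robust to the modeled uncertainties.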
Book
The second edition of this textbook provides a fully updated approach to fuzzy sets and systems that can model uncertainty — i.e., “type-2” fuzzy sets and systems. The author demonstrates how to overcome the limitations of classical fuzzy sets and systems, enabling a wide range of applications from time-series forecasting to knowledge mining to control. In this new edition, a bottom-up approach is presented that begins by introducing classical (type-1) fuzzy sets and systems, and then explains how they can be modified to handle uncertainty. The author covers fuzzy rule-based systems – from type-1 to interval type-2 to general type-2 – in one volume. For hands-on experience, the book provides information on accessing MATLAB and Java software to complement the content. The book features a full suite of classroom material. • Presents fully updated material on new breakthroughs in human-inspired rule-based techniques for handling real-world uncertainties; • Allows those already familiar with type-1 fuzzy sets and systems to rapidly come up to speed on type-2 fuzzy sets and systems; • Features complete classroom material including end-of-chapter exercises, a solutions manual, and three case studies: forecasting of time series, knowledge mining from surveys, and PID control.
Article
This paper studies the problem of maximizing multicore system lifetime reliability, an important design consideration for many real-time embedded systems. Existing work has investigated the problem, but has neglected important failure mechanisms. Furthermore, most existing algorithms are too slow for online use, and thus cannot address runtime workload and environment variations. This paper presents an online framework that maximizes system lifetime reliability through reliability-aware utilization control. It focuses on homogeneous multicore soft real-time systems. It selectively employs a comprehensive reliability estimation tool to deal with a variety of failure mechanisms at the system level. A model-predictive controller adjusts utilization by manipulating core frequencies, thereby reducing temperature, and an online heuristic adjusts the controller sampling window length to decrease the reliability effects of thermal cycling. Experiments with a real quad-core ARM processor and a simulator demonstrate that the proposed approach improves system mean time to failure by 50% on average and 141% in the best case, compared with existing techniques.
Article
The emergence of cyber-physical-social systems (CPSS) as a novel paradigm has revolutionized the relationship between humans, computers and the physical environment. In this paper, we survey the advancement of CPSS through cyber-physical systems (CPS), cyber-social systems (CSS) and CPSS, as well as related techniques. CPSS are still in their infancy; most recent studies are application-specific and lack a systematic design methodology. To develop a design methodology for CPSS, we review the existing system-level design methodologies in multiple application domains and further compare their performance characteristics and applicability to CPSS. Finally, we introduce our latest research advances in system-level design methodology for CPSS and summarize future challenges for designing CPSS.
Article
Three-way joint optimization of performance (P), energy (E), and temperature (T) in scheduling parallel tasks to multiple cores poses a challenge that is staggering in its computational complexity. The goal of the PET optimized scheduling (PETOS) problem is to minimize three quantities: the completion time of a task graph, the total energy consumption, and the peak temperature of the system. Algorithms based on conventional multi-objective optimization techniques can be designed for solving the PETOS problem. But their execution times are exceedingly high and hence their applicability is restricted merely to problems of modest size. Exacerbating the problem is the solution space that is typically a Pareto front since no single solution can be strictly best along all three objectives. Thus, not only is the absolute quality of the solutions important but “the spread of the solutions” along each objective and the distribution of solutions within the generated tradeoff front are also desired. A natural alternative is to design efficient heuristic algorithms that can generate good solutions as well as good spreads -- note that most of the prior work in energy-efficient task allocation is predominantly single- or dual-objective oriented. Given a directed acyclic graph (DAG) representing a parallel program, a heuristic encompasses policies as to what tasks should go to what cores and at what frequency should that core operate. Various policies, such as greedy, iterative, and probabilistic, can be employed. However, the choice and usage of these policies can influence a heuristic towards a particular objective and can also profoundly impact its performance. This article proposes 16 heuristics that utilize various methods for task-to-core allocation and frequency selection. This article also presents a methodical classification scheme which not only categorizes the proposed heuristics but can also accommodate additional heuristics. Extensive simulation experiments compare these algorithms while shedding light on their strengths and tradeoffs.
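One of the simplest task-to-core policies in the greedy family such an article classifies can be sketched directly. The core speeds and task workloads below are hypothetical, and frequency selection is omitted for brevity: each independent task goes to whichever core finishes it earliest (EFT), given the cores' current availability.

```python
# Sketch of greedy earliest-finish-time (EFT) allocation of independent tasks
# to heterogeneous cores (hypothetical speeds/workloads; one representative of
# the "greedy" policy family, not any specific heuristic from the article).

def greedy_eft(workloads, speeds):
    avail = [0.0] * len(speeds)      # next free time of each core
    assignment = []
    for w in workloads:
        # Finish time of this task on each core: queue wait + execution time.
        finish = [avail[c] + w / speeds[c] for c in range(len(speeds))]
        best = min(range(len(speeds)), key=lambda c: finish[c])
        avail[best] = finish[best]
        assignment.append(best)
    return assignment, max(avail)

# Core 0 is a fast core (speed 1.0), core 1 a slow one (speed 0.5).
assignment, ms = greedy_eft([4.0, 4.0, 2.0], [1.0, 0.5])
print(assignment, ms)  # [0, 0, 1] 8.0
```

The first two tasks pile onto the fast core, after which the slow core becomes the earlier finisher for the third task; adding a per-core frequency choice on top of this loop is exactly where the P/E/T trade-off enters.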
Conference Paper
The timing constraint of tasks in the mobile real-time computing systems plays the central role in deciding the task schedule as timely completion of the task is very important in such systems. These timing constraints are however completely unquantifiable during the time of system modeling and designing. Thus we consider type-2 fuzzy sets for modeling the timing constraints in mobile and time-critical computing systems and propose a new algorithm FT2EDF (Fuzzy Type-2 Earliest Deadline First) for task scheduling. On the other hand, because of the limitation of the storage power, power efficiency is another foremost design objective for designing mobile real-time computing systems. However, reduction of processor power pulls down the system performance. Timely task completion and power efficiency are therefore two mutually conflicting criteria. In this paper, we propose a heuristic based solution approach that with a modified version of the non-dominated sorting genetic algorithm-II (NSGA-II). Our approach allows that a processor dynamically switches between different voltage levels to ensure optimum reduction in the power requirements without compromising the timeliness of the task completion. The efficacy of our approach is demonstrated with two numerical examples. Comparison with the previous results show that our solution ensures approximately 44% of energy saving as compared to the around 25% of the earlier results.
Conference Paper
Energy and reliability optimization are two of the most critical objectives for the synthesis of multiprocessor systems-on-chip (MPSoCs). Task mapping has shown significant promise as a low cost solution in achieving these objectives as standalone or in tandem as well. This paper proposes a multi-objective design space exploration to determine the mapping of tasks of an application on a multiprocessor system and voltage/frequency level of each tasks (exploiting the DVFS capabilities of modern processors) such that the reliability of the platform is improved while fulfilling the energy budget and the performance constraint set by system designers. In this respect, the reliability of a given MPSoC platform incorporates not only the impact of voltage and frequency on the aging of the processors (wear-out effect) but also on the susceptibility to soft-errors - a joint consideration missing in all existing works in this domain. Further, the proposed exploration also incorporates soft-error tolerance by selective replication of tasks, making the proposed approach an interesting blend of reactive and proactive fault-tolerance. The combined objective of minimizing core aging together with the susceptibility to transient faults under a given performance/energy budget is solved by using a multi-objective genetic algorithm exploiting tasks' mapping, DVFS and selective replication as tuning knobs. Experiments conducted with reallife and synthetic application graphs clearly demonstrate the advantage of the proposed approach.
Conference Paper
The reliance on multi/many-core systems to satisfy the high performance requirements of complex embedded software applications is increasing. This necessitates efficient mapping methodologies for such complex computing platforms. This paper provides an extensive survey and categorization of state-of-the-art mapping methodologies and highlights the emerging trends for multi/many-core systems. The methodologies aim at optimizing a system's resource usage, performance, power consumption, temperature distribution and reliability for varying application models. The methodologies perform design-time and run-time optimization for static and dynamic workload scenarios, respectively. These optimizations are necessary to fulfill the end-user demands. A comparison of the methodologies based on their optimization aims has been provided. The trends followed by the methodologies and open research challenges are also discussed.
Article
Increasing integrated circuit (IC) power densities and temperatures may hamper multiprocessor system-on-chip (MPSoC) use in hard real-time systems. This paper formalizes the temperature-aware real-time MPSoC assignment and scheduling problem and presents an optimal phased steady-state mixed integer linear programming-based solution that considers the impact of scheduling and assignment decisions on MPSoC thermal profiles to directly minimize the chip peak temperature. We also introduce a flexible heuristic framework for task assignment and scheduling that permits system designers to trade off accuracy for running time when solving large problem instances. Finally, for task sets with sufficient slack, we show that inserting idle times between task executions can further reduce the peak temperature of the MPSoC quite significantly.
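Why inserting idle times between task executions lowers peak temperature can be sketched with a first-order RC thermal model. All constants below (ambient temperature, active power, thermal resistance, time constant) are illustrative assumptions, not values from the paper: during execution the core temperature decays exponentially toward T_amb + P*R, and during idle it decays back toward T_amb.

```python
import math

# Sketch of idle-time insertion under a first-order RC thermal model
# (illustrative constants, not the paper's MPSoC model). While active, the
# temperature tends to t_amb + p_active * r; while idle, it cools to t_amb.

def simulate(segments, t_amb=45.0, p_active=20.0, r=2.0, tau=10.0):
    """segments: list of (duration, active?) pairs; returns peak temperature."""
    temp, peak = t_amb, t_amb
    for dur, active in segments:
        steady = t_amb + (p_active * r if active else 0.0)
        temp = steady + (temp - steady) * math.exp(-dur / tau)
        peak = max(peak, temp)
    return peak

back_to_back = [(10, True), (10, True)]              # two tasks, no gap
with_idle    = [(10, True), (5, False), (10, True)]  # same work, idle inserted
print(simulate(back_to_back) > simulate(with_idle))  # True
```

The cooling gap lets the second task start from a lower temperature, so the same total workload reaches a lower peak, at the cost of a longer schedule; that is exactly the slack trade-off the paper exploits for task sets with sufficient laxity.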
Article
Scheduling periodic tasks onto a multiprocessor architecture under several constraints such as performance, cost, energy, and reliability is a major challenge in embedded systems. In this paper, we present an Integer Linear Programming (ILP) based framework that maps a given task set onto a Heterogeneous Multiprocessor System-on-Chip (HMPSoC) architecture. Our framework can be used with several objective functions: minimizing energy consumption, minimizing cost (i.e., the number of heterogeneous processors), and maximizing the reliability of the system under performance constraints. We use Dynamic Voltage Scaling (DVS) for reducing energy consumption, while we employ task duplication to maximize reliability. We illustrate the effectiveness of our approach through several experiments, each with a different number of tasks to be scheduled. We also propose two heuristics based on the Earliest Deadline First (EDF) algorithm for minimizing energy under performance and cost constraints. Our experiments on generated task sets show that the ILP-based method reduces energy consumption by up to 62% against a method that does not apply DVS. The heuristic methods obtain promising results when compared to the optimal results generated by our ILP-based method.
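The classical schedulability test underlying EDF-based heuristics like the two above can be sketched in a few lines. The task set is hypothetical, and the f^2 energy factor is the usual simplified dynamic-power approximation, not the paper's model: on a single core, a periodic task set with utilization U = sum(C_i / T_i) is EDF-schedulable iff U <= 1, so with DVS the core can run at normalized frequency f = U.

```python
# Sketch of the EDF utilization test plus DVS frequency selection on one core
# (hypothetical task set; energy factor f**2 is a textbook approximation of
# dynamic energy per cycle, not the paper's cost model).

def edf_min_frequency(tasks):
    """tasks: list of (wcet_at_fmax, period); returns minimal normalized f."""
    u = sum(c / t for c, t in tasks)
    if u > 1.0:
        raise ValueError("not EDF-schedulable even at full speed")
    return u  # running at f = U keeps EDF feasible with zero slack

tasks = [(1.0, 4.0), (2.0, 8.0), (1.0, 10.0)]
f = edf_min_frequency(tasks)
print(f)  # 0.6 -> relative dynamic energy per cycle roughly f**2 = 0.36
```

At f = 0.6 every worst-case execution time stretches by 1/0.6, the utilization becomes exactly 1, and deadlines are still met under EDF, which is the lever DVS-based heuristics pull to trade slack for energy.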
Conference Paper
This paper introduces an alternative type-reduction method for interval type-2 (IT2) fuzzy logic systems (FLSs), with either continuous or discrete secondary membership functions. Unlike the Karnik-Mendel type reducer, which is based on the wavy-slice representation of a type-2 fuzzy set, the proposed type-reduction algorithm is developed using the vertical-slice representation. One advantage of the approach is that the output of the type reducer can be expressed in closed form, thereby providing a tool for the theoretical analysis of IT2 FLSs. The computational complexity of the proposed method is also lower than that of the uncertainty bounds method and the enhanced Karnik-Mendel method. To assess the feasibility of the proposed type reducer, it is used to calculate the output of an IT2 fuzzy logic controller (FLC). Results from a simulated coupled-tank experiment demonstrate that IT2 FLCs that employ the proposed type-reduction algorithm share similar robustness properties as FLCs based on the Karnik-Mendel type reducer.
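What centroid type reduction computes can be sketched on a small discrete domain. Instead of the iterative Karnik-Mendel procedure the paper compares against, the sketch below enumerates switch points directly, which yields the same endpoints on small domains: the left endpoint y_l uses upper memberships left of the switch and lower ones right of it, and vice versa for y_r. The sample memberships are hypothetical.

```python
# Sketch of centroid type reduction for an interval type-2 set over discrete
# points x_i with lower (lo) and upper (hi) memberships. Switch points are
# enumerated exhaustively; the optimum of the KM iteration lies at one of them.

def centroid(xs, ws):
    return sum(x * w for x, w in zip(xs, ws)) / sum(ws)

def type_reduce(xs, lo, hi):
    n = len(xs)
    # y_l: heavy weights on the left push the centroid left; y_r: the reverse.
    y_l = min(centroid(xs, hi[:k] + lo[k:]) for k in range(n + 1))
    y_r = max(centroid(xs, lo[:k] + hi[k:]) for k in range(n + 1))
    return y_l, y_r

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
lo = [0.2, 0.5, 0.9, 0.5, 0.2]   # hypothetical lower memberships
hi = [0.4, 0.8, 1.0, 0.8, 0.4]   # hypothetical upper memberships
y_l, y_r = type_reduce(xs, lo, hi)
# The memberships are symmetric about x = 3, so the reduced interval
# [y_l, y_r] is centred on 3 with y_l < 3 < y_r.
```

The enumeration costs O(n^2) versus the KM iteration's near-linear behaviour, which is why closed-form or lower-complexity reducers like the paper's are attractive inside real-time controllers.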
Conference Paper
Dynamic thermal management (DTM) techniques to manage the load on a system to avoid thermal hazards are soon becoming mainstream in today's systems. With the increasing percentage of leakage power, switching off the processors is becoming a viable alternative technique to speed scaling. For real-time applications, it is crucial that under such techniques the system still meets the performance constraints. In this paper we study stop-go scheduling to minimize peak temperature when scheduling an application, modeled as a task-graph, within a given makespan constraint. For a given static-ordering of execution of the tasks, we derive the optimal schedule referred to as the JUST schedule. We prove that for periodic task-graphs, the optimal temperature is independent of the chosen static-ordering when following the proposed JUST schedule. Simulation experiments validate the theoretical results.