Conference Paper

Optimal task assignment in heterogeneous computing systems


Abstract

Distributed systems comprising networked heterogeneous workstations are now considered a viable choice for high-performance computing. To achieve fast response times from such systems, an efficient assignment of the application tasks to the processors is imperative. The general assignment problem is known to be NP-hard, except in a few special cases with strict assumptions. While a large number of heuristic techniques suggested in the literature can yield sub-optimal solutions in a reasonable amount of time, we aim to develop techniques for optimal solutions under relaxed assumptions. The basis of our research is a best-first search technique known as the A* algorithm, from the area of artificial intelligence. The original search technique guarantees an optimal solution but is not feasible for problems of practically large sizes due to its high time and space complexity. We propose a number of algorithms based around the A* technique. The proposed algorithms also yield optimal solutions but are considerably faster. The first algorithm solves the assignment problem by using parallel processing. Parallelizing the assignment algorithm is a natural way to lower the time complexity, and we believe our algorithm to be novel in this regard. The second algorithm is based on a clustering-based pre-processing technique that merges high-affinity tasks. Clustering reduces the problem size, which in turn reduces the state space for the assignment algorithm. We also propose three heuristics which do not guarantee optimal solutions but provide near-optimal solutions and are considerably faster. By using our parallel formulation, the proposed clustering technique and the heuristics can also be parallelized to further improve their time complexity.
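
The A*-based assignment search described in the abstract can be sketched as follows. The cost model (sum of execution costs plus communication costs between tasks placed on different processors) matches the formulation cited repeatedly on this page, but the lower-bound heuristic (cheapest processor for each unassigned task) and all names here are illustrative choices, not the paper's exact implementation.

```python
import heapq

def a_star_assign(exec_cost, comm_cost):
    """Optimal task assignment via A* search over partial assignments.

    exec_cost[t][p]: cost of running task t on processor p.
    comm_cost[i][j]: communication cost between tasks i and j when
                     they end up on different processors.
    Returns (total_cost, assignment), where assignment[t] is the
    processor chosen for task t.
    """
    n_tasks = len(exec_cost)
    n_procs = len(exec_cost[0])

    def g(partial):
        # Exact cost of a partial assignment: execution costs plus
        # communication between already-assigned tasks on different processors.
        cost = sum(exec_cost[t][p] for t, p in enumerate(partial))
        for i in range(len(partial)):
            for j in range(i + 1, len(partial)):
                if partial[i] != partial[j]:
                    cost += comm_cost[i][j]
        return cost

    def h(partial):
        # Admissible lower bound: each unassigned task costs at least its
        # cheapest execution cost, and communication costs are non-negative.
        return sum(min(exec_cost[t]) for t in range(len(partial), n_tasks))

    frontier = [(h(()), ())]
    while frontier:
        f, partial = heapq.heappop(frontier)
        if len(partial) == n_tasks:
            return f, partial  # h is 0 here, so f is the exact optimal cost
        t = len(partial)  # next task to assign, in fixed order
        for p in range(n_procs):
            child = partial + (p,)
            heapq.heappush(frontier, (g(child) + h(child), child))
```

Because the heuristic never overestimates, the first complete assignment popped from the frontier is optimal; the state space is still exponential in the worst case, which is exactly why the paper pursues parallelization and clustering.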


... An application running in a DS can be divided into a number of tasks and executed concurrently on different nodes in the system; this is referred to as the task allocation problem (TAP). To improve the performance of DSs, several studies have been devoted to the TAP, with the main concern on performance measures such as minimizing the execution and communication cost [1-3], minimizing the application turnaround time [4, 5], and achieving better fault tolerance [6, 7]. On the other hand, the real-time property is required in many DSs (e.g., military systems). ...
... The time of processing a task at different processors is given in the range [15, 25]. The memory requirement of each task and the node memory capacity are given in the ranges [1, 10] and, respectively. The task processing load versus node processing capacity is given in the ranges and. ...
... The value of data to be communicated between tasks is given in the range [5, 10]. The bandwidth and load capacity of communication links are given in the ranges [1, 4] and. The range of task deadline value is. ...
Article
Full-text available
This paper addresses the problem of task allocation in real-time distributed systems with the goal of maximizing system reliability, which has been shown to be NP-hard. We take account of the deadline constraint to formulate this problem and then propose an algorithm called chaotic adaptive simulated annealing (XASA) to solve it. Firstly, XASA begins with chaotic optimization, which takes a chaotic walk in the solution space and generates several local minima; secondly, XASA improves the SA algorithm via several adaptive schemes and continues the search for the optimum based on the results of the chaotic optimization. The effectiveness of XASA is evaluated by comparing it with the traditional SA algorithm and an improved SA algorithm. The results show that XASA can achieve a satisfactory speedup without loss of solution quality.
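
Stripped of the chaotic initialization and the adaptive schemes, the annealing core that XASA builds on can be sketched as follows. The single-task move operator, the geometric cooling schedule, and the parameter defaults are standard textbook choices, not values taken from the paper.

```python
import math
import random

def sa_assign(cost, n_tasks, n_procs, t0=10.0, alpha=0.95, iters=2000, seed=0):
    """Simulated annealing for task allocation.

    cost(assignment) -> float is the objective to minimize, where
    assignment[t] is the processor chosen for task t.
    A neighbour is generated by moving one random task to a random
    processor; worse moves are accepted with probability
    exp(-delta / T), and T is cooled geometrically each iteration.
    """
    rng = random.Random(seed)
    cur = [rng.randrange(n_procs) for _ in range(n_tasks)]
    cur_cost = cost(cur)
    best, best_cost = cur[:], cur_cost
    t = t0
    for _ in range(iters):
        cand = cur[:]
        cand[rng.randrange(n_tasks)] = rng.randrange(n_procs)
        delta = cost(cand) - cur_cost
        # Always accept improvements; accept worse moves with
        # Boltzmann probability so the walk can escape local minima.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            cur, cur_cost = cand, cur_cost + delta
            if cur_cost < best_cost:
                best, best_cost = cur[:], cur_cost
        t *= alpha
    return best, best_cost
```

The same loop applies unchanged to objectives with communication terms; only the `cost` callable differs.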
... A good overview of the TAP can be found in references (Norman and Thanisch 1993; Ucar et al. 2006). The solution techniques can be roughly classified as: graph theoretical (Hui and Chanson 1997; Stone 1977; Lo 1988), mathematical programming (Chu et al. 1980), state-space search techniques (Kafil and Ahmad 1998; Kaya and Ucar 2009), and a wide variety of metaheuristic approaches including simulated annealing (Hamam and Hindi 2000), mean field annealing (Bultan and Aykanat 1992), genetic algorithm (Chockalingam and Arunkumar 1992; Ahmad and Dhodhi 1995; Ahuja et al. 2000), tabu search (Chen and Lin 2000), particle swarm optimization (Salman et al. 2002; Yin et al. 2006) and hybrid ones mixing neural networks with genetic algorithm (Salcedo-Sanz et al. 2006) or mixing genetic algorithm with reinforcement learning algorithm (Yin et al. 2009), differential evolution algorithm (Zou et al. 2011), and improved version of Harmony Search (Zou et al. 2010). Because of the widespread use and success of meta-heuristic approaches in exploring the search space for combinatorial optimization problems efficiently and effectively within acceptable CPU times, we follow the meta-heuristic approach in this paper. ...
... If processor p is faster than processor q on task i, it does not imply anything about their speeds for another task. An example of TIG adopted from (Kafil and Ahmad 1998) is shown in Fig. 1. Given the TIG and the PIG, the problem can be formulated as a mapping A : V → ...
... Future work would be applying HS to the other version of this problem (Pierson 2011; Kang et al. 2010). PSO: particle swarm optimization (Salman et al. 2002), modified to handle heterogeneous processors. KLZ: Kopidakis et al.'s MaxEdge algorithm, the better of the two heuristics proposed in (Kopidakis et al. 1997). VML: Lo's task assignment algorithm (Lo 1988). TOpt: Bokhari's shortest tree algorithm (Bokhari 1981). A*: a state-space search algorithm based on the A* algorithms given in (Kafil and Ahmad 1998). ...
Article
Full-text available
This paper presents an improved version of a music-inspired meta-heuristic algorithm, Harmony Search (HS), for successfully solving the NP-complete task assignment problem (TAP) in distributed computing systems. Task assignment is an important and core step in distributed systems, where program tasks must be properly allocated to the processors to effectively harness the computing power by better exploiting the system parallelism and improving system performance. The proposed HS-based algorithm explores the search space effectively and efficiently by exploiting the factors of randomness, experience, and variation of experience. Our main contributions in this work are: introducing a modification in the pitch adjustment operator of Harmony Search, mapping Harmony Search solutions to the clustering methodology, which shows its superiority in solving the TAP, and using a local refinement heuristic to improve a given assignment. The effectiveness of the proposed HS-based algorithm is demonstrated by comparing it with a recently reported Harmony Search algorithm, NGHS, and with a wide variety of other earlier reported meta-heuristic techniques such as GA, SA, PSO and many others. Simulation results indicate that the proposed HS-based algorithm is a viable approach for the TAP, which can find better quality (or even optimal) solutions within reasonable computation time.
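
A minimal sketch of the Harmony Search operators the abstract mentions: a harmony memory of candidate assignments, memory consideration with rate `hmcr`, and pitch adjustment with rate `par`. The parameter values and the discrete pitch-adjustment rule used here (jump to a random processor) are typical defaults for illustration, not the paper's modified operator or tuned settings.

```python
import random

def harmony_search(cost, n_tasks, n_procs, hms=10, hmcr=0.9, par=0.3,
                   iters=2000, seed=0):
    """Minimal Harmony Search for task assignment.

    A harmony is an assignment vector (one processor per task). Each
    position of a new harmony is drawn from the harmony memory with
    probability hmcr (then pitch-adjusted with probability par), and
    is otherwise randomized. The new harmony replaces the worst
    member of memory whenever it is better.
    """
    rng = random.Random(seed)
    memory = [[rng.randrange(n_procs) for _ in range(n_tasks)]
              for _ in range(hms)]
    for _ in range(iters):
        new = []
        for t in range(n_tasks):
            if rng.random() < hmcr:
                v = rng.choice(memory)[t]       # memory consideration
                if rng.random() < par:
                    v = rng.randrange(n_procs)  # pitch adjustment
                new.append(v)
            else:
                new.append(rng.randrange(n_procs))
        worst = max(range(hms), key=lambda k: cost(memory[k]))
        if cost(new) < cost(memory[worst]):
            memory[worst] = new
    return min(memory, key=cost)
```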
... The A* Algorithm: Scheduling algorithms based on the A* search technique from the field of artificial intelligence are proposed in [3]. The A* algorithm is used to search efficiently in the search space, in this case, a tree. ...
... Taxonomy of Task Assignment Algorithms [3] ...
... Fig. 3 illustrates an application program that has been 'atomized' into ten subtasks. The amount of computation required for each subtask is represented within parentheses. The number of clock cycles required to completely execute the subtask on a baseline machine may be used as a yardstick to quantify the computational requirement of a subtask. ... cycles on the same baseline machine. ...
Article
Heterogeneous Computing (HC) systems achieve high performance by networking together computing resources of diverse nature. The issues of task assignment and scheduling are critical in the design and performance of such systems. In this thesis, an auction based game theoretic framework is developed for dynamic task scheduling in HC systems. Based on the proposed game theoretic model, a new dynamic scheduling algorithm is developed that uses auction based strategies. The dynamic scheduling algorithm yields schedules with shorter completion times than static schedulers while incurring higher scheduling overhead. Thus, a second scheduling algorithm is proposed which uses an initial schedule generated with a learning automaton based algorithm, and then heuristics are used to identify windows of tasks within the application that can be rescheduled dynamically during run time.
... The method was further extended by reducing the number of nodes, achieved by an initial guess. The results for 20 tasks and four nodes have been calculated in very short runtimes (Kafil and Ahmad 1997). ...
... Most of the other task assignment problems in IMA architecture (e.g. Lo 1988; Kafil and Ahmad 1997; Salomon and Reichel 2013; Dougherty et al. 2011) are combined with time constraints for scheduling and network transmission. They all used heuristic methods like evolutionary algorithms (EV), including GA, NSGA-II, PSO, and ACO, for these discrete optimisation problems. ...
Chapter
The systems described can be combined to form architectures to suit a particular aircraft role. At the early stages of project design, it is most likely that a number of different solutions will emerge. The task for assessing these solutions and arriving at an optimal solution can be time‐consuming, and some form of automation will ease the task. This chapter looks at some methods currently being developed and proposes an example methodology.
... Let ETC be a T × M matrix where ETC_ij is the estimated time to compute a task of type i on a machine of type j. The ETC matrix is frequently used in scheduling algorithms (e.g., [10,15,7,8,16]). ETC is generally obtained from historical data in real environments. ...
... Another related algorithm is presented in [2] that approximates makespan to provide computationally efficient schedules while considering reliability for DVFS scheduling on identical processors. In [16], the A* search algorithm is used to assign tasks to machines considering task dependencies and communication constraints. This algorithm is very expensive for large numbers of tasks because the algorithm's branching factor is on the order of the number of machines and the depth is on the order of the number of tasks. ...
... Similarly, let APC be a T × M matrix where APC_ij is the average power consumption for executing a task of type i on a machine of type j. These matrices are frequently used in scheduling algorithms (e.g., [5], [8]-[12]). ETC and APC are generally obtained from historical data in real environments. The lower bound on the finishing time of the machines of a machine type is found by allowing tasks assigned to a machine type to be divided among all machines to ensure the minimal finishing time. ...
... An algorithm is presented in [34] that minimizes energy while constraining makespan and reliability to provide computationally efficient schedules for DVFS scheduling on identical processors. In [12], the A* search algorithm is used to assign tasks to machines considering task dependencies and communication constraints. This algorithm is very expensive for large numbers of tasks because the algorithm's branching factor is on the order of the number of machines and the depth is on the order of the number of tasks. ...
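
The lower bound described in the excerpt above — letting the tasks assigned to a machine type be divided evenly among all machines of that type — can be computed directly from the ETC matrix. The function name and input layout here are illustrative assumptions.

```python
def finishing_time_lower_bound(etc, counts, assigned):
    """Lower bound on the finishing time of each machine type.

    etc[i][j]: estimated time to compute a task of type i on a
               machine of type j (the ETC matrix).
    counts[j]: number of machines of type j.
    assigned[j]: list of task types assigned to machine type j.

    Dividing a machine type's total work evenly over its machines can
    only shorten the makespan, so total work / machine count is a
    valid lower bound on that type's finishing time.
    """
    return [sum(etc[i][j] for i in assigned[j]) / counts[j]
            for j in range(len(counts))]
```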
Article
Full-text available
Resource management for large-scale high performance computing systems poses difficult challenges to system administrators. The extreme scale of these modern systems requires task scheduling algorithms that are capable of handling at least millions of tasks and thousands of machines. These large computing systems consume vast amounts of electricity, leading to high operating costs. System administrators try to simultaneously reduce operating costs and offer state-of-the-art performance; however, these are often conflicting objectives. Highly scalable algorithms are necessary to schedule tasks efficiently and to help system administrators gain insight into the energy/performance trade-offs of the system. System administrators can examine this trade-off space to quantify how much a difference in the performance level will cost in electricity, or analyze how much performance can be expected within an energy budget. In this study, we design a novel linear programming based resource allocation algorithm for a heterogeneous computing system to efficiently compute high quality solutions for simultaneously minimizing energy and makespan. These solutions are used to bound the Pareto front to easily trade off energy and performance. The new algorithms are highly scalable in both solution quality and computation time compared to existing algorithms, especially as the problem size increases.
... Based on the different objectives, the allocation problem can be classified into three broad categories, namely, performance-oriented allocation, reliability-oriented allocation and schedulability-oriented allocation. The performance-oriented allocation problem arises when the objective is to optimize a performance-oriented cost function, like the sum of execution and communication costs [4, 5, 11, 12], turnaround time [13]-[15], inter-processor communication costs [7] and load balancing [9]. The reliability-oriented allocation problem arises when the objective is to maximize the degree of system reliability [16]-[18]. ...
... Some approaches have been developed for this problem considering the total system cost as the objective function to be minimized [4, 5, 12]. Other approaches have been developed considering the completion time as the objective function to be minimized [13, 14, 15]. However, few approaches take into account the application requirements, such as memory and computation load requirements, and the resource availability, such as available memory capacity and available computation load. ...
Conference Paper
Full-text available
The rapid progress of microprocessor and computer network technologies has made distributed systems economically attractive for many computer applications compared with the very expensive massively parallel machines. To realize this performance potential, the tasks of a parallel application must be assigned carefully to the available processors of the system. If this step is not done properly, an increase in the number of processors may actually result in a decrease of total throughput. This degradation is caused by what is commonly called the 'saturation effect', which occurs due to heavy communication traffic incurred by data transfer between tasks that reside on separate processors. This paper first formulates the allocation problem as an optimization problem and then presents a heuristic algorithm derived from Simulated Annealing (SA) to solve this problem in a reasonable amount of computation time. The effectiveness of the algorithm is evaluated by comparing the quality of its solutions with those derived using Branch-and-Bound on the same sample problems.
... The execution cost matrix ECM(,) (rows: tasks t1-t7; columns: processors):
t1: 12 18 15
t2:  9 18  8
t3: 19  7 15
t4:  5 11 10
t5: 12 17  5
t6: 10  7 11
t7: 17  2 12
...
... Step 5.3: Remove the cluster that consists of either t3 or t7 from ACL(,) and the corresponding ECs from SUMNEW(,). Now, the ACL(,) and SUMNEW(,) are: Remove the clusters that contain either task t2 or t5 from ACL(,) and their corresponding ECs from SUMNEW(,). ...
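
The clustering pre-processing named in the main abstract merges high-affinity tasks before the search runs on an execution cost matrix like the ECM above. A minimal sketch of one such merge follows; the merged cluster runs both tasks on one processor, so its execution costs add and the communication between the pair disappears. The exact merge rules and selection criteria of the paper may differ.

```python
def merge_tasks(exec_cost, comm_cost, i, j):
    """Merge tasks i and j (i != j) into a single cluster.

    exec_cost[t][p]: execution cost of task t on processor p.
    comm_cost[a][b]: symmetric communication cost between tasks a, b.

    The cluster keeps index i: its execution cost on each processor is
    the sum of the two tasks' costs, its links to every other task
    combine the two tasks' links, and the i-j link vanishes. Returns
    the reduced (exec_cost, comm_cost) matrices.
    """
    n = len(exec_cost)
    keep = [t for t in range(n) if t != j]
    new_exec = []
    for t in keep:
        if t == i:
            new_exec.append([a + b for a, b in zip(exec_cost[i], exec_cost[j])])
        else:
            new_exec.append(list(exec_cost[t]))
    new_comm = []
    for a in keep:
        row = []
        for b in keep:
            if a == b:
                row.append(0)
            else:
                c = comm_cost[a][b]
                if a == i:
                    c += comm_cost[j][b]  # fold j's link into the cluster
                if b == i:
                    c += comm_cost[a][j]
                row.append(c)
        new_comm.append(row)
    return new_exec, new_comm
```

Each merge shrinks the task set by one, which shrinks the assignment search tree by a full level — the state-space reduction the abstract describes.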
... Kim and O. Ibarra [2], a selection strategy by Kafil et al. [3], simulation model deliberation by Elperin and R. T. Eliasi [4]; Highest Level First with Estimated Times (HLFET) by K. Chandy and T. Adam [5]; the Insertion Scheduling Heuristic (ISH) by Kruatrachue [6,7]; and evolutionary algorithms by E. Hou and N. Ansari [8]. ...
Article
Multiprocessor scheduling is a dynamic NP-hard combinatorial problem in the domain of parallel processing. Numerous heuristics and metaheuristics have been widely used by researchers to tackle this issue in an optimal way. The genetic algorithm is among the most commonly used of these methods, but it has known drawbacks which limit its use in such problems. This paper therefore presents a modified rendition of the evolutionary algorithm, the Quantum Genetic Algorithm, to address the problem of multiprocessor scheduling. The results collected are also contrasted with other meta-heuristics to show their validity.
... This leads to the scheduling of more than one cluster onto the same processor and unavoidably increases the overall schedule length. Approaches that minimize the execution as well as the communication costs are discussed in [6,12,20,22,23]. ...
Article
Full-text available
With the recent enhancements in massively parallel processing technologies, the problem of scheduling tasks in multiprocessor systems is becoming progressively important. The problem of scheduling a task graph in a parallel and distributed computing system is a well-defined NP-complete problem. The problem of task assignment in heterogeneous computing systems has been studied for many years with many variations. In a Distributed Computing System (DCS), a program is split into small tasks and distributed among several computing elements to minimize the overall system cost. In real-life situations, when the number of tasks is much larger than the number of processors, a task clustering approach is very useful for allocating tasks on the basis of some predetermined criteria. This paper deals with the problem of task allocation in a DCS such that the load on each processing node is almost balanced. The present work aims at the development of an effective algorithm for allocating 'm' tasks to 'n' processors in a given distributed system using task clustering. The model includes the inter-task communication cost (ITCC) along with the execution cost (EC). A new concept, the Load Balancing Factor (LBF), is introduced to judge the degree of load balancing; an LBF value of one signifies perfect load balancing.
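
The abstract does not reproduce the LBF formula. One common definition consistent with "LBF = 1 means perfect load balancing" is the average processor load divided by the maximum processor load; this is an assumed formula for illustration, not necessarily the paper's exact one.

```python
def load_balancing_factor(loads):
    """Load Balancing Factor: average load divided by maximum load.

    Equals 1.0 exactly when every processor carries the same load, and
    drops toward 0 as the load concentrates on fewer processors.
    (Assumed definition; the cited paper's exact formula may differ.)
    """
    return sum(loads) / (len(loads) * max(loads))
```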
... Relaxed assumptions compared with the real physical problem are another issue: for example, expressing the objective function as minimizing the load on the most heavily loaded processor among all assignments in a heterogeneous platform, without any uniqueness or capacity constraints [16]. Although these works seek to optimize an assignment problem for heterogeneous platforms and the proposed strategies seem to be effective, the problem is not properly handled and formalized. This results in unrealistic solutions or solutions that do not fit the real problem. ...
... This does not guarantee that each application task is assigned to a resource. Furthermore, in [17] the authors only express the objective function as minimizing the load on the most heavily loaded processor among all assignments in a heterogeneous platform, without any uniqueness or capacity constraints. This is a very relaxed assumption compared to real problems. ...
... In all of these methods, the basic idea is to determine an order for tasks based on their execution priority. The approach of each method differs with regard to various aspects of the input DAG (see e.g. [3], [4], [5], [6]). Each heuristic approach comprises two parts: first, finding an optimal order of the tasks and, second, allocating an appropriate processor to each task. ...
Article
Full-text available
In a heterogeneous multiprocessor system, a large program is decomposed into a set of tasks that have data dependencies. The most important problem, which is encountered in such systems, is task matching and scheduling. It consists of assigning tasks to processors, ordering task execution for each processor, and ordering interprocessor data transfers. The goal is to schedule all the tasks on the available processors so as to minimize the overall length of time required to execute the entire program without violating precedence constraints. This efficient scheduling reduces processing time and increases processor utilization, i.e. achieves high performance. This paper studies a previously developed problem-space genetic algorithm (PSGA)-based technique for task matching and scheduling on a heterogeneous multiprocessor system, and modifies it to mitigate its drawbacks. The modified algorithm is called MPSGA. Then, the paper presents a proposed Simulated Annealing (SA)-based task scheduling algorithm. Finally, it presents a proposed hybrid task scheduling algorithm that combines the proposed SA with MPSGA, which is called MPSGA_SA. Experiments have been conducted to evaluate the performance of the proposed scheduling techniques.
... Chu and Lan have introduced the workload of the bottleneck computer as the criterion for the evaluation of allocation quality [4]. A computer with the heaviest task load is the bottleneck machine in the system, and its workload is a critical value that should be minimized [12]. The workload Z_i^+(x) of the computer allotted to the ith node for the allocation x is provided by the subsequent formula: ...
Chapter
Full-text available
In this chapter, a genetic programming paradigm is implemented for reliability optimization in the Comcute grid system design. Chromosomes are generated as the program functions and then genetic operators are applied for finding Pareto-suboptimal task assignment and scheduling. Results are compared with outcomes obtained by an adaptive evolutionary algorithm.
... Because of its fundamental significance, extensive 1 study has been made in the field and bunch of algorithms exist in the literature. The classification of the algorithms has been done as: list scheduling algorithms [7, 8, 12, 14], guided random algorithms [5], cluster based [9] and task duplication algorithms [2, 3, 6]. List scheduling algorithms have been chosen as a research area for the work proposed in the paper. ...
Article
Full-text available
Today's multi-computer systems are heterogeneous in nature, i.e., the machines they are composed of have varying processing capabilities and are interconnected through high-speed networks, thus making them suitable for performing a diverse set of computing-intensive applications. In order to exploit the high performance of such a distributed system, efficient mapping of the tasks onto the available machines is necessary. This is an active research topic, and different strategies have been adopted in the literature for the mapping problem. A novel approach is introduced in this paper for the efficient mapping of DAG-based applications; the approach takes into account the lower and upper bounds on the start times of the tasks. The algorithm is based on the list scheduling approach and has been compared with the well-known list scheduling algorithms existing in the literature. The comparison results for randomly synthesized graphs as well as graphs from the real world elucidate that the proposed algorithm significantly outperforms the existing ones on the basis of different cost and performance metrics.
... Several approaches, in both fields of computer science and operations research, have been suggested to solve the assignment problem. They may be roughly classified into four categories, namely, graph theoretical [15, 19, 11], mathematical programming [12, 5, 2, 7, 14, 9, 3, 4], state space search [16, 20, 10, 18, 13], and heuristics [6, 1, 8]. The graph theoretical method uses a graph to represent the problem and then applies the maximum flow minimum cut algorithms on the graph to find an optimal assignment. ...
Chapter
Full-text available
Distributed computing systems have become more attractive and very important in recent years due to the advances of microprocessors and computer networks technologies. They not only provide the facility for utilizing remote resources and/or data that are not available on a local computer but also increase the throughput by providing facilities for parallel processing. One of the major problems that arises with such systems is that of allocating tasks of an application over computers of a distributed computing system. This problem is known to be NP-hard in most cases. Each task incurs execution costs that may be different for each processor assignment, and communicating tasks that are not assigned to the same processor incur communication costs. In searching for an assignment, tasks tend to be assigned to processors on which they have low execution costs, while communicating tasks tend to be assigned to the same processor to minimize communication costs. This paper first investigates the problem of task assignment in distributed computing systems, formulates the assignment problem as an optimization problem and then proposes an exact algorithm to find a solution that minimizes the total execution and communication costs. Some experimental results are tabulated to show the effectiveness of the proposed algorithm.
... They may be roughly classified into two categories, namely, exact algorithms and heuristics. Exact algorithms may be developed using different strategies such as graph theory [9], state space search [10, 16] and mathematical programming [12, 4, 3]. However, these approaches are limited by the amount of time needed to obtain an optimal solution, since running times grow as exponential functions of the problem size. ...
... Another heuristic, by Hui & Chanson [41], does not take precedence among tasks into account while generating a schedule. The A*-based technique [44], from the area of artificial intelligence, is not practical for large DAGs. Between two recently proposed algorithms, "Heterogeneous Earliest Finish Time" (HEFT) [42] and "Critical Path On a Processor" (CPOP) [42], HEFT is said to perform better. ...
... They may be roughly classified into two main categories, namely, exact algorithms and heuristic methods. The exact algorithms may be developed using different strategies such as graph theory [1], state space search [2-4] and mathematical programming [5-7]. However, the exact algorithms are limited by the time required to obtain an optimal solution, where the time grows exponentially with the problem size. ...
Article
Full-text available
This paper addresses the problem of static load balancing in heterogeneous distributed computing systems, taking into account both memory and communication capacity constraints. The load balancing problem is first modeled as an optimization problem. Then, a heuristic approach, called the Adaptive Genetic Algorithm (AGA), is proposed to solve the problem. The performance of the proposed algorithm is evaluated by simulation studies on randomly generated instances, and the results are compared with those obtained by applying both the Genetic Algorithm (GA) and Simulated Annealing (SA). Also, the quality of the results is compared with the optimal solutions obtained by applying the Branch-and-Bound (BB) algorithm.
... There are numerous studies addressing the task assignment problem under various characterizations. For some later works on mapping TIGs to processors in order to minimize turnaround time, see [14] for exact algorithms under processor heterogeneity and network homogeneity; [15] for exact algorithms under processor and network heterogeneity; and [13] for heuristics that map TIGs to processors in order to minimize total communication time in a heterogeneous network. The task assignment problem can be modeled as the problem of partitioning the nodes of a graph into n subsets so as to minimize the cost of the n-cut (i.e., the communication cost) and balance the subset size (i.e., load balancing). ...
Article
We consider the problem of assigning tasks to homogeneous nodes in a distributed system so as to minimize the amount of communication while balancing the processors' loads. This issue can be posed as the graph partitioning problem. Given an undirected graph G = (nodes, edges), where nodes represent task modules and edges represent communication, the goal is to divide the nodes into n subsets, where n is the number of processors, so as to balance the processors' loads while minimizing the capacity of the edges cut. Since these two optimization criteria conflict with each other, one has to make a compromise between them according to the given task type. We propose a new cost function to evaluate static task assignments and a heuristic algorithm to solve the transformed problem, explicitly describing the tradeoff between the two goals. Simulation results show that our approach outperforms an existing representative approach for a variety of task and processing systems.
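
A cost function that makes the cut-size versus load-balance tradeoff explicit can be sketched as a weighted sum; the particular combination below and the weight `alpha` are illustrative stand-ins, not the paper's exact function.

```python
def assignment_cost(assign, comm, n_procs, alpha=0.5):
    """Combined cost of a static assignment on homogeneous nodes.

    assign[t]: processor of task t.
    comm[i][j]: symmetric communication volume between tasks i and j.
    The cut term sums communication over edges whose endpoints land on
    different processors; the imbalance term is the spread between the
    largest and smallest task counts. alpha trades the two goals off.
    """
    n = len(assign)
    cut = sum(comm[i][j]
              for i in range(n) for j in range(i + 1, n)
              if assign[i] != assign[j])
    loads = [assign.count(p) for p in range(n_procs)]
    imbalance = max(loads) - min(loads)
    return alpha * cut + (1 - alpha) * imbalance
```

Setting `alpha` near 1 favors cutting as little communication as possible (clustering everything together); near 0 it favors perfectly even loads regardless of traffic.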
... Much research effort on the task allocation problem has been reported in the past, with the main concern on performance measures such as minimizing the total sum of execution and communication costs [1-4, 6, 7, 11] or minimizing the program turnaround time [8, 10, 22], the maximization of the system reliability [12-19] and safety [16]. ...
Article
Full-text available
In distributed computing systems (DCSs), the task allocation strategy is an essential phase to minimize the system cost (i.e. the sum of execution and communication costs). To utilize the capabilities of a distributed computing system (DCS) for effective parallelism, the tasks of a parallel program must be properly allocated to the available processors in the system. Inherently, the task allocation problem is NP-hard in complexity. To overcome this, it is necessary to introduce heuristics for generating near-optimal solutions to the given problem. This paper deals with the problem of task allocation in a DCS such that the system cost is minimized. This can be done by minimizing the inter-processor communication cost (IPCC). Therefore, in this paper we propose an algorithm that allocates the tasks to the processors one by one on the basis of the communication link sum (CLS). This type of allocation policy reduces the inter-processor communication (IPC) and thus minimizes the system cost. For allocation purposes, the execution cost of the tasks on each processor and the communication cost between the tasks have been taken in the form of matrices.
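
A greedy reading of the CLS policy described above — order tasks by the sum of communication on their links, then place each on the processor that adds the least cost — can be sketched as follows. The placement rule is an illustrative interpretation of the one-by-one policy, not the paper's exact procedure.

```python
def cls_allocate(exec_cost, comm_cost):
    """Greedy allocation by Communication Link Sum (CLS).

    exec_cost[t][p]: execution cost of task t on processor p.
    comm_cost[i][j]: symmetric communication cost between tasks i, j.
    Tasks with the heaviest total communication are placed first; each
    task goes to the processor minimizing its added execution cost
    plus communication to already-placed tasks on other processors.
    """
    n_tasks, n_procs = len(exec_cost), len(exec_cost[0])
    order = sorted(range(n_tasks), key=lambda t: -sum(comm_cost[t]))
    assign = {}
    for t in order:
        def added(p):
            return exec_cost[t][p] + sum(
                comm_cost[t][u] for u, q in assign.items() if q != p)
        assign[t] = min(range(n_procs), key=added)
    return [assign[t] for t in range(n_tasks)]
```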
... They may be roughly classified into two categories, namely, exact algorithms and heuristics. Exact algorithms may be developed using different strategies such as graph theory [9], state space search [10, 16] and mathematical programming [12, 4, 3]. However, these approaches are limited by the amount of time needed to obtain an optimal solution, which grows as an exponential function of the problem size. ...
Data
Full-text available
A fundamental issue affecting the performance of a parallel application running on a distributed system is the distribution of the workload over the various machines in the system. This problem is known to be NP-hard in most cases and therefore intractable as soon as the number of tasks and/or computers exceeds a few units. This paper first presents a mathematical model for the load balancing problem. It then proposes an optimal, memory-efficient, two-phase algorithm for allocating program modules (tasks) onto the processors of a heterogeneous distributed system to minimize the makespan (i.e., the completion time at the maximally loaded processor). The algorithm first finds a near-optimal allocation by applying Simulated Annealing (SA) and then finds an optimal distribution by applying the Branch-and-Bound (BB) technique, using the solution of SA as the initial solution. The proposed algorithm overcomes the low solution quality that may be obtained by using heuristics, as well as the computational time complexity of exact algorithms. Some experimental results are given to show the effectiveness of the proposed algorithm.
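The two-phase idea — SA to obtain a good incumbent, then BB seeded with it — can be sketched as below. This is a minimal illustration under an assumed cost model (makespan over processor loads); the paper's neighborhood, cooling schedule, and bounding functions may differ.

```python
import math
import random

def makespan(assign, exec_cost, n_procs):
    loads = [0.0] * n_procs
    for t, p in enumerate(assign):
        loads[p] += exec_cost[t][p]
    return max(loads)

def sa_phase(exec_cost, n_procs, iters=2000, temp=10.0, cooling=0.995, seed=1):
    # Phase 1: simulated annealing over single-task reassignment moves.
    rng = random.Random(seed)
    n = len(exec_cost)
    cur = [rng.randrange(n_procs) for _ in range(n)]
    cur_cost = makespan(cur, exec_cost, n_procs)
    best, best_cost = cur[:], cur_cost
    for _ in range(iters):
        t = rng.randrange(n)
        old = cur[t]
        cur[t] = rng.randrange(n_procs)
        new_cost = makespan(cur, exec_cost, n_procs)
        if new_cost <= cur_cost or rng.random() < math.exp((cur_cost - new_cost) / temp):
            cur_cost = new_cost
            if cur_cost < best_cost:
                best, best_cost = cur[:], cur_cost
        else:
            cur[t] = old           # reject the move
        temp *= cooling
    return best, best_cost

def bb_phase(exec_cost, n_procs, incumbent):
    # Phase 2: depth-first branch-and-bound, pruning any partial assignment
    # whose load already reaches the incumbent makespan.
    n = len(exec_cost)
    best_cost, best = incumbent, None
    loads = [0.0] * n_procs
    assign = [0] * n
    def dfs(t):
        nonlocal best_cost, best
        if t == n:
            best_cost, best = max(loads), assign[:]
            return
        for p in range(n_procs):
            loads[p] += exec_cost[t][p]
            if max(loads) < best_cost:
                assign[t] = p
                dfs(t + 1)
            loads[p] -= exec_cost[t][p]
    dfs(0)
    return best_cost, best

def sa_bb_allocate(exec_cost, n_procs):
    sa_assign, sa_cost = sa_phase(exec_cost, n_procs)
    cost, assign = bb_phase(exec_cost, n_procs, sa_cost)
    # If BB cannot improve on the SA incumbent, the SA solution was optimal.
    return (cost, assign) if assign is not None else (sa_cost, sa_assign)
```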
... If the number of computers is greater than 3 or the memory in a computer is limited, then a problem of the program completion cost minimization by task assignment is NP-hard. The workload of the bottleneck computer is another fundamental criterion for the evaluation of an allocation quality [11]. A computer with the heaviest task load is the bottleneck machine, and its workload is a critical value that is supposed to be minimized [6]. ...
Conference Paper
Full-text available
Reliability and load balancing are crucial factors for quality evaluation of distributed systems. Load balancing of Web servers can be implemented by reducing the workload of the bottleneck computer, which improves both the performance of the system and the safety of the bottleneck computers. An evolutionary algorithm based on a tabu search procedure is discussed for multi-criteria optimization of distributed systems. A tabu mutation is applied to minimize the workload of the bottleneck computer, which can be achieved through task assignment as well as the selection of suitable computer types. Moreover, a negative selection procedure is developed for improving non-admissible solutions. Some numerical results are submitted.
... Entries in the ETC matrix represent the estimated amount of time a task type takes to execute on a given machine type. Research in resource allocation often assumes the availability of ETC information (e.g., [19, 20, 21, 22]). We have provided the analysis framework for system administrators to use ETC information from data collected on their specific systems. ...
Conference Paper
Full-text available
The energy consumption of data centers has been increasing rapidly over the past decade. In some cases, data centers may be physically limited by the amount of power available for consumption. Both the rising cost and physical limitations of available power are increasing the need for energy efficient computing. Data centers must be able to lower their energy consumption while maintaining a high level of performance. Minimizing energy consumption while maximizing performance can be modeled as a bi-objective optimization problem. In this paper, we develop a method to create different resource allocations that illustrate the trade-offs between minimizing energy consumed and minimizing the makespan of a system. By adapting a popular multi-objective genetic algorithm we are able to construct Pareto fronts (via simulation) consisting of Pareto-efficient resource allocations. We analyze different solutions from within the fronts to further understand the relationships between energy consumption and makespan. This information can allow system managers to make intelligent scheduling decisions based on the energy and performance needs of their system.
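Extracting the Pareto-efficient allocations from a set of candidate (energy, makespan) pairs reduces to a non-dominated filter; a minimal sketch is below. The paper uses an adapted multi-objective genetic algorithm to generate the candidate allocations, which is not reproduced here.

```python
def pareto_front(points):
    """Keep the points not dominated in (energy, makespan): q dominates p
    if q is no worse in both objectives and differs from p."""
    front = []
    for p in points:
        dominated = any(q[0] <= p[0] and q[1] <= p[1] and q != p
                        for q in points)
        if not dominated:
            front.append(p)
    return front
```

A system manager can then pick any point on the returned front, trading energy for makespan without giving up efficiency in either objective.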
... In resource allocation, it is common to assume the availability of such characteristics (e.g. [17], [18], [19], [20], [21]). These values may be taken from historical sources ([20], [19]) or may be constructed synthetically for simulation purposes ([22], [15]). ...
Conference Paper
Full-text available
As high performance computing systems continually become faster, the operating cost to run these systems has increased. A significant portion of the operating costs can be attributed to the amount of energy required for these systems to operate. To reduce these costs it is important for system administrators to operate these systems in an energy-efficient manner. To help facilitate a transition to energy-efficient computing, the trade-offs between system performance and system energy consumption must be analyzed and understood. We analyze these trade-offs through bi-objective resource allocation techniques, and in this paper we explore an analysis approach to help system administrators investigate these trade-offs. Additionally, we show how system administrators can perform "what-if" analyses to evaluate the effects of adding or removing machines from their high performance computing systems. We perform our study using three environments based on data collected from real machines and real applications. We show that by utilizing different resource allocations we are able to significantly change the performance and energy consumption of a given system, providing a system administrator with the means to examine these trade-offs to help make intelligent decisions regarding the scheduling and composition of their systems.
... However, unlike our work, they assume full control over the message routing. In [13], the authors present algorithms based on the best-first A* technique from artificial intelligence for optimal task placement on heterogeneous systems. The placement constraint is specified as a placement cost metric for mapping a task to a particular node. ...
Article
Data-driven macroprogramming of wireless sensor networks (WSNs) provides an easy-to-use high-level task graph representation to the application developer. However, determining an energy-efficient initial placement of these tasks onto the nodes of the target network poses a set of interesting problems. We present a framework to model this task-mapping problem arising in WSN macroprogramming. Our model can capture placement constraints in tasks, as well as multiple possible routes in the target network. Using our framework, we provide mathematical formulations for the task-mapping problem for two different metrics: energy balance and total energy spent. For both metrics, we address scenarios where (1) a single or (2) multiple paths are possible between nodes. Due to the complex nature of the problems, these formulations are not linear. We provide linearization heuristics for the same, resulting in mixed-integer programming (MIP) formulations. We also provide efficient heuristics for the above. Our experiments show that our heuristics give the same results as the MIP for real-world sensor network macroprograms, and show a speedup of up to several orders of magnitude. We also provide worst-case performance bounds of the heuristics.
... Traditionally, distributed systems research has considered the cause of load imbalances to be an inherently user-oriented problem [3]. The entire cluster of machines is often considered to be a common computational resource shared among a number of users, each with their own and often competing objectives [2]. ...
Article
Full-text available
1. Motivation As multi-agent researchers seek to scale their simulations of complex systems towards more realistic numbers of agents, they are posed with the challenge of balancing the integrity of the system's underlying model against their ability to harness the extra computational power that comes from utilizing multiple interconnected machines. An increased number of agents generates larger workloads on each machine, and ensuring that the load of each reflects the resources available to the machine at the time becomes integral to minimizing the duration of the simulation. While a wealth of literature exists in the field of distributed computing that points to various techniques for balancing a load across multiple machines, much of this knowledge fails to directly translate to a solution for multi-agent simulations for the following reasons: • Multi-agent systems utilize communication extensively. Unlike the traditional distributed computing model, where each machine carries out its workload independently until the simulation terminates, at which point results from each are aggregated to form a final result, the distributed multi-agent model requires intermediate communication between machines. In addition to messages sent between agents, machines may also communicate amongst each other about a shared data source. The cost of communication should play a crucial role in determining the load of an individual machine, factoring in message length and network latency as well as the frequency of messages sent/received.
... The estimated time to complete (ETC) each task, on each machine, is given. This assumption is usually made in the literature [19], [20], [21]. An ETC example is: ...
Conference Paper
Full-text available
This paper investigates the automatic parallelization of a heuristic for an NP-complete problem, with machine learning. The objective is to automatically design a new concurrent algorithm that finds solutions of comparable quality to the original heuristic. Our approach, called Savant, is inspired by the Savant syndrome. Its concurrency model is based on map-reduce. The approach is evaluated with the well-known Min-Min heuristic. Simulation results on two problem sizes are promising; the produced algorithm is able to find solutions of comparable quality.
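The Min-Min heuristic that Savant is evaluated against works from an ETC (estimated time to compute) matrix; a standard sketch, with illustrative names:

```python
def min_min(etc):
    """etc[t][m]: estimated time to compute task t on machine m. Repeatedly
    pick the unassigned task with the smallest earliest completion time and
    assign it to the machine achieving that time."""
    n_tasks, n_mach = len(etc), len(etc[0])
    ready = [0.0] * n_mach                  # machine availability times
    unassigned = list(range(n_tasks))
    assign = {}
    while unassigned:
        best = None                         # (completion_time, task, machine)
        for t in unassigned:
            ct, m = min((ready[mm] + etc[t][mm], mm) for mm in range(n_mach))
            if best is None or ct < best[0]:
                best = (ct, t, m)
        ct, t, m = best
        assign[t] = m
        ready[m] = ct
        unassigned.remove(t)
    return assign, max(ready)
```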
... The spread of high-speed broadband networks in developed countries, the continual increase in computing power, and the growth of the Internet have changed the way in which society manages information and information services [1][2]. Geographically distributed resources, such as storage devices, data sources, and supercomputers, are interconnected and can be exploited by users around the world as a single, unified resource. ...
Article
A distributed computing system is a system architecture that makes a collection of heterogeneous computers, workstations, or servers act and behave as a single computing system. In such a computing environment, users can uniformly access and name local or remote resources, and run processes from anywhere in the system, without being aware of which computers their processes are running on. Distributed computing systems have been studied extensively by researchers, and a great many claims and benefits have been made for using such systems. In fact, it is hard to rule out any desirable feature of a computing system that has not been claimed to be offered by a distributed system. Current advances in processing and networking technology and software tools make it feasible to achieve diverse advantages such as increased performance, cost-effectiveness, sharing of resources, and increased extendibility. To meet such challenging computing requirements, reliability plays an important role. In this paper the authors propose a model for enhancing processor throughput in distributed computing systems.
... The number of instructions of a BoT application is the summation of all the instructions of its tasks. The number of instructions that are used to estimate the time to compute tasks in computing resources is commonly assumed to be known beforehand (Sugavanam et al. 2007; Kafil and Ahmad 1998; Yu et al. 2005; Buyya et al. 2005). In addition, techniques to estimate task completion time can be found in (Ghafoor and Yang 1993; Smith et al. 1998; Jarvis et al. 2006). ...
Article
Executing bag-of-tasks applications in multiple Cloud environments while satisfying both consumers' budgets and deadlines poses the following challenges: How many resources and how many hours should be allocated? What types of resources are required? How should the distributed execution of bag-of-tasks applications be coordinated across resources composed from multiple Cloud providers? This work proposes a genetic algorithm for estimating suboptimal sets of resources and an agent-based approach for executing bag-of-tasks applications simultaneously constrained by budgets and deadlines. Agents (endowed with distributed algorithms) compose resources and coordinate the execution of bag-of-tasks applications. Empirical results demonstrate that the genetic algorithm can autonomously estimate sets of resources to execute budget-constrained and deadline-constrained bag-of-tasks applications, composed of more economical (but slower) resources in the presence of loose deadlines, and more powerful (but more expensive) resources in the presence of large budgets. Furthermore, agents can efficiently and successfully execute randomly generated bag-of-tasks applications in multi-Cloud environments.
... For this purpose we propose to use an A*-type heuristic. A* heuristics have been used successfully to compute schedules, particularly when compiling for a highly heterogeneous architecture [12] or cluster [17]. ...
Article
Abstract This document,report the work that I realize during my,Master 2 internship in the project-team CACAO at LORIA under the supervision,of Jérémie Detrey and the co-supervision of Jean-Luc Beuchat (LCIS, Japan). In order to compute cryptographic pairings eciently on some dedicated circuits, we design a family of specific coprocessors and an automatic tools for programming them. Then we have explored a large range of trade-o,in the choice of algorithms and the architecture of the coprocessor. Thanks,to a finely tuned,co-design we present good,performances,in pairing computation. Keywords : list scheduling, automatic parallelization, finite field arithmetic,elliptic curves, pairing based
... In [54], the authors present algorithms based on the best-first A* technique from artificial intelligence for optimal task placement on heterogeneous systems. The placement constraint is specified as a placement cost metric for mapping a task to a particular node. ...
Article
Acknowledgments My first feeling of gratitude is for my advisor, Prof. Viktor K. Prasanna, whose guidance over the past several years further enriched this life-changing experience. What I have learned from him will guide my actions throughout my research career. This work would not have been possible without him. I would also like to express my gratitude to my committee members, Prof. Bhaskar Krishnamachari and Prof. Gaurav Sukhatme, for their useful insights during this process and for their deep questions that helped in strengthening this dissertation. Through the course of my Ph.D. I also received excellent guidance and help from a set of wonderful researchers, viz. the P-Group. Special thanks are due to Sumit Mohanty, Mitali Singh, Zack Baker, Ron Scrofano, Ling Zhuo, Cong Zhang, Amol Bakshi, Yinglong Xia, Hoang Le, and Qunzhi Zhou, with whom I had the opportunity to interact closely. I would also like to take this opportunity to express my gratitude to the excellent administrators — Henryk Chrostek, Rosine Sarafian, Aimee Barnard, Estela Lopez, and Janice Thompson — who were always there to support me through this journey. Thanks also to Luca Mottola and Gian-Pietro Picco, collaborating with whom gave my research a fresh perspective. Apart from research skills, I also learned the art of being a good teacher at USC. I would like to thank Prof. Gandhi Puvvada, who was my mentor for three semesters, and Diane Demetras, who was always there to solve any teaching-related (and other)
Article
Full-text available
The task scheduling problem on parallel processors in real-time systems is among the NP-hard problems. This manuscript develops a model for solving the task scheduling problem in a heterogeneous multiprocessor environment. In this model, we develop two hybrid genetic algorithms: the first, named HHCGA, is the union of hierarchical clustering and a genetic algorithm, used for making clusters of tasks to decrease the inter-task communication cost; the second, named HHAGA, is the union of a heuristic approach and a genetic algorithm, used for scheduling the task clusters onto processors to decrease the system cost. The developed model has multiple objectives: minimizing the response time and system cost while maximizing system reliability simultaneously. The efficacy of the developed model has been shown via simulation studies, in which its results are compared with those of other models. This model is appropriate for both fuzzy and crisp costs, and it also works very well for arbitrary numbers of processors and tasks.
Conference Paper
This paper addresses the problem of maximizing the reliability of a heterogeneous distributed computing system in which any node can fail permanently. The reliability of the system can be achieved by executing all the tasks queued on the nodes before they fail. This paper presents a framework to characterize the service reliability of a Distributed Computing System (DCS). Reliability is characterized in the presence of communication uncertainties and topological changes due to node deletion. Because the DCS is heterogeneous, its various nodes have different hardware and software characteristics, and the different components of the application likewise have various hardware and software requirements; these applications provide their desired functionality when their requirements are satisfied. One way to improve the reliability of the DCS is proper allocation of tasks among the nodes. First, we determine the candidate nodes that can satisfy the tasks' requirements. Then we utilize load sharing policies for handling node failures as well as maximizing the service reliability of the DCS.
Chapter
The multiprocessor task scheduling problem is a well-known NP-hard and important problem in the field of parallel computing. In order to solve this problem optimally, researchers have applied various heuristics and meta-heuristics. Genetic Algorithm (GA) is one of the most widely adopted meta-heuristic approaches for solving combinatorial optimization problems. In order to increase the probability of finding an optimal solution with GA, a new approach known as the Quantum Genetic Algorithm (QGA) has been adopted. QGA increases the speed and efficiency of computation of a conventional GA by introducing the concept of parallelism from quantum computing into GA. In this paper, a quantum-behavior-inspired GA is introduced to solve the multiprocessor task scheduling problem. The proposed QGA has been modified at certain points with some new operators to make it compatible with the problem. The performance of the proposed QGA is verified on a standard problem of linear algebra, i.e., Gauss-Jordan Elimination (GJE). The results have been compared with the state of the art to prove its effectiveness.
Thesis
Full-text available
This research study has produced advances in the understanding of communities within complex networks. A community in this context is defined as a subgraph with higher internal density and lower crossing density with respect to other subgraphs. In this study, a novel and efficient distance-based ranking algorithm called the Correlation Density Rank (CDR) is proposed and utilized for a broad range of applications, such as deriving the community structure and the evolution graph of the organizational structure from a dynamic social network, extracting common members between overlapping communities, performance-based comparison between different service providers in wireless networks, and finding optimal reliability-oriented assignments of tasks to processors in heterogeneous distributed computing systems. The experiments, conducted on both synthetic and real datasets, demonstrate the feasibility and applicability of the framework.
Chapter
The scheduling problem is an optimization problem with fundamental principles applicable in several fields [139], e.g. computer science, economics, job scheduling, project management, production, etc. The research and development communities have already invested decades of research and experiments in techniques solving several instances of the scheduling problem in computer science. Important advances have appeared in the scheduling problem as it applies to design-time mapping on multiprocessing platforms, emphasizing ordering in time and assignment in place.
Chapter
The scheduling and assignment techniques heavily affect the system design and performance, as they are responsible for meeting the system specifications, e.g. real-time behavior, minimal energy consumption, reliability, etc. The scheduling technique assigns operations, groups of operations, memory references, or communication transactions to control steps, and hardware resources (homogeneous or heterogeneous processing elements, function units, memories, or intercommunication networks) to these operations.
Book
This book describes scalable and near-optimal, processor-level design space exploration (DSE) methodologies. The authors present design methodologies for data storage and processing in real-time, cost-sensitive data-dominated embedded systems. Readers will be enabled to reduce time-to-market, while satisfying system requirements for performance, area, and energy consumption, thereby minimizing the overall cost of the final design. © Springer International Publishing Switzerland 2014. All rights are reserved.
Article
This work presents a novel hybrid meta-heuristic that combines particle swarm optimization and a genetic algorithm (PSO-GA) for jobs/tasks in the form of a directed acyclic graph (DAG) exhibiting inter-task communication. The proposed meta-heuristic starts with PSO and switches to GA when the local best result from PSO is obtained. Thus, the proposed PSO-GA meta-heuristic differs from other such hybrid meta-heuristics in that it aims at improving the solution obtained by PSO using GA. In the proposed meta-heuristic, PSO is used to provide diversification while GA is used to provide intensification. PSO-GA is tested for task scheduling on two standard well-known linear algebra problems: LU decomposition and Gauss-Jordan elimination. It is also compared with other state-of-the-art heuristics with known solutions. Furthermore, its effectiveness is evaluated on a few large random task graphs. A comparative study of the proposed PSO-GA with other heuristics shows that PSO-GA performs quite effectively for the multiprocessor DAG scheduling problem.
Conference Paper
This paper exposes the mismatch between the classic problem representation in the scheduling problem of independent task mapping and the reality of multi-core processors, operating system driven power management and time sharing for overlapping I/O with computation. A new, simple, model is proposed to address this gap. The model, along with a scheduling heuristic, are applied to the evaluation of software pipelining in the context of the recent millicomputing initiative.
Article
Full-text available
This paper addresses the problem of static load balancing in heterogeneous distributed computing systems, taking into account both memory and communication capacity constraints. It first models the load balancing problem as an optimization problem. It then presents a modified genetic algorithm, called the Adaptive Genetic Algorithm (AGA), to solve the problem. The performance of the proposed algorithm is evaluated by simulation studies on randomly generated instances, and the results are compared with those obtained by applying both the Genetic Algorithm (GA) and Simulated Annealing (SA). Also, the quality of the results is compared with the optimal solutions obtained by applying the Branch-and-Bound (BB) algorithm.
Article
Full-text available
The performance of a parallel application running on a Distributed Computing System (DCS) is fundamentally affected by the distribution of workload over the various processors in the system. This problem is known to be NP-hard in most cases and becomes more complicated with an increasing number of tasks and/or computers. This paper presents two heuristic algorithms, Simulated Annealing (SA) and a Genetic Algorithm (GA), to solve this problem. The quality of the resulting distributions is compared with that obtained by applying the Branch-and-Bound (BB) technique.
Conference Paper
Full-text available
Advances in microprocessors and computer networks have made distributed systems a reality. However, exploiting the full potential of these systems requires efficient allocation of the tasks comprising a distributed application to the available processors. This problem is known to be NP-hard and therefore intractable as soon as the number of tasks and/or processors exceeds a few units. This paper presents an optimal, memory-efficient algorithm for allocating an application program onto the processors of a distributed system to minimize the program completion time. The algorithm derives from the well-known Branch-and-Bound technique, with some modifications to minimize its computational time. Some experimental results are given to show the effectiveness of the proposed algorithm.
Article
The scheduling problem is an important, partially solved topic related to a wide range of scientific fields. As it applies to design-time mapping on multiprocessing platforms, emphasizing ordering in time and assignment in place, significant improvements can be achieved. To support this improvement, this article presents a complete systematic classification of the existing scheduling techniques solving this problem in a (near-)optimal way. We show that the proposed approach covers any global scheduling technique, including future ones. In our systematic classification a technique may belong to one primitive class or to a hybrid combination of such classes. In the latter case the technique is efficiently decomposed into more primitive components, each belonging to a specific class. The systematic classification assists in the in-depth understanding of the diverse classes of techniques, which is essential for their further improvement. Their main characteristics and structure, their similarities and differences, and the interrelationships of the classes are conceived. In this way, our classification provides guidance for contributing in novel ways to the broad domain of global scheduling techniques.
Conference Paper
MapReduce is a programming model and an associated implementation for processing and generating large data sets. Hadoop is an open-source implementation of MapReduce, enjoying wide adoption, and is used not only for batch jobs but also for short jobs where low response time is critical. However, Hadoop's performance is currently limited by its default task scheduler, which implicitly assumes that cluster nodes are homogeneous and tasks make progress linearly, and uses these assumptions to decide when to speculatively re-execute tasks that appear to be stragglers. In practice, the homogeneity assumption does not always hold. Longest Approximate Time to End (LATE) is a MapReduce scheduling algorithm that takes heterogeneous environments into consideration. It, however, adopts a static method to compute the progress of tasks. As a result, neither the Hadoop default nor the LATE scheduler performs well in a heterogeneous environment. The Self-adaptive MapReduce Scheduling Algorithm (SAMR) uses historical information to adjust the stage weights of map and reduce tasks when estimating task execution times. However, SAMR does not consider the fact that for different types of jobs the map and reduce stage weights may be different; even for the same type of job, different datasets may lead to different weights. To this end, we propose ESAMR: an Enhanced Self-Adaptive MapReduce scheduling algorithm to improve the speculative re-execution of slow tasks in MapReduce. In ESAMR, in order to identify slow tasks accurately, we differentiate historical stage weight information on each node and divide it into k clusters using a k-means clustering algorithm; when executing a job's tasks on a node, ESAMR classifies the tasks into one of the clusters and uses the cluster's weights to estimate the execution time of the job's tasks on that node. Experimental results show that among the aforementioned algorithms, ESAMR leads to the smallest error in task execution time estimation and identifies slow tasks most accurately.
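The k-means step in ESAMR, which clusters a node's historical stage-weight vectors, can be sketched with plain k-means. Taking the first k points as initial centers, using 2-dimensional [map-phase, reduce-phase] weight vectors, and the fixed iteration count are all illustrative assumptions:

```python
def kmeans(points, k, iters=20):
    """Plain k-means; points are stage-weight vectors such as
    [map_phase_weight, reduce_phase_weight]."""
    centers = [p[:] for p in points[:k]]    # first k points as initial centers
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest center (squared distance).
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(p, centers[c])))
            clusters[i].append(p)
        # Recompute each center as the mean of its cluster.
        centers = [[sum(col) / len(cl) for col in zip(*cl)] if cl else centers[i]
                   for i, cl in enumerate(clusters)]
    return centers, clusters
```

At scheduling time, a task would be matched to the nearest cluster and that cluster's mean weights used to estimate its execution time.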
Article
The collective behaviour observed in many social insects and animals provides the inspiration for the development of multi-vehicle control systems. The distributed nature of the multi-vehicle control problem enhances the performance of the collective system along the dimensions of scalability, robustness, and fault tolerance. The distributed/decentralized nature of the cooperative control task introduces many sub-problems often associated with network control design. In this paper, a survey of recent results in the field of cooperative control for multi-vehicle systems is presented. Various applications are discussed and presented in a mathematical framework to illustrate the major features of the cooperative control problem. Theoretical results for various cooperative control strategies are presented by topic and applied to the multi-vehicle applications.
Conference Paper
Full-text available
This paper presents many different parallel formulations of the A*/Branch-and-Bound search algorithm. The parallel formulations primarily differ in the data structures used. Some formulations are suited only for shared-memory architectures, whereas others are suited for distributed-memory architectures as well. These parallel formulations have been implemented to solve the vertex cover problem and the TSP problem on the BBN Butterfly parallel processor. Using appropriate data structures, we are able to obtain fairly linear speedups for as many as 100 processors. We also discovered problem characteristics that make certain formulations more (or less) suitable for some search problems. Since the best-first search paradigm of A*/Branch-and-Bound is very commonly used, we expect these parallel formulations to be effective for a variety of problems. The concurrent and distributed priority queues used in these parallel formulations can be used in many parallel algorithms other than parallel A*/Branch-and-Bound.
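The sequential A* formulation for task assignment that such parallel versions build on can be sketched with a single priority queue. The admissible heuristic below (the sum of each unassigned task's cheapest execution cost) is one common choice, not necessarily the one in the paper; the parallel variants replace the heap with concurrent or distributed priority queues.

```python
import heapq

def astar_assign(exec_cost, comm, n_procs):
    """A state is a tuple of processor choices for the first t tasks.
    g = cost of the assigned tasks; h = sum of each unassigned task's
    cheapest execution cost (admissible, so the first complete assignment
    popped from the heap is optimal)."""
    n = len(exec_cost)

    def h(t):
        return sum(min(exec_cost[u]) for u in range(t, n))

    def g(assign):
        cost = sum(exec_cost[t][p] for t, p in enumerate(assign))
        cost += sum(w for (u, v), w in comm.items()
                    if u < len(assign) and v < len(assign)
                    and assign[u] != assign[v])
        return cost

    open_list = [(h(0), ())]            # (f = g + h, partial assignment)
    while open_list:
        f, assign = heapq.heappop(open_list)
        t = len(assign)
        if t == n:
            return f, assign            # first goal popped is optimal
        for p in range(n_procs):
            child = assign + (p,)
            heapq.heappush(open_list, (g(child) + h(t + 1), child))
```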
Article
Full-text available
MULTIFIT-COM, a static task allocator which could be incorporated into an automated compiler/linker/loader for distributed processing systems, is presented. The allocator uses performance information for the processes making up the system in order to determine an appropriate mapping of tasks onto processors. It uses several heuristic extensions of the MULTIFIT bin-packing algorithm to find an allocation that will offer a high system throughput, taking into account the expected execution and interprocessor communication requirements of the software on the given hardware architecture. Throughput is evaluated by an asymptotic bound for saturated conditions and under an assumption that only processing resources are required. A set of options are proposed for each of the allocator's major steps. An evaluation was made on 680 small randomly generated examples. Using all the search options, an average performance difference of just over 1% was obtained. Using a carefully chosen small subset of only four options, a further degradation of just over 1.5% was obtained. The allocator is also applied to a digital signal processing system consisting of 119 tasks to illustrate its clustering and load balancing properties on a large system
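The core of the underlying MULTIFIT scheme — a binary search on bin capacity over first-fit-decreasing (FFD) feasibility — can be sketched as below. MULTIFIT-COM's communication-aware extensions and heuristic options are not modeled, and the search bounds and round count are conventional choices rather than the paper's.

```python
def ffd_fits(times, n_procs, cap):
    """First-fit-decreasing: can the tasks be packed into n_procs bins of
    capacity cap?"""
    loads = [0.0] * n_procs
    for t in sorted(times, reverse=True):
        for i in range(n_procs):
            if loads[i] + t <= cap:
                loads[i] += t
                break
        else:
            return False                 # no bin could take this task
    return True

def multifit(times, n_procs, rounds=20):
    """Binary-search the bin capacity (i.e. the makespan) over FFD
    feasibility, shrinking the interval each round."""
    lo = max(max(times), sum(times) / n_procs)       # capacity lower bound
    hi = max(max(times), 2 * sum(times) / n_procs)   # capacity upper bound
    for _ in range(rounds):
        mid = (lo + hi) / 2
        if ffd_fits(times, n_procs, mid):
            hi = mid
        else:
            lo = mid
    return hi
```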
Article
Full-text available
Discrete optimization problems (DOPs) arise in various applications such as planning, scheduling, computer aided design, robotics, game playing and constraint directed reasoning. Often, a DOP is formulated in terms of finding a (minimum cost) solution path in a graph from an initial node to a goal node and solved by graph/tree search methods. Availability of parallel computers has created substantial interest in exploring parallel formulations of these graph and tree search methods. This article provides a survey of various parallel search algorithms such as Backtracking, IDA*, A*, Branch-and-Bound techniques and Dynamic Programming. It addresses issues related to load balancing, communication costs, scalability and the phenomenon of speedup anomalies in parallel search. INFORMS Journal on Computing, ISSN 1091-9856, was published as ORSA Journal on Computing from 1989 to 1995 under ISSN 0899-1499.
Article
We consider the problem of finding an optimal assignment of the modules of a program to processors in a distributed system. A module incurs an execution cost that may be different for each processor assignment, and modules that are not assigned to the same processor but that communicate with one another incur a communication cost. An optimal assignment minimizes the sum of the module execution costs and the intermodule communication costs. This problem is known to be NP-complete for more than three processors. Using a branch-and-bound-with-underestimates algorithm to reduce the size of the search tree, we evaluate its average time and space complexity for two underestimating functions through simulation. The more complex of the two functions, called the minimum independent assignment cost underestimate (MIACU), performs extremely well over a wide range of values of program model parameters such as the number of modules, the number of processors, and the ratio of average module execution cost to average intermodule communication cost. By reordering the list of modules to allow a subset of modules that do not communicate with one another to be assigned last, further improvements using MIACU are possible.
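A depth-first branch-and-bound skeleton with an underestimate in the spirit described above can be sketched briefly: each still-unassigned module contributes its cheapest execution cost on any processor, with communication ignored, which is admissible. This is an illustration of the idea, not the paper's MIACU implementation.

```python
def bnb_assign(exec_cost, comm, n_procs):
    """Depth-first branch-and-bound for module assignment.
    Prunes any partial assignment whose cost plus an admissible
    underestimate of the remaining cost cannot beat the incumbent."""
    n = len(exec_cost)
    # tail[t]: cheapest possible execution cost of modules t..n-1,
    # ignoring communication -- an admissible lower bound.
    tail = [0] * (n + 1)
    for t in range(n - 1, -1, -1):
        tail[t] = tail[t + 1] + min(exec_cost[t])

    best = {"cost": float("inf"), "assign": None}

    def extend(assign, g):
        t = len(assign)
        if t == n:
            if g < best["cost"]:
                best["cost"], best["assign"] = g, tuple(assign)
            return
        for p in range(n_procs):
            extra = exec_cost[t][p] + sum(
                c for (i, j), c in comm.items()
                if j == t and assign[i] != p)
            if g + extra + tail[t + 1] < best["cost"]:   # bound test
                assign.append(p)
                extend(assign, g + extra)
                assign.pop()

    extend([], 0)
    return best["cost"], best["assign"]
```

Reordering modules so that non-communicating ones come last, as the paper suggests, would tighten this bound further; the sketch keeps the given order for simplicity.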
Article
A new mapping heuristic is developed, based on the recently proposed Mean Field Annealing (MFA) algorithm. An efficient implementation scheme, which decreases the complexity of the proposed algorithm by asymptotical factors, is also given. Performance of the proposed MFA algorithm is evaluated in comparison with two well-known heuristics: Simulated Annealing and Kernighan-Lin. Results of the experiments indicate that MFA can be used as an alternative heuristic for solving the mapping problem. The inherent parallelism of the MFA is exploited by designing an efficient parallel algorithm for the proposed MFA heuristic.
Article
Given an array of processors, the mapping problem requires assignment of program modules to processors such that the communication time between modules is minimized. Typically, this requires assignment of modules which intercommunicate to adjacent processors. Although polynomial solutions to certain instances of the mapping problem exist, an efficient, exact algorithm for the general-case problem has not been found. Consequently, researchers have concentrated on development of efficient heuristic solutions. This study explores the application of simulated annealing to the mapping problem. Performance of the annealing model will be compared with that of an existing heuristic procedure, and experimentation with various parameters of the annealing algorithm will be employed in determination of the most significant factors affecting the efficiency of the simulated annealing solution.
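The annealing model for mappings can be sketched in a few lines: perturb one task's placement, always accept improvements, and accept uphill moves with probability exp(-delta/T) under a cooling temperature. The cost function, schedule, and parameters below are illustrative placeholders, not the ones studied in the paper.

```python
import math
import random

def anneal_mapping(n_tasks, n_procs, cost_fn,
                   t0=10.0, cooling=0.95, steps=2000, seed=0):
    """Simulated annealing over task-to-processor mappings.
    cost_fn takes a mapping (list of processor indices) and returns
    a cost to minimize; the geometric cooling schedule is a common
    default, not a tuned choice."""
    rng = random.Random(seed)
    mapping = [rng.randrange(n_procs) for _ in range(n_tasks)]
    cost = cost_fn(mapping)
    best, best_cost = mapping[:], cost
    temp = t0
    for _ in range(steps):
        t = rng.randrange(n_tasks)          # move: remap one task
        old = mapping[t]
        mapping[t] = rng.randrange(n_procs)
        new_cost = cost_fn(mapping)
        delta = new_cost - cost
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cost = new_cost                  # accept the move
            if cost < best_cost:
                best, best_cost = mapping[:], cost
        else:
            mapping[t] = old                 # reject: undo the move
        temp *= cooling
    return best, best_cost
```

With a simple load-imbalance cost, the loop quickly finds a balanced mapping; real mapping costs would also charge communication between non-adjacent processors.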
Article
The task assignment problem is one of assigning the tasks of a parallel program among the processors of a distributed computing system in order to reduce the job turnaround time and to increase the throughput of the system. Since the task assignment problem is known to be NP-complete except in a few special situations, satisfactory suboptimal solutions obtainable in a reasonable amount of computation time are generally sought. In this paper we introduce a technique based on the problem-space genetic algorithm (PSGA) for the static task assignment problem in both homogeneous and heterogeneous distributed computing systems, to reduce the task turnaround time and to increase the throughput of the system by properly balancing the load and reducing the interprocessor communication time among processors. The PSGA-based approach combines the power of genetic algorithms, a global search method, with a simple and fast problem-specific heuristic to search a large solution space efficiently and effectively to find the best possible solution in an acceptable CPU time. Experimental results on test examples from the literature show considerable improvements in both the assignment cost and the CPU times over previous work. The proposed scheme is also applied to a digital signal processing (DSP) system consisting of 119 tasks to illustrate its balancing properties and computational advantage on a large system. The proposed scheme offers 12–30% improvement in the assignment cost as compared to the previous best known results for the DSP example.
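The genetic-algorithm core that PSGA builds on can be sketched as below: chromosomes are task-to-processor strings evolved by tournament selection, one-point crossover, and single-gene mutation. Note the caveat in the docstring: PSGA itself perturbs the problem data and decodes with a problem-specific heuristic, which this plain solution-space sketch omits; all parameters are illustrative.

```python
import random

def ga_assign(n_tasks, n_procs, fitness, pop_size=30, gens=60, seed=1):
    """Plain GA over task-to-processor chromosomes, minimizing fitness.
    This is the generic solution-space core only; PSGA's problem-space
    encoding and heuristic decoder are not reproduced here."""
    rng = random.Random(seed)
    pop = [[rng.randrange(n_procs) for _ in range(n_tasks)]
           for _ in range(pop_size)]

    def tournament():
        a, b = rng.sample(pop, 2)
        return a if fitness(a) <= fitness(b) else b

    for _ in range(gens):
        nxt = [min(pop, key=fitness)]            # elitism: keep the best
        while len(nxt) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, n_tasks)
            child = p1[:cut] + p2[cut:]          # one-point crossover
            if rng.random() < 0.2:               # occasional mutation
                child[rng.randrange(n_tasks)] = rng.randrange(n_procs)
            nxt.append(child)
        pop = nxt
    best = min(pop, key=fitness)
    return best, fitness(best)
```

A fitness function would typically combine execution, communication, and load-balance terms; the test below uses execution cost alone for brevity.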
Conference Paper
In this paper, we present a technique based on the problem-space genetic algorithm (PSGA) for the static scheduling of directed acyclic graphs onto homogeneous multiprocessor systems to reduce the response time. The PSGA-based approach combines genetic algorithms with a list scheduling heuristic to search a large solution space efficiently and effectively. Comparison of results with the genetic-algorithm-based scheduling technique for the Stanford manipulator and the Elbow manipulator examples shows a significant improvement in the response time.
Conference Paper
The objective of this research is to propose a low complexity static scheduling and allocation algorithm for message-passing architectures by considering factors such as communication delays, link contention, message routing and network topology. As opposed to the conventional list-scheduling approach, our technique works by first serializing the task graph and "injecting" all the tasks into one processor. The parallel tasks are then "bubbled up" to other processors and inserted at appropriate time slots. The edges among the tasks are also scheduled by treating communication links between the processors as resources. The proposed approach takes into account the link contention and underlying communication routing strategy, and can self-adjust on regular as well as arbitrary network topologies. To reduce the complexity, our scheduling algorithm is itself parallelized. To our knowledge, this is the first attempt at designing a parallel algorithm for scheduling. The proposed approach, implemented on an iPSC/860 hypercube, yields a high speedup in its execution and performs considerably better under a wide range of parameters, including the task graph size, communication-to-computation ratio, and the target system topology. Comparisons are made with two other approaches.
Conference Paper
C. C. Shen and W. H. Tsai (IEEE Trans. Comput., vol. C-34, no. 3, pp. 197-203, 1985) proposed a graph matching algorithm for solving the static task assignment problem. It combines two important ideas: (1) graph homomorphism and (2) application of the A* algorithm. Task-dependent information is used as a heuristic to reduce the search effort in finding an optimal path to the goal node. An examination is made of Shen and Tsai's strategy and their complexity measure. The authors propose some simple alternatives to their algorithm that are effective in reducing the number of nodes generated (and expanded) without sacrificing the optimality criteria.
Article
We address the problem of optimally partitioning the modules of chain- or tree-like tasks over chain-structured or host-satellite multiple computer systems. This important class of problems includes many signal processing and industrial control applications. Prior research has resulted in a succession of faster exact and approximate algorithms for these problems. We describe polynomial exact and approximate algorithms for this class that are better than any of the previously reported algorithms. Our approach is based on a preprocessing step that condenses the given chain or tree structured task into a monotonic chain or tree. The partitioning of this monotonic task can then be carried out using fast search techniques
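The simplest member of this problem family, a chain of modules over identical processors with the bottleneck (largest block sum) as the objective, admits a textbook dynamic program. The sketch below shows that baseline only; it ignores communication costs and is not the faster condensation-based algorithms the paper contributes.

```python
def chain_partition(loads, n_procs):
    """Partition a chain of module loads into n_procs contiguous blocks,
    minimizing the largest block sum. Classic O(p * n^2) DP baseline."""
    n = len(loads)
    prefix = [0]
    for w in loads:
        prefix.append(prefix[-1] + w)   # prefix sums for O(1) block sums
    INF = float("inf")
    # dp[p][i]: best bottleneck placing the first i modules on p processors
    dp = [[INF] * (n + 1) for _ in range(n_procs + 1)]
    dp[0][0] = 0
    for p in range(1, n_procs + 1):
        for i in range(n + 1):
            for k in range(i + 1):      # last block is modules k..i-1
                block = prefix[i] - prefix[k]
                dp[p][i] = min(dp[p][i], max(dp[p - 1][k], block))
    return dp[n_procs][n]
```

For loads [2, 3, 4, 5], two processors give a bottleneck of 9 (split 2,3,4 | 5 or 2,3 | 4,5) and three processors give 5 (split 2,3 | 4 | 5).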
Article
In a distributed computing system a modular program must have its modules assigned among the processors so as to avoid excessive interprocessor communication while taking advantage of specific efficiencies of some processors in executing some program modules. In this paper we show that this program module assignment problem can be solved efficiently by making use of the well-known Ford–Fulkerson algorithm for finding maximum flows in commodity networks as modified by Edmonds and Karp, Dinic, and Karzanov. A solution to the two-processor problem is given, and extensions to three and n-processors are considered with partial results given without a complete efficient solution.
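Stone's two-processor construction can be sketched end to end: build a flow network whose source and sink stand for the two processors, with source-to-module and module-to-sink capacities equal to the module's run cost on the *other* processor and undirected module edges weighted by communication cost, then read the assignment off a minimum cut. The Edmonds-Karp max-flow below is a compact illustration, not an optimized implementation.

```python
from collections import deque

def min_cut_assignment(run_a, run_b, comm):
    """run_a[m], run_b[m]: cost of module m on processor A or B.
    comm[(i, j)]: communication cost if i, j end up on different
    processors. Returns (optimal total cost, assignment labels)."""
    n = len(run_a)
    s, t = n, n + 1                       # s = processor A, t = processor B
    cap = [[0] * (n + 2) for _ in range(n + 2)]
    for m in range(n):
        cap[s][m] = run_b[m]              # cut s->m  <=>  m assigned to B
        cap[m][t] = run_a[m]              # cut m->t  <=>  m assigned to A
    for (i, j), c in comm.items():
        cap[i][j] += c
        cap[j][i] += c
    flow = 0
    while True:                           # Edmonds-Karp: BFS augmenting paths
        parent = {s: s}
        q = deque([s])
        while q and t not in parent:
            u = q.popleft()
            for v in range(n + 2):
                if v not in parent and cap[u][v] > 0:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            break
        bottleneck, v = float("inf"), t
        while v != s:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:
            cap[parent[v]][v] -= bottleneck
            cap[v][parent[v]] += bottleneck
            v = parent[v]
        flow += bottleneck
    reach, q = {s}, deque([s])            # modules still reachable from s
    while q:                              # in the residual graph go to A
        u = q.popleft()
        for v in range(n + 2):
            if v not in reach and cap[u][v] > 0:
                reach.add(v)
                q.append(v)
    return flow, ["A" if m in reach else "B" for m in range(n)]
```

The min-cut value equals the optimal total of execution plus communication cost, which is the key insight of the two-processor solution; the n-processor generalization discussed above has no equally clean formulation.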
Article
A graph matching approach is proposed in this paper for solving the task assignment problem encountered in distributed computing systems. A cost function defined in terms of a single unit, time, is proposed for evaluating the effectiveness of task assignment. This cost function represents the maximum time for a task to complete module execution and communication in all the processors. A new optimization criterion, called the minimax criterion, is also proposed, based on which both minimization of interprocessor communication and balance of processor loading can be achieved. The proposed approach allows various system constraints to be included for consideration. With the proposed cost function and the minimax criterion, optimal task assignment is defined. Graphs are then used to represent the module relationship of a given task and the processor structure of a distributed computing system. Module assignment to system processors is transformed into a type of graph matching, called weak homomorphism. The search of optimal weak homomorphism corresponding to optimal task assignment is next formulated as a state-space search problem. It is then solved by the well-known A* algorithm in artificial intelligence after proper heuristic information for speeding up the search is suggested. An illustrative example and some experimental results are also included to show the effectiveness of the heuristic search.
Article
A task allocation model is presented that allocates application tasks among processors in distributed computing systems satisfying: 1) minimum interprocessor communication cost, 2) balanced utilization of each processor, and 3) all engineering application requirements. A cost function is formulated to measure the interprocessor communication and processing costs. By employing various constraints, the model efficiently generates minimum-cost allocation using a branch and bound technique. With suitable modification of these constraints, both application requirements and load balance can be achieved. For evaluation, this allocation model was applied to an Air Defence (AD) case study.
Article
In array processors it is important to map problem modules onto processors such that modules that communicate with each other lie, as far as possible, on adjacent processors. This mapping problem is formulated in graph theoretic terms and shown to be equivalent, in its most general form, to the graph isomorphism problem. The problem is also very similar to the bandwidth reduction problem for sparse matrices and to the quadratic assignment problem.
Article
The authors propose and evaluate an efficient hierarchical clustering and allocation algorithm that drastically reduces the interprocess communications cost while observing lower and upper bounds of utilization for the individual processors. They compare the algorithm with branch-and-bound-type algorithms that can produce allocations with minimal communication cost, and show a very encouraging time complexity/suboptimality tradeoff in favor of the algorithm, at least for a class of process clusters and their random combinations which it is believed occur naturally in distributed applications. The heuristic allocation is well suited for a changing environment, where processors may fail or be added to the system and where the workload patterns may change unpredictably and/or periodically
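A greedy form of such affinity clustering can be sketched in a few lines: repeatedly merge the pair of clusters with the highest inter-cluster communication until the desired number of clusters remains. This toy version ignores the processor utilization bounds the algorithm above enforces, and recomputes affinities naively.

```python
def cluster_tasks(n_tasks, comm, target):
    """Greedy affinity clustering: merge the two clusters with the
    highest communication between them until `target` clusters remain.
    comm[(i, j)]: communication volume between tasks i and j."""
    clusters = [{t} for t in range(n_tasks)]

    def affinity(a, b):
        # Total communication crossing between cluster a and cluster b.
        return sum(c for (i, j), c in comm.items()
                   if (i in a and j in b) or (i in b and j in a))

    while len(clusters) > target:
        a, b = max(((x, y) for idx, x in enumerate(clusters)
                    for y in clusters[idx + 1:]),
                   key=lambda pair: affinity(*pair))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a | b)            # merge the highest-affinity pair
    return clusters
```

Merging high-affinity tasks first internalizes the heaviest communication edges, which is why clustering works well as a pre-processing step before any assignment algorithm.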
A. Grama and V. Kumar, "Parallel Search Algorithms for Discrete Optimization Problems," ORSA Journal on Computing, vol. 7, no. 4, Fall 1995.
Yan A. Li, "Heterogeneous Computing," Parallel and Distributed Computing Handbook, pp. 725-761, McGraw-Hill, New York.
H. S. Stone, "Multiprocessor Scheduling with the Aid of Network Flow Algorithms," IEEE Trans. on Software Engineering, vol. SE-3, no. 1, pp. 85-93, Jan. 1977.
D. E. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley, Reading, MA, 1989.
S. M. Hart and Chuen-Lung S. Chen, "Simulated Annealing and the Mapping Problem: A Computational Study," Computers and Operations Research, vol. 21, no. 4, pp. 455-461, 1994.
M. A. Iqbal and S. H. Bokhari, "Efficient Algorithms for a Class of Partitioning Problems," IEEE Trans. on …