Article

Abstract

In the last decade, due to emerging real-time and multimedia applications, there has been much interest in developing mechanisms that take into account the quality of service (QoS) required by these applications. We previously proposed an adaptive packet-routing approach based on reinforcement learning, the K-Optimal path Q-Routing Algorithm (KOQRA), which simultaneously optimizes two additive QoS criteria: cumulative path cost and end-to-end delay. The approach developed here adds a new module to KOQRA that handles packet scheduling, in order to achieve QoS differentiation and to optimize queuing delay in a dynamically changing wireless environment. This module uses a multi-agent system in which each agent tries to optimize its own behaviour and communicates with other agents to make global coordination possible. This communication is carried out by mobile agents. In this paper, we adopt the framework of Markov decision processes applied to multi-agent systems and present a pheromone-Q learning approach that combines the standard Q-learning technique with a synthetic pheromone acting as a communication medium, speeding up the learning process of cooperating agents. Numerical results obtained with the OPNET simulator for different levels of traffic load show that adaptive scheduling clearly improves the performance of our earlier KOQRA.

... With an increasing number of explored states, the exploration probability decreases as time goes by. In [13], using the softmax approach, an agent selects an action based on a Boltzmann distribution; specifically, the probability of selecting an action in a state is: ...
... On the other hand, the softmax approach ranks the exploration actions so that it does not explore actions that are far away from the optimal action. New action selection approaches have been proposed: an enhanced ε-greedy approach in [8], [12]; and an enhanced softmax approach in [13]. Nevertheless, each action selection technique has its merits and demerits with regard to the respective applications. ...
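To make the two selection rules mentioned in these excerpts concrete, here is a minimal Python sketch of ε-greedy and softmax (Boltzmann) action selection over a vector of Q-values. It is an illustration only, not code from the cited papers; the function names and the fixed epsilon/temperature parameters are assumptions.

    import numpy as np

    def epsilon_greedy(q_values, epsilon=0.1, rng=np.random.default_rng()):
        """With probability epsilon pick a uniformly random action, otherwise the greedy one."""
        if rng.random() < epsilon:
            return int(rng.integers(len(q_values)))
        return int(np.argmax(q_values))

    def boltzmann(q_values, temperature=1.0, rng=np.random.default_rng()):
        """Sample an action from the softmax (Boltzmann) distribution over the Q-values."""
        prefs = np.asarray(q_values, dtype=float) / temperature
        prefs -= prefs.max()                      # numerical stability
        probs = np.exp(prefs) / np.exp(prefs).sum()
        return int(rng.choice(len(q_values), p=probs))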
Article
Full-text available
The dynamicity of available resources and network conditions, such as channel capacity and traffic characteristics, have posed major challenges to scheduling in wireless networks. Reinforcement learning (RL) enables wireless nodes to observe their respective operating environment, learn, and make optimal or near-optimal scheduling decisions. Learning, which is the main intrinsic characteristic of RL, enables wireless nodes to adapt to most forms of dynamicity in the operating environment as time goes by. This paper presents an extensive review on the application of the traditional and enhanced RL approaches to various types of scheduling schemes, namely packet, sleep-wake and task schedulers, in wireless networks, as well as the advantages and performance enhancements brought about by RL. Additionally, it presents how various challenges associated with scheduling schemes have been approached using RL. Finally, we discuss various open issues related to RL-based scheduling schemes in wireless networks in order to explore new research directions in this area. Discussions in this paper are presented in a tutorial manner in order to establish a foundation for further research in this field.
Article
Full-text available
Discusses the need to develop a high-quality control mechanism to manage network traffic load in heterogeneous networks and ensure that QoS requirements are met. The five articles in this special section are devoted to quality of service based routing algorithms for heterogeneous networks and are briefly summarized.
Article
Full-text available
We show that the problem of routing messages in a wireless sensor network so as to maximize network lifetime is NP-hard. In our model, the online model, each message has to be routed without knowledge of future route requests. We also develop an online heuristic to maximize network lifetime. Our heuristic, which performs two shortest path computations to route each message, is superior to previously published heuristics for lifetime maximization: it results in greater lifetime, and its performance is less sensitive to the selection of heuristic parameters. Additionally, our heuristic is superior on the capacity metric.
Article
Full-text available
We propose six new heuristics to find a source-to-destination path that satisfies two or more additive constraints on edge weights. Five of these heuristics become ε-approximation algorithms when their parameters are appropriately set. The performance of our new heuristics is compared experimentally with that of two recently proposed heuristics for the same problem.
Article
Full-text available
This paper describes and evaluates the Confidence-based Dual Reinforcement Q-Routing algorithm (CDRQ-Routing) for adaptive packet routing in communication networks. CDRQ-Routing is based on an application of the Q-learning framework to network routing, as first proposed by Littman and Boyan (1993). The main contribution of CDRQ-Routing is an increased quantity and an improved quality of exploration. Compared to Q-Routing, the state-of-the-art adaptive Bellman-Ford Routing algorithm, and the non-adaptive shortest path method, CDRQ-Routing learns superior policies significantly faster. Moreover, the overhead due to exploration is shown to be insignificant compared to the improvements achieved, which makes CDRQ-Routing a practical method for real communication networks. In a communication network, information is transferred from one node to another as data packets [Tanenbaum, 1989]. The process of sending a packet P(s, d) from its source node s to its destination node d ...
Conference Paper
We report on an investigation of reinforcement learning techniques for the learning of coordination in cooperative multi-agent systems. Specifically, we focus on a novel action selection strategy for Q-learning. The new technique is applicable to scenarios where mutual observation of actions is not possible. To date, reinforcement learning approaches for such independent agents did not guarantee convergence to the optimal joint action in scenarios with high miscoordination costs. We improve on previous results by demonstrating empirically that our extension causes the agents to converge almost always to the optimal joint action even in these difficult cases.
Article
The paper presents the pheromone-Q-learning (Phe-Q) algorithm, a variation of Q-learning. The technique was developed to allow agents to communicate and jointly learn to solve a problem. Phe-Q learning combines the standard Q-learning technique with a synthetic pheromone that acts as a communication medium speeding up the learning process of cooperating agents. The Phe-Q update equation includes a belief factor that reflects the confidence an agent has in the pheromone (the communication medium) deposited in the environment by other agents. With the Phe-Q update equation, the speed of convergence towards an optimal solution depends on a number of parameters including the number of agents solving a problem, the amount of pheromone deposit, the diffusion into neighbouring cells and the evaporation rate. The main objective of this paper is to describe and evaluate the performance of the Phe-Q algorithm. The paper demonstrates the improved performance of cooperating Phe-Q agents over non-cooperating agents. The paper also shows how Phe-Q learning can be improved by optimizing all the parameters that control the use of the synthetic pheromone.
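As a rough, hedged illustration of the mechanism described above (a sketch in the spirit of the paper, not the authors' exact update equations), the following Python fragment biases a standard Q-update with a pheromone-derived belief term; the form of the belief factor, the constants, and the tabular state representation are assumptions.

    import numpy as np

    ALPHA, GAMMA, XI = 0.1, 0.9, 0.5    # learning rate, discount, pheromone weight (assumed values)
    DEPOSIT, EVAPORATION = 1.0, 0.05    # pheromone deposit and evaporation rate (assumed values)

    def phe_q_update(Q, pheromone, s, a, reward, s_next):
        """One Q-update biased by a synthetic-pheromone belief term (illustrative only)."""
        belief = pheromone[s_next] / (1.0 + pheromone[s_next])   # assumed belief factor in [0, 1)
        target = reward + GAMMA * (np.max(Q[s_next]) + XI * belief)
        Q[s, a] += ALPHA * (target - Q[s, a])

    def deposit_and_evaporate(pheromone, s):
        """The agent deposits pheromone at its current state; the whole map then evaporates."""
        pheromone[s] += DEPOSIT
        pheromone *= (1.0 - EVAPORATION)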
Article
Let G = (V, E) be a graph with weight function w: E → Z+ and length function l: E → Z+. The problem of determining, for v1, v2 ∈ V, whether there is a path from v1 to v2 with weight at most W and length at most L is NP-complete. This paper gives two approaches to meeting or approximating the length and weight constraints. The first approach is a pseudopolynomial-time algorithm which determines whether a path meets the constraints. Its running time is O(n^5·b·log(nb)), where n = |V| and b is the largest length or weight. If tables with O(n^3·b) entries are kept, then all instances of multiple constraints may be decided. Table size may be substantially decreased if one is willing to tolerate incorrect answers to rare instances. The algorithm is suitable for distributed execution. In the second approach, an objective function is defined which evaluates a path's distance from meeting the constraints. Polynomial-time algorithms attempt to find good paths in terms of the objective function. One algorithm is at most 1.62 times worse than optimal. A notion of "average worst-case behavior" is defined; the algorithm's "average" behavior is 1.51 times worse than optimal.
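For intuition, the sketch below shows the textbook pseudopolynomial dynamic program for the same weight/length-constrained path decision problem; it is not the specific O(n^5·b·log(nb)) distributed algorithm of the paper, and the edge-list input format is an assumption.

    def feasible_path_exists(n, edges, v1, v2, W, L):
        """Decide whether some v1 -> v2 path has total weight <= W and total length <= L.

        dp[v][w] holds the smallest total length of a v1 -> v path of total weight <= w.
        Runs in O(n * |E| * W) time, i.e. pseudopolynomial in the weight bound W.
        edges is a list of (u, v, weight, length) tuples with positive integer weights/lengths.
        """
        INF = float("inf")
        dp = [[INF] * (W + 1) for _ in range(n)]
        for w in range(W + 1):
            dp[v1][w] = 0
        for _ in range(n - 1):                      # Bellman-Ford-style relaxation rounds
            for (u, v, weight, length) in edges:
                for w in range(weight, W + 1):
                    if dp[u][w - weight] + length < dp[v][w]:
                        dp[v][w] = dp[u][w - weight] + length
        return dp[v2][W] <= L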
Article
An important aspect of quality-of-service (QoS) provisioning in integrated networks is the ability to find a feasible route that satisfies a set of end-to-end QoS requirements (or constraints) while efficiently using network resources. In general, finding a path subject to multiple additive constraints (e.g., delay, delay-jitter) is an NP-complete problem. We propose an efficient randomized heuristic algorithm to this problem. Although the algorithm is presented as a centralized one, the idea behind it can also be implemented in a distributed manner. Our algorithm initially computes the cost of the best path from each node to a given destination with respect to individual link weights and their linear combination. In this initialization step, Reverse–Dijkstra's algorithm is used K+1 times per path request, where K is the number of additive constraints. The algorithm then randomly traverses the graph, discovering nodes from which there is a good chance of reaching the final destination. This randomized search is a modification of the breadth-first search (BFS). Using extensive simulations, we show that the proposed algorithm provides better performance than existing algorithms under the same order of computational complexity.
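A minimal sketch of the initialization step described above: cost-to-destination estimates computed with Dijkstra on the reversed graph, once per link-weight function (the randomized search phase is omitted, and the adjacency-list format is an assumption).

    import heapq

    def cost_to_destination(adj, dest, weight_index):
        """Cheapest cost from every node to dest under the selected link weight.
        adj maps u -> list of (v, weights), where weights is a tuple of additive metrics."""
        rev = {}                                    # build the reversed graph
        for u, neighbours in adj.items():
            for v, weights in neighbours:
                rev.setdefault(v, []).append((u, weights))
        dist, heap = {dest: 0.0}, [(0.0, dest)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist.get(u, float("inf")):
                continue
            for v, weights in rev.get(u, []):
                nd = d + weights[weight_index]
                if nd < dist.get(v, float("inf")):
                    dist[v] = nd
                    heapq.heappush(heap, (nd, v))
        return dist   # call once per metric, and once for their linear combination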
Conference Paper
Nowadays, various kinds of sources (such as voice, video or data) with diverse traffic characteristics and quality of service (QoS) requirements, multiplexed at very high rates, lead to significant traffic problems such as packet losses, transmission delays and delay variations, caused mainly by congestion in the network. Predicting these problems in real time is quite difficult, making the effectiveness of "traditional" methodologies based on analytical models questionable. This article proposes and evaluates a QoS routing policy for packet-switched communication networks with irregular topology and traffic, called widest K-shortest paths Q-routing. The reinforcement signals are evaluated using Q-learning. Compared to standard Q-routing, path exploration is limited to the K best loop-free paths in terms of hop count (number of routers in a path), leading to a substantial reduction of convergence time. This work proposes a routing scheme that improves the delay factor and is based on reinforcement learning; we use Q-learning as the reinforcement learning technique and introduce the K-shortest-paths idea into the learning process. The proposed algorithm is applied to two different topologies, and OPNET is used to evaluate its performance under two traffic conditions, namely low load and high load.
Conference Paper
Nowadays, various kinds of sources (such as voice, video, or data) with diverse traffic characteristics and quality of service (QoS) requirements, multiplexed at very high rates, lead to significant traffic problems such as packet losses, transmission delays and delay variations, caused mainly by congestion in the network. Predicting these problems in real time is quite difficult, making the effectiveness of "traditional" methodologies based on analytical models questionable. This article proposes and evaluates a QoS routing policy for packet-switched communication networks with irregular topology and traffic, called K-shortest paths Q-Routing. The reinforcement signals are evaluated using Q-learning. Compared to standard Q-Routing, path exploration is limited to the K best loop-free paths in terms of hop count (number of routers in a path), leading to a substantial reduction of convergence time. Moreover, each router uses an online learning module to optimize the path in terms of average packet delivery time. The performance of the proposed algorithm is evaluated experimentally with the OPNET simulator for different load levels and compared to the Q-Routing algorithm.
Article
In the context of modern high-speed Internet networks, routing is often complicated by the notion of guaranteed Quality of Service (QoS), which can be related to time, packet loss or bandwidth requirements: constraints related to various types of QoS make some routes unacceptable. Due to emerging real-time and multimedia applications, efficient routing of information packets in dynamically changing communication networks requires that, as the load levels, traffic patterns and topology of the network change, the routing policy also adapts. In this paper we focus on QoS-based routing, developing a neuro-dynamic programming approach to construct dynamic state-dependent routing policies. We propose an adaptive reinforcement-learning algorithm for packet routing, the N-best optimal path Q-routing algorithm (NOQRA), which optimizes two criteria: cumulative path cost (or hop count if each link cost is 1) and end-to-end delay. A load-balancing policy depending on a dynamic traffic path probability distribution function is also defined and embodied in NOQRA to characterize the distribution of the traffic over the N best paths. Numerical results obtained with the OPNET simulator for different statistical distributions of packet interarrival times and different levels of traffic load show that NOQRA gives better results than standard optimal path routing algorithms.
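As a hedged sketch of the load-balancing idea above, the fragment below spreads traffic over the N best candidate paths with probabilities inversely proportional to their estimated end-to-end delays; the exact traffic-distribution function used by NOQRA is not given here, so this particular form is an assumption.

    import numpy as np

    def path_selection_probabilities(estimated_delays):
        """Probability of each candidate path, inversely proportional to its estimated delay."""
        inv = 1.0 / np.asarray(estimated_delays, dtype=float)
        return inv / inv.sum()

    def pick_path(estimated_delays, rng=np.random.default_rng()):
        """Sample the path index used for the next packet."""
        probs = path_selection_probabilities(estimated_delays)
        return int(rng.choice(len(probs), p=probs))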
Article
We report on an investigation of reinforcement learning techniques for the learning of coordination in cooperative multi-agent systems. Specifically, we focus on a novel action selection strategy for Q-learning. The new technique is applicable to scenarios where mutual observation of actions is not possible. To date, reinforcement learning approaches for such independent agents did not guarantee convergence to the optimal joint action in scenarios with high miscoordination costs. We improve on previous results by demonstrating empirically that our extension causes the agents to converge almost always to the optimal joint action even in these difficult cases.
Article
Finding a path in a network based on multiple constraints (the MCP problem) is often considered an integral part of quality of service (QoS) routing. QoS routing with constraints on multiple additive measures has been proven to be NP-complete. This proof has dramatically influenced the research community, resulting in the common belief that exact QoS routing is intractable in practice. However, to our knowledge, no one has ever examined which “worst cases” lead to intractability. In fact, the MCP problem is not strongly NP-complete, suggesting that in practice an exact QoS routing algorithm may work in polynomial time. The goal of this paper is to argue that in practice QoS routing may be tractable. We will provide properties, an approximate analysis, and simulation results to indicate that NP-completeness hinges on four conditions, namely: 1) the topology; 2) the granularity of link weights; 3) the correlation between link weights; and 4) the constraints. We expect that, in practice, these conditions are manageable and therefore believe that exact QoS routing is tractable in practice.
Article
Of several responses made to the same situation, those which are accompanied or closely followed by satisfaction to the animal will, other things being equal, be more firmly connected with the situation, so that, when it recurs, they will be more likely to recur; those which are accompanied or closely followed by discomfort to the animal will, other things being equal, have their connections with that situation weakened, so that, when it recurs, they will be less likely to occur. The greater the satisfaction or discomfort, the greater the strengthening or weakening of the bond. (Thorndike, 1911) The idea of learning to make appropriate responses based on reinforcing events has its roots in early psychological theories such as Thorndike's "law of effect" (quoted above). Although several important contributions were made in the 1950s, 1960s and 1970s by illustrious luminaries such as Bellman, Minsky, Klopf and others (Farley and Clark, 1954; Bellman, 1957; Minsky, 1961; Samuel, 1963; Michie and Chambers, 1968; Grossberg, 1975; Klopf, 1982), the last two decades have witnessed perhaps the strongest advances in the mathematical foundations of reinforcement learning, in addition to several impressive demonstrations of the performance of reinforcement learning algorithms in real world tasks. The introductory book by Sutton and Barto, two of the most influential and recognized leaders in the field, is therefore both timely and welcome. The book is divided into three parts. In the first part, the authors introduce and elaborate on the essential characteristics of the reinforcement learning problem, namely, the problem of learning "policies" or mappings from environmental states to actions so as to maximize the amount of "reward"
Article
The shortest path problem is a classical network programming problem that has been extensively studied. The problem of determining not only the shortest path, but also listing the K shortest paths (for a given integer K ≥ 1) is also a classical one but has not been studied so intensively, despite its obvious practical interest. Two different types of problems are usually considered: the unconstrained and the constrained K shortest paths problem. While in the former no restriction is considered in the definition of a path, in the constrained K shortest paths problem all the paths have to satisfy some condition -- for example, to be loopless. In this paper we are concerned with the unconstrained ranking shortest paths problem. In the first part the problem is viewed as a generalization of the classical shortest path problem; conditions for finiteness and for the optimality principle being satisfied are studied, and labeling shortest path algorithms are generalized. In the second...
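For illustration, here is an elementary best-first enumeration of the K cheapest paths in the unconstrained sense (paths may contain loops), which is one standard way to approach the ranking problem discussed above; the adjacency-list format and the expansion cap are assumptions.

    import heapq

    def k_shortest_paths(adj, src, dst, k, max_pops=100000):
        """Enumerate up to k cheapest src -> dst paths by best-first search.
        Paths may revisit nodes (unconstrained ranking); positive edge costs are assumed."""
        heap, results, pops = [(0, [src])], [], 0
        while heap and len(results) < k and pops < max_pops:
            cost, path = heapq.heappop(heap)
            pops += 1
            node = path[-1]
            if node == dst:
                results.append((cost, path))
                continue
            for nxt, w in adj.get(node, []):
                heapq.heappush(heap, (cost + w, path + [nxt]))
        return results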
Article
Efficient routing of information packets in dynamically changing communication networks requires that, as the load levels, traffic patterns and topology of the network change, the routing policy also adapts. Making globally optimal routing decisions would require a central observer/controller with complete information about the state of all nodes and links in the network, which is not realistic. Therefore, the routing decisions must be made locally by individual nodes (routers) using only local routing information. The routing information at a node could be estimates of packet delivery time to other nodes via its neighbors or estimates of queue lengths of other nodes in the network. An adaptive routing algorithm should efficiently explore and update routing information available at different nodes as it routes packets. It should continuously evolve efficient routing policies with minimum overhead on network resources. In this thesis, an on-line adaptive network routing algorithm called Confidence-based Dual Reinforcement Q-Routing (CDRQ-Routing), based on the Q-learning framework, is proposed and evaluated. In this framework, the routing information at individual nodes is maintained as Q-value estimates of how long it will take to send a packet to any particular destination via each of the node's neighbors. These Q values are updated through exploration as the packets are transmitted. The main contribution of this work is the faster adaptation and the improved quality of routing policies over Q-Routing. The improvement is based on two ideas. First, the quality of exploration is improved by including a confidence measure with each Q value, representing how reliable the Q value is. The learning rate is a function of these confidence values. Secondly, the quantity of exploration is increased by including backward exploration into Q-learning. As a packet hops from one node to another, it not only updates a Q value in the sending node (forward exploration, similar to Q-Routing), but also updates a Q value in the receiving node using the information appended to the packet when it is sent out (backward exploration). Thus two Q-value updates per packet hop occur in CDRQ-Routing, as against only one in Q-Routing. Certain properties of forward and backward exploration that form the basis of these update rules are stated and proved in this work. Experiments over several network topologies, including a 36-node irregular grid and a 128-node 7-D hypercube, indicate that the improvement in quality and increase in quantity of exploration contribute in complementary ways to the performance of the overall routing algorithm. CDRQ-Routing was able to learn optimal shortest-path routing at low loads and efficient routing policies at medium loads almost twice as fast as Q-Routing. At high load levels, the routing policy learned by CDRQ-Routing was twice as good as that learned by Q-Routing in terms of average packet delivery time. CDRQ-Routing was found to adapt significantly faster than Q-Routing to changes in network traffic patterns and network topology. The final routing policies learned by CDRQ-Routing were able to sustain much higher load levels than those learned by Q-Routing. Analysis shows that the exploration overhead incurred in CDRQ-Routing is less than 0.5% of the packet traffic.
Various extensions of CDRQ-Routing, namely routing in heterogeneous networks (different link delays and router processing speeds), routing with adaptive congestion control (in the case of finite queue buffers), and the inclusion of predictive features into CDRQ-Routing, have been proposed as future work. CDRQ-Routing is far superior to, and more realistic than, the state-of-the-art distance vector routing and the Q-Routing algorithm.
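A minimal sketch of the two exploration updates described in this abstract, written in the spirit of Q-Routing's delivery-time estimates; the data structures, the fixed learning rate, and the omission of the confidence-based learning-rate adjustment are simplifications and assumptions.

    def forward_update(Q_x, y, dest, queue_time, send_time, best_estimate_at_y, alpha=0.5):
        """Node x sent a packet bound for dest to neighbour y and got back y's best
        remaining-delivery-time estimate; x refreshes Q_x[y][dest] (forward exploration)."""
        target = queue_time + send_time + best_estimate_at_y
        Q_x[y][dest] += alpha * (target - Q_x[y][dest])

    def backward_update(Q_y, x, src, estimate_from_x, alpha=0.5):
        """The packet also carries x's best estimate of the delay back to its source src,
        so the receiving node y refreshes Q_y[x][src] (backward exploration)."""
        Q_y[x][src] += alpha * (estimate_from_x - Q_y[x][src])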
Impact of Adaptive Quality of Service Based Routing Algorithms in the next generation heterogeneous networks
  • A Mellouk
  • P Lorenz
  • A Boukerche
  • M H Lee
A. Mellouk, P. Lorenz, A. Boukerche, M.H. Lee, "Impact of Adaptive Quality of Service Based Routing Algorithms in the next generation heterogeneous networks", IEEE Communications Magazine, IEEE Press, vol. 45, no. 2, Feb. 2007, pp. 65-66.
The K shortest paths problem
  • E Q V Martins
  • M M B Pascal
  • J L E Santos
A QoS-based scheduling by Neurodynamic Learning
  • M Bourenane
  • A Mellouk
  • D Benhamamouche
M. Bourenane, A. Mellouk, D. Benhamamouche, "A QoS-based scheduling by Neurodynamic Learning", System and Information Sciences Journal, vol. 2, no. 2, 2007, pp. 138-144.
Computers and Intractability: A Guide to the Theory of NP-Completeness
  • M R Garey
  • D S Johnson
M. R. Garey, D. S. Johnson, "Computers and Intractability: A Guide to the Theory of NP-Completeness", Freeman, San Francisco, 1979.
Confidence based dual reinforcement Q-routing: An adaptive online network routing algorithm
  • S Kumar
  • R Miikkulainen
S. Kumar and R. Miikkulainen, "Confidence based dual reinforcement Q-routing: An adaptive online network routing algorithm", Proc. of the Sixteenth International Joint Conference on Artificial Intelligence (IJCAI-99), Stockholm, Sweden, San Francisco, CA: Kaufmann, 1999, pp. 758-763.