Figure 2: Architecture of the ns3-gym framework.

Source publication
Conference Paper
Full-text available
Recently, we have seen a boom of attempts to improve the operation of networking protocols using machine learning techniques. The proposed reinforcement learning (RL) based control solutions very often outperform traditionally designed ones in terms of performance and efficiency. However, in order to reach such a superb level, an RL control agent req...

Contexts in source publication

Context 1
... architecture of ns3-gym as depicted in Fig. 2 consists of two major components: the ns-3 network simulator and the OpenAI Gym framework. The former is used to implement environments, while the latter unifies their interface. The main contribution of this work is the design and implementation of a generic interface between OpenAI Gym and ns-3 that allows for ...
Context 2
... typical workflow of developing and training an RL-based agent is shown as numbered steps in Fig. 2: (1) Create a model of the network and configure scenario conditions (i.e., traffic, mobility, etc.) using standard ns-3 functions; (2) Instantiate the ns3-gym environment gateway in the simulation, i.e., create an OpenGymGateway object and implement the callback functions that collect the state of the environment to be shared with the agent and ...
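On the Python side, this workflow reduces to a short agent loop. The following is a minimal sketch patterned on the usage examples in the ns3-gym repository; it assumes the ns3gym Python package is installed and that an ns-3 scenario containing an OpenGymGateway is registered under the 'ns3-v0' environment id:

```python
import gym
import ns3gym.ns3env  # importing ns3gym registers the proxy environment with Gym

# Connects to (and by default launches) the ns-3 simulation that hosts
# the OpenGymGateway; 'ns3-v0' is the id used in the ns3-gym examples.
env = gym.make('ns3-v0')

obs = env.reset()
done = False
while not done:
    # A random action stands in for a trained agent's policy here.
    action = env.action_space.sample()
    obs, reward, done, info = env.step(action)

env.close()
```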

Similar publications

Preprint
Full-text available
The intelligent transportation system is necessary for smart connections among vehicles and roadway equipment. VANET is an emerging research area gaining attention for this smart connection. It is a subclass of MANET, where each automobile is a node in an ad-hoc network consisting of groups of stationary or moving vehicles. These vehic...
Article
Full-text available
Vehicles in the Internet of Vehicles (IoV) create wireless connections and participate in routing by sending information to other nodes. Despite the recent growth in popularity of IoV, there are still issues with high vehicle speeds, frequent interruptions, and a dynamic topology, which makes the creation of efficient routing protocols that are mor...
Conference Paper
Full-text available
Millimeter wave (mmWave) communication has recently attracted significant attention from both industrial and academic communities. The large bandwidth availability as well as low interference nature of mmWave spectrum is particularly attractive for industrial communication. However, inherent challenges such as coverage and blockage of mmWave commun...
Conference Paper
Full-text available
The IEEE 802.11 Working Group is developing the sixth generation of wireless local area network (WLAN) standards, denoted as 802.11ax. The main goal of the new standard is to improve 2.4 GHz and 5 GHz network performance in scenarios with multiple overlapping and dense basic service sets (BSSs). In this paper we highlight two novel solutions proposed in...
Article
Full-text available
The emergence of millimeter-wave based technologies is pushing the deployment of the 5th generation of mobile communications (5G), with the potential to achieve multi-gigabit and low-latency wireless links. Part of this breakthrough was only possible with the introduction of small antenna arrays, capable of forming highly directional and electronically...

Citations

... Formerly known as NR-Lena, this module is integrated into the ns-3 simulator, offering simulations of 3GPP NR non-standalone cellular networks. Furthermore, we employ the ns3-gym [49] module as an interface between ns-3 and Python-based agents. ... Tables II and III, respectively. ...
Preprint
Full-text available
Extended Reality (XR) services are set to transform applications over 5th and 6th generation wireless networks, delivering immersive experiences. Concurrently, Artificial Intelligence (AI) advancements have expanded their role in wireless networks; however, trust and transparency in AI remain to be strengthened. Thus, providing explanations for AI-enabled systems can enhance trust. We introduce Value Function Factorization (VFF)-based Explainable (X) Multi-Agent Reinforcement Learning (MARL) algorithms, explaining reward design in XR codec adaptation through reward decomposition. We contribute four enhancements to XMARL algorithms. Firstly, we detail architectural modifications to enable reward decomposition in VFF-based MARL algorithms: Value Decomposition Networks (VDN), Mixture of Q-Values (QMIX), and Q-Transformation (Q-TRAN). Secondly, inspired by multi-task learning, we reduce the overhead of vanilla XMARL algorithms. Thirdly, we propose a new explainability metric, Reward Difference Fluctuation Explanation (RDFX), suitable for problems with adjustable parameters. Lastly, we propose adaptive XMARL, leveraging network gradients and reward decomposition for improved action selection. Simulation results indicate that, in XR codec adaptation, the Packet Delivery Ratio reward is the primary contributor to optimal performance compared to the initial composite reward, which included delay and Data Rate Ratio components. Modifications to VFF-based XMARL algorithms, incorporating multi-headed structures and adaptive loss functions, enable the best-performing algorithm, Multi-Headed Adaptive (MHA)-QMIX, to achieve significant average gains over the Adjust Packet Size baseline up to 10.7%, 41.4%, 33.3%, and 67.9% in XR index, jitter, delay, and Packet Loss Ratio (PLR), respectively.
... [23] scenarios can be customized, but each testing scenario requires a non-trivial implementation of a scenario class. Similarly, [24] requires a custom C++ interface to be written for each scenario. [25] provides a wide variety of premade scenarios of incumbents (similar to entities). ...
... RL Package Compatibility: Ability to interface with external RL libraries for algorithms and optimization methods. [23], [24], and the RFRL Gym utilize OpenAI's Gym API [26]. Because of this, these environments interface with various libraries including Stable-Baselines [27], PettingZoo [28], etc. [25] is not similarly integrated, as models are primarily ... Multi-Agent Capability: The newest version of the RFRL Gym environment features the addition of multi-agent capabilities, meaning that it can support the concurrent training of multiple intelligent agents in one scenario. ...
... Ease of Use: Tool is designed to simplify the training and analysis process of RL algorithms, especially if this functionality is accessible to a wide audience. [23] and [24] both require understanding of an external resource (GNU Radio and ns-3, respectively). GNU Radio is not officially supported on Windows, and [23] does not offer rendering. ...
Preprint
Full-text available
Technological trends show that Radio Frequency Reinforcement Learning (RFRL) will play a prominent role in the wireless communication systems of the future. Applications of RFRL range from military communications jamming to enhancing WiFi networks. Before deploying algorithms for these purposes, they must be trained in a simulation environment to ensure adequate performance. For this reason, we previously created the RFRL Gym: a standardized, accessible tool for the development and testing of reinforcement learning (RL) algorithms in the wireless communications space. This environment leveraged the OpenAI Gym framework and featured customizable simulation scenarios within the RF spectrum. However, the RFRL Gym was limited to training a single RL agent per simulation; this is not ideal, as most real-world RF scenarios will contain multiple intelligent agents in cooperative, competitive, or mixed settings, which is a natural consequence of spectrum congestion. Therefore, through integration with Ray RLlib, multi-agent reinforcement learning (MARL) functionality for training and assessment has been added to the RFRL Gym, making it even more of a robust tool for RF spectrum simulation. This paper provides an overview of the updated RFRL Gym environment. In this work, the general framework of the tool is described relative to comparable existing resources, highlighting the significant additions and refactoring we have applied to the Gym. Afterward, results from testing various RF scenarios in the MARL environment and future additions are discussed.
... Due to the Gym's simple interface, it is commonly used by different ML frameworks. The ns-3 simulator framework is integrated with Gym through the ns3-gym environment [26]. We incorporated different jamming strategies by extending the sweeping jammer code provided in the ns3-gym library, and we built our own adaptive jamming pattern module to develop jammers with different jamming behaviors. ...
Preprint
Full-text available
Deep Reinforcement Learning (DRL) has been highly effective in learning from and adapting to RF environments and thus detecting and mitigating jamming effects to facilitate reliable wireless communications. However, traditional DRL methods are susceptible to catastrophic forgetting (namely forgetting old tasks when learning new ones), especially in dynamic wireless environments where jammer patterns change over time. This paper considers an anti-jamming system and addresses the challenge of catastrophic forgetting in DRL applied to jammer detection and mitigation. First, we demonstrate the impact of catastrophic forgetting in DRL when applied to jammer detection and mitigation tasks, where the network forgets previously learned jammer patterns while adapting to new ones. This catastrophic interference undermines the effectiveness of the system, particularly in scenarios where the environment is non-stationary. We present a method that enables the network to retain knowledge of old jammer patterns while learning to handle new ones. Our approach substantially reduces catastrophic forgetting, allowing the anti-jamming system to learn new tasks without compromising its ability to perform previously learned tasks effectively. Furthermore, we introduce a systematic methodology for sequentially learning tasks in the anti-jamming framework. By leveraging continual DRL techniques based on PackNet, we achieve superior anti-jamming performance compared to standard DRL methods. Our proposed approach not only addresses catastrophic forgetting but also enhances the adaptability and robustness of the system in dynamic jamming environments. We demonstrate the efficacy of our method in preserving knowledge of past jammer patterns, learning new tasks efficiently, and achieving superior anti-jamming performance compared to traditional DRL approaches.
... The mmWave module was used without modification to collect the beam pair data described in the proposed method section. To validate the ML-aided solution for optimizing beam pair selection and its update period, we employed the NS3-Gym module [32] to establish the ML deployment infrastructure, enabling real-time interaction between the ML model and the simulator. The NS3-Gym module operates via sockets implemented with ZeroMQ [32]; a sketch of this socket-based coupling follows this entry. ...
Article
Full-text available
Finding the optimal beam pair and update time in 5G systems operating at mmWave frequencies is time-intensive and resource-demanding. This intricate procedure calls for the proposal of more intelligent approaches. Therefore, this work proposes a machine learning-based method for optimizing beam pair selection and its update time. The method is structured around three main modules: spatial characterization of beam pair service areas, training of a machine learning model using collected beam pair data, and an algorithm that uses the decision function of the trained model to compute the optimal update time for beam pairs based on the spatial position and velocity of user equipment. When the machine learning model is deployed in a network with a single gNB equipped with an 8×8 UPA and one UE equipped with a 1×2 UPA in an mmWave scenario simulated in NS3, improvements in SINR and throughput of up to 407% were observed. Improvements are gathered because of a reduction of 85.7% in beam pair selections, owing to an increase of approximately 1543% in the effective time between successive beam pair searches. This method could offer real-time optimization of the beam pair procedures in 5G networks and beyond.
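The ZeroMQ socket coupling mentioned in the citing text is exposed on the Python side through ns3-gym's Ns3Env proxy class. Below is a hedged sketch of opening such a connection; the port, step time, and simulation arguments are placeholder values, and the constructor parameters should be checked against the installed ns3-gym version:

```python
from ns3gym import ns3env

# Ns3Env is the Gym-style proxy that exchanges observations, actions,
# and rewards with the OpenGymGateway inside ns-3 over a ZeroMQ socket.
env = ns3env.Ns3Env(port=5555,                  # socket port shared with the ns-3 side
                    stepTime=0.5,               # simulated seconds per agent step
                    startSim=True,              # let the proxy launch the simulation
                    simSeed=0,
                    simArgs={"--simTime": 20})  # placeholder scenario argument

obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())
env.close()
```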
... For example, RL was used for automated incident handling against network-based attacks [9]. Gawlowicz et al. [10] trained a communication network controller to take action based on signals such as signal-to-noise ratio thresholds. ...
Preprint
Full-text available
While inverter-based distributed energy resources (DERs) play a crucial role in integrating renewable energy into the power system, they concurrently diminish the grid's system inertia, elevating the risk of frequency instabilities. Furthermore, smart inverters, interfaced via communication networks, pose a potential vulnerability to cyber threats if not diligently managed. To proactively fortify the power grid against sophisticated cyber attacks, we propose to employ reinforcement learning (RL) to identify potential threats and system vulnerabilities. This study concentrates on analyzing adversarial strategies for false data injection, specifically targeting smart inverters involved in primary frequency control. Our findings demonstrate that an RL agent can adeptly discern optimal false data injection methods to manipulate inverter settings, potentially causing catastrophic consequences.
... The environment was constructed in ns-3 to provide the foundational framework for the scenario, supplying the agent with initial and essential specifications. Additionally, RLTOPA was trained and evaluated using ns3-gym [50], which serves as an interface between the environment and the agent. By employing the ns-3 network simulation as a basis, ns3-gym facilitates the creation of an OpenAI Gym RL environment and oversees the dynamic wireless networking environment used for agent training and evaluation. ...
Preprint
Full-text available
Unmanned Aerial Vehicles (UAVs) are suited as cost-effective and adaptable platforms for carrying Wi-Fi Access Points (APs) and cellular Base Stations (BSs). Implementing aerial networks in disaster management scenarios and crowded areas can effectively enhance Quality of Service (QoS). In such environments, maintaining Line-of-Sight (LoS), especially at higher frequencies, is crucial for ensuring reliable communication networks with high capacity, particularly in environments with obstacles. The main contribution of this paper is a traffic- and obstacle-aware UAV positioning algorithm named Reinforcement Learning-based Traffic and Obstacle-aware Positioning Algorithm (RLTOPA), for such environments. RLTOPA determines the optimal position of the UAV by considering the positions of ground users, the coordinates of obstacles, and the traffic demands of users. This positioning aims to maximize QoS in terms of throughput by ensuring optimal LoS between ground users and the UAV. The network performance of the proposed solution, characterized in terms of mean delay and throughput, was evaluated using the ns-3 simulator. The results show up to 95% improvement in aggregate throughput and 71% in delay without compromising fairness.
... The mmWave module was used without modification to collect the beam pair data described in the proposed method section. To validate the ML-aided solution for optimizing beam pair selection and its update period, we employed the NS3-Gym module [24] to establish the ML deployment infrastructure, enabling real-time interaction between the ML model and the simulator. The NS3-Gym module operates via sockets implemented with ZeroMQ [24]. ...
Preprint
Full-text available
Finding the optimal beam pair and update time in 5G systems operating at mmWave frequencies is time-intensive and resource-demanding. This intricate procedure calls for the proposal of more intelligent approaches. Therefore, this work proposes a machine learning-based method for optimizing beam pair selection and its update time. The method is structured around three main modules: spatial characterization of beam pair service areas, training of a machine learning model using collected beam pair data, and an algorithm that uses the decision function of the trained model to compute the optimal update time for beam pairs based on the spatial position and velocity of user equipment. When the machine learning model is deployed in a network comprising a single gNB and a single user equipment in an mmWave scenario, improvements in SINR and throughput of up to 4% are observed. Improvements are gathered because of a reduction of 87.5% in beam pair selections, owing to an increase of approximately 2330% in the effective time between successive beam pair searches. This method could offer real-time optimization of the beam pair procedures in 5G networks and beyond.
... The previous module, also known as NR-Lena, is built on top of the ns-3 simulator and provides simulation of 3GPP NR non-standalone cellular networks. In addition, we use the ns3-gym module [18] as an interface between ns-3 and the Python-based agents. ...
Preprint
Full-text available
Extended Reality (XR) services will revolutionize applications over 5th and 6th generation wireless networks by providing seamless virtual and augmented reality experiences. These applications impose significant challenges on network infrastructure, which can be addressed by machine learning algorithms due to their adaptability. This paper presents a Multi-Agent Reinforcement Learning (MARL) solution for optimizing codec parameters of XR traffic, comparing it to the Adjust Packet Size (APS) algorithm. Our cooperative multi-agent system uses an Optimistic Mixture of Q-Values (oQMIX) approach for handling Cloud Gaming (CG), Augmented Reality (AR), and Virtual Reality (VR) traffic. Enhancements include an attention mechanism and slate-Markov Decision Process (MDP) for improved action selection. Simulations show our solution outperforms APS with average gains of 30.1%, 15.6%, 16.5%, and 50.3% in XR index, jitter, delay, and Packet Loss Ratio (PLR), respectively. APS tends to increase throughput but also packet losses, whereas oQMIX reduces PLR, delay, and jitter while maintaining goodput.
... The propagation loss follows the Log-Distance propagation loss model with a constant speed propagation delay. We implement our proposed solutions in ns-3, and we also use OpenAI Gym to interface between ns-3 and the MA-MAB solution [54]. To ensure the validity of the proposed algorithms, we conduct simulations using various seed values, resulting in random deployment positions for all users and affecting the traffic dynamics in ns-3. ...
Article
Full-text available
The exponential increase in the demand for high-performance services such as streaming video and gaming by wireless devices has posed several challenges for Wireless Local Area Networks (WLANs). In the context of Wi-Fi, the newest standards, IEEE 802.11ax and 802.11be, bring high data rates in dense user deployments. Additionally, they introduce new flexible features in the physical layer, such as dynamic Clear-Channel-Assessment (CCA) thresholds, to improve spatial reuse (SR) in response to radio spectrum scarcity in dense scenarios. In this paper, we formulate the Transmission Power (TP) and CCA configuration problem with the objective of maximizing fairness and minimizing station starvation. We present five main contributions to distributed SR optimization using Multi-Agent Multi-Armed Bandits (MA-MABs). First, we provide regret analysis for the distributed Multi-Agent Contextual MABs (MA-CMABs) proposed in this work. Second, we propose reducing the action space given the large cardinality of action combinations of TP and CCA threshold values per Access Point (AP). Third, we present two deep MA-CMAB algorithms, named Sample Average Uncertainty (SAU)-Coop and SAU-NonCoop, as cooperative and non-cooperative versions to improve SR. Additionally, we analyze the viability of using MA-MAB solutions based on the ϵ-greedy, Upper Confidence Bound (UCB), and Thompson Sampling (TS) techniques. Finally, we propose a deep reinforcement transfer learning technique to improve adaptability in dynamic environments. Simulation results show that cooperation via the SAU-Coop algorithm leads to a 14.7% improvement in cumulative throughput and a 32.5% reduction in Packet Loss Rate (PLR) in comparison to non-cooperative approaches. Under dynamic scenarios, transfer learning mitigates service drops for at least 60% of the total users.
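To make the bandit formulation concrete, here is a minimal, hypothetical ϵ-greedy sketch of how a single AP agent could pick a joint (TP, CCA) configuration; the action grid, reward placeholder, and class name are illustrative assumptions, not the paper's SAU or deep MA-CMAB implementations:

```python
import random

# Hypothetical discrete action grid per AP: joint (TP in dBm, CCA threshold in dBm).
ACTIONS = [(tp, cca) for tp in (5, 10, 15, 20) for cca in (-82, -72, -62)]

class EpsilonGreedyBandit:
    """Plain epsilon-greedy multi-armed bandit over the joint TP/CCA grid."""
    def __init__(self, n_actions, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = [0] * n_actions
        self.values = [0.0] * n_actions  # running mean reward per arm

    def select(self):
        if random.random() < self.epsilon:
            return random.randrange(len(self.counts))  # explore
        return max(range(len(self.values)), key=self.values.__getitem__)  # exploit

    def update(self, arm, reward):
        self.counts[arm] += 1
        # incremental mean: v += (r - v) / n
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

bandit = EpsilonGreedyBandit(len(ACTIONS))
for step in range(1000):
    arm = bandit.select()
    tp, cca = ACTIONS[arm]
    # In the paper's setting the reward would come from the simulated network
    # (e.g., a fairness/throughput metric); a random placeholder keeps this runnable.
    reward = random.random()
    bandit.update(arm, reward)
```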
... We have implemented the meta-caching strategy in a modified version of the ndnSIM simulator [22], [23] coupled with the ns3-gym framework [24]. ndnSIM is an open-source NDN simulator for reproducing discrete-event network scenarios, and the ns3-gym framework is designed to support the interaction of machine learning agents with the network environment. ...
Article
Full-text available
In-network cache architectures, such as Information-centric networks (ICNs), have proven to be an efficient alternative to deal with the growing content consumption on networks. In caching networks, any device can potentially act as a caching node. In practice, real cache networks may employ different caching replacement policies per node. The reason is that the policies may vary in efficiency according to unbounded context factors, such as cache size, content request pattern, content distribution popularity, and the relative cache location. The lack of policies suitable for all nodes and scenarios undermines the efficient use of available cache resources. Therefore, a new model for choosing caching policies appropriate to cache contexts, on demand and over time, becomes necessary. In this direction, we propose a new caching meta-policy strategy capable of learning the most appropriate policy for a cache online and dynamically adapting to context variations that lead to changes in which policy is best. The meta-policy decouples the eviction strategy from managing the context information used by the policy, and models the choice of suitable policies as online learning with a bandit feedback problem. The meta-policy supports deploying a diverse set of self-contained caching policies in different scenarios, including adaptive policies. Experimental results with single and multiple caches have shown the meta-policy's effectiveness and adaptability to different content request models in synthetic and trace-driven simulations. Moreover, we compared the meta-policy's adaptive behavior with that of the Adaptive Replacement Cache (ARC) policy.