Q-learning based energy-efficient and void avoidance routing
protocol for underwater acoustic sensor networks
Zahoor Ali Khan (a), Obaida Abdul Karim (b), Shahid Abbas (c), Nadeem Javaid (c),
Yousaf bin Zikria (d), and Usman Tariq (e)

(a) CIS, Higher Colleges of Technology, Fujairah 4114, United Arab Emirates
(b) International Islamic University, Islamabad 44000, Pakistan
(c) Department of Computer Science, COMSATS University Islamabad, Islamabad 44000, Pakistan
(d) Yeungnam University, Gyeongsan 38541, South Korea
(e) College of Computer Engineering and Sciences, Prince Sattam bin Abdulaziz University, Al-Khraj 11942, Saudi Arabia
Keywords: energy consumption, network lifetime, routing protocol, underwater acoustic sensor networks, void hole detection.
Abstract

Routing in underwater acoustic sensor networks (UASNs) has become a challenging issue due to several problems. First, in UASNs, the distance between the nodes changes due to their mobility with the water current, thus increasing the network's energy consumption. The second problem in UASNs is the occurrence of the void hole, which affects the network's performance because nodes are unable to deliver data towards the destination in the absence of forwarder nodes (FNs). Thus, the objective of routing in UASNs is to overcome the issues mentioned above to prolong the network's lifetime. Therefore, a Q-learning based energy-efficient and balanced data gathering (QL-EEBDG) routing protocol is proposed in this paper. In QL-EEBDG, the FNs are selected according to their residual energy and grouped according to their neighboring nodes' energies. Using energy as the main selection parameter assures efficient energy consumption in the network. Moreover, efficient selection of the FNs increases the lifetime of the network. However, the void node recovery process fails when the topology of the network changes. Therefore, to avoid void holes in QL-EEBDG, a QL-EEBDG adjacent node (QL-EEBDG-ADN) scheme is proposed. It finds alternate neighbor routes for packet transmission and ensures continuous communication in the network. Extensive simulations are carried out to evaluate the performance of the proposed technique against existing baseline protocols, namely efficient balanced energy consumption based data gathering (EBDG), enhanced EBDG (EEBDG) and QELAR. The performance parameters used in the simulations are network lifetime, energy tax, network stability period and packet delivery ratio (PDR). The simulation results depict that the proposed QL-EEBDG-ADN outperforms the baseline protocols, achieving approximately 11% higher PDR and 25% lower energy tax.
1. Introduction
In underwater acoustic sensor networks (UASNs), the number of sensor nodes is limited. The nodes are deployed in a particular area to monitor operations including marine life exploration, in-depth study of water bodies, oil and gas leakage examination, etc. In addition, the sensor nodes are application-specific and they sense data from the aquatic ecosystem. The sensed information is then transmitted to the base station through different routing strategies, which are broadly classified into two categories: direct and multi-hop [1]. The sensor nodes are very important because they act as the backbone of UASNs. However, various issues exist regarding the sensor nodes, such as limited battery capacity, constrained memory storage and computational power. These issues produce different challenges during the design process of protocols for underwater routing and sensing [2]. In order to tackle the aforementioned issues, energy-aware routing algorithms have gained popularity due to their adaptive feature of balancing the node energy dissipation [3].
One of the main problems in UASNs is the presence of a void hole, which increases the energy consumption of the network [4], because nodes are unable to deliver the data towards the next-hop forwarder nodes (FNs) due to the unavailability of relay nodes in the network. To overcome this problem, nodes increase their transmission power level for transmitting the data towards the next hop. The energy depletion of the nodes increases when the transmission power level is high, thus reducing the network's performance. Another major problem in UASNs is the selection of FNs
Principal corresponding authors: Z.A. Khan and Y. Zikria.
First Author et al.: Preprint submitted to Elsevier Page 1 of 18
Table 1
List of Acronyms

Notation  Description
AUVs  autonomous underwater vehicles
Ack  acknowledgment
ADN  adjacent node
BER  bit error rate
DQELR  deep Q-network based energy and latency-aware routing protocol
DL-HDBT  deep learning-high dynamic biased track
EBDG  efficient balanced energy consumption based data gathering
EEBDG  enhanced EBDG
FN  forwarder node
HN  helper node
MARLIN-Q  multi-model reinforcement learning based routing with quality of service
PDR  packet delivery ratio
QL  Q-learning
QLKS  QL based kinematics and sweeping
RCAR  reinforcement learning based congestion avoided routing
Ropt  optimal transmission range
SN  source node
UWSNs  underwater wireless sensor networks
IoUTs  internet of underwater things
UASNs  underwater acoustic sensor networks
Table 2
List of symbols used in the equations

Variable  Description
Ē  average residual energy
γ  discount factor
E_o  initial energy
E_r  residual energy
A  set of actions
c  fraction of used energy
g(p_s)  group of adjacent FNs' reward
S  set of states
P_s  probability of data packet sent successfully
p_r  packet received successfully
p_f  packet not sent successfully
T_range  transmission range
α  weighting parameter
β  weighting parameter
for reliable data delivery. The problem occurs due to nodes' mobility and the high bit error rate (BER) in UASNs [5]. Besides, the energy imbalance problem in the network also affects the performance of the sensor nodes because they die out quickly due to high data load, leaving behind a communication gap. Thus, improving the network performance considering the aforementioned problems in such an environment requires extensive efforts.
1.1. Problem Statement
In UASNs, the existence of void holes is a usual phenomenon due to water currents and sensor nodes' mobility. The mobility of the sensor nodes increases the energy consumption and decreases the network lifetime [6]. The mobility also affects data gathering in UASNs [7]. The authors in [8] propose an energy-balanced data gathering (EBDG) model to minimize the energy utilization and increase the network lifetime. The proposed model utilizes the motives of corona based network division and mixed-routing strategies with data gathering. The authors in [9] use an enhanced EBDG (EEBDG) protocol to increase the lifetime of underwater wireless sensor networks (UWSNs). An optimal transmission range (Ropt) is selected for communication between the sensor nodes using a hybrid transmission. However, both aforementioned conventional routing protocols are architecture-dependent, which means that they require prior information. Furthermore, to avoid the void hole, sensor nodes increase their transmission power to increase the range, which results in extra energy consumption. Also, efficient FNs are needed in the neighborhood of the source node. In [8, 9], energy-balanced approaches are presented based on shortest path selection. For instance, in multi-hop transmission, the source node (SN) selects the closest FN to transmit the data packet. However, the FN is selected in an immutable manner, which causes rapid drainage of the node's battery and eventually results in the death of the node and network failure.
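The drainage problem described above can be illustrated with a small sketch. The node names, energies and per-packet costs below are purely illustrative (they do not come from the paper); the sketch only contrasts an immutable closest-forwarder policy with a residual-energy-aware one.

```python
# Illustrative sketch: an immutable (always-closest) forwarder selection
# drains one node, while residual-energy-aware selection balances the load.
# All node names, distances and energy values are hypothetical.

def pick_closest(neighbors):
    # Immutable strategy: the nearest neighbor forwards every packet.
    return min(neighbors, key=lambda n: n["dist"])

def pick_energy_aware(neighbors):
    # Adaptive strategy: prefer the neighbor with the most residual energy.
    return max(neighbors, key=lambda n: n["energy"])

def drain(neighbors, policy, packets=5, cost=10.0):
    for _ in range(packets):
        policy(neighbors)["energy"] -= cost  # chosen forwarder pays the cost
    return [n["energy"] for n in neighbors]

fresh = lambda: [{"id": "A", "dist": 10.0, "energy": 100.0},
                 {"id": "B", "dist": 25.0, "energy": 100.0}]

print(drain(fresh(), pick_closest))       # [50.0, 100.0] -> A drains fast
print(drain(fresh(), pick_energy_aware))  # [70.0, 80.0]  -> load balanced
```

With the closest-forwarder rule, node A carries all five packets and loses half its energy, whereas the energy-aware rule alternates between A and B, which is the behavior the proposed selection parameter aims for.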
To tackle the above-mentioned issues, energy-efficient and void avoidance routing algorithms have gained much attention from researchers. The algorithms must ensure void avoidance through immutable selection of FNs, as it increases the network lifetime and balances the energy consumption [10]. In this regard, Q-learning (QL) is employed, which significantly reduces the network overhead and energy consumption. QL makes use of a hybrid strategy, comprising both reactive and proactive strategies, to reduce the control overhead. In the former strategy, the neighbor nodes and the associated routes are found by broadcasting a control packet within the network. The information about nodes and routes is stored in a regularly updated table. However, the frequent usage of this strategy leads to nodes' battery depletion. In the latter strategy, the control message is broadcast only when the network topology changes. Thereby, it reduces the network overhead and memory storage [2]. QL learns from the unknown environment based on the rewards to maintain a balance between proactive and reactive strategies.

Moreover, QL helps in optimizing an agent's behavior and preventing excessive use of energy. However, the optimization of the agent's behavior depends upon the routing protocol's requirements. Furthermore, it is assessed that the optimization of the reward parameter is linked with the network lifetime. It means that high optimization results in increasing the network lifetime [13].
In the proposed work, the control packets are broadcast only to the next hop. The nodes only keep their neighboring nodes' information and are not concerned about the routes' information. Thus, the proposed routing protocols, Q-learning based EEBDG (QL-EEBDG) and QL-EEBDG adjacent node (QL-EEBDG-ADN), overcome the drawbacks and possess the benefits of both QL strategies: reactive and proactive. Moreover, the proposed routing protocols have the feature of avoiding the sudden depletion of nodes by automatically directing the forwarding nodes to alternate routes. This switching of nodes increases the lifespan of the network and balances the energy consumption. This work is an extension of [12] and its main contributions are listed below.
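The adjacent-node redirection idea described above can be sketched as follows. This is our own minimal reading of the mechanism, not the paper's implementation: the function name `next_hop`, the topology encoding and the energy threshold are all assumptions made for illustration.

```python
# Hypothetical sketch of adjacent node (ADN) fallback: when every direct
# neighbor of a node is energy depleted (a void-like situation), the packet
# is redirected through an adjacent node of a neighbor instead of dropped.
# Names, topology and the threshold are illustrative, not from the paper.

ENERGY_FLOOR = 5.0  # assumed energy level below which a node is "depleted"

def next_hop(node, topology, energy):
    """Return a usable forwarder for `node`, or None if no recovery exists."""
    candidates = topology.get(node, [])
    # primary choice: the direct neighbor with the highest residual energy
    alive = [n for n in candidates if energy[n] > ENERGY_FLOOR]
    if alive:
        return max(alive, key=lambda n: energy[n])
    # ADN fallback: search the neighbors' neighbors for an alternate route
    for n in candidates:
        for adj in topology.get(n, []):
            if adj != node and energy[adj] > ENERGY_FLOOR:
                return adj
    return None  # genuine void hole: no alternate route available

topology = {"S": ["F1"], "F1": ["S", "F2"], "F2": ["F1", "sink"]}
energy = {"S": 50.0, "F1": 2.0, "F2": 40.0, "sink": 1e9}

print(next_hop("S", topology, energy))  # F1 is depleted -> adjacent node F2
```

The packet thus keeps moving through F2 even though the only direct neighbor F1 has drained, which is the continuity property QL-EEBDG-ADN targets.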
1.2. Contributions
The contributions of the work are sixfold.

- We propose the void hole detection mechanisms, named QL-EEBDG and QL-EEBDG-ADN.
- The proposed routing mechanisms automatically redirect the path from an energy depleted node to another path.
- QL is exploited for the precalculation of the path through the reward mechanism of the agent.
- A transmission failure recovery mechanism is proposed in this work for increasing the network lifetime.
- For optimal results, feasible regions are calculated in this work.
- A comparative analysis is carried out with state-of-the-art routing techniques. As a result, the proposed mechanism increases the PDR by 11% and improves the energy tax by 25%.
The remaining paper is organized as follows. An extensive literature review is provided in Section 2. QL is discussed in Section 3 along with its adoption in the proposed routing protocols. Moreover, the performance of the protocols is evaluated along with their comparison with the conventional protocols in Section 4. In the end, the conclusion is presented in Section 5. The acronyms and the list of symbols used in the equations are shown in Tables 1 and 2, respectively.
2. Related work
Several existing routing protocols are reviewed in this section. These protocols are proposed to stabilize the energy consumption and increase the network lifespan by preventing void hole occurrence. In [14], the authors propose an algorithm that uses a distributed load mechanism to avoid the creation of an energy hole. In the algorithm, all nodes continuously transmit the data towards the nodes within their transmission range. The regular data transmission generates a high data load on nodes, which is reduced using a forwarding policy. The policy selects one FN to balance the data load, which results in enhancing the network lifespan. Energy balancing increases the lifespan of the network and prevents the occurrence of an energy hole. Furthermore, in [15], an approach is proposed to balance the energy of the nodes in UASNs. In this approach, every node modifies its transmission mode based on its energy level status, through which energy balancing is achieved.
In [17], the authors propose a relative distance based forwarding algorithm. To reduce energy consumption and network latency, FNs are selected through a fitness function, which restricts the number of nodes involved in the transmission process. The involvement of fewer hops reduces the energy utilization and increases the network performance with low delay and maximum throughput.
The primary objective of the routing protocols presented in [14], [15] and [16] is to maximize the network lifetime by reducing the energy consumption of the nodes. Therefore, in the proposed algorithms, immutable selection of forwarder nodes is performed using two parameters: the energy of the forwarder and the total energy of the neighbor nodes. Moreover, in case of void node occurrence, adjacent node selection is adopted to find an alternate route for successful network operation, which is different from [18], [19] and [20].

The authors in [21] propose an adaptive Q-learning algorithm that uses a reinforcement model for packet forwarding to handle the acoustic environment. The algorithm uses the shortest path for transmitting the data towards the destination node. Energy efficiency is achieved by equally distributing the data between the nodes. However, the proposed model makes local optimum decisions, which increases the occurrence of void holes in the network.
A QL based kinematics and sweeping (QLKS) algorithm is proposed to manage the information table for making the decisions [22]. The QLKS algorithm is a proactive protocol and it handles the nodes' location and velocity information, which is placed in the Q-table. Moreover, the topology for QLKS is dynamic and it only solves the problems associated with the energy hole. An adaptive machine learning (ML) technique, known as QELAR [13], chooses an FN according to the reward function, which is computed using the sum of the energy of all neighboring nodes and the residual energy of an individual node. Due to the harsh and uncertain behavior of underwater wireless sensor networks (UWSNs), there is a need for secure and efficient data relaying techniques that enable the internet of underwater things (IoUTs) devices to select a reliable neighbor node for data forwarding. Basagni et al. [23] propose a multi-model reinforcement learning based routing with quality of service (MARLIN-Q) for ensuring robust and reliable data forwarding. Fast delivery of the data is also ensured in the model for the local network. Saeed et al. [24] propose a model in which a conjugate gradient technique is used for the unconstrained localization problem. In the numerical results, it is shown how different network parameters affect the transmission range, divergence angle and density of the nodes. Moreover, in the previous studies, there are no guaranteed congestion control mechanisms that can provide the optimal performance required in hop-to-hop communication.
Jin et al. [25] propose a reinforcement learning based congestion avoided routing (RCAR) protocol, which makes the communication fast and reduces the energy dissipation. The convergence of the routing protocol is achieved through a virtual routing pipe in which the average residual energy of the neighboring nodes is presented. A handshake based MAC protocol is integrated to avoid collisions. For enhancing the communication efficiency, the authors use a cooperative communication model [26] to provide better transmission. Convex optimization is used to overcome the power factor of the network and to achieve maximal power control parameters for the SN and the underwater relay node. Moreover, a strategy is used to achieve higher system performance as well as a low chance of failure and fast convergence. The relay scheme based on QL is applied to UWSNs to achieve fast learning. A high degree of spatial reuse can be accomplished, due to which the number of anchor nodes can be rapidly reduced. However, asynchronous clocks, the stratification effect and combined mobility are not considered. In [27], the authors propose a reinforcement learning based localization algorithm to optimize the localization accuracy. The ray and mobility compensation techniques are used to find the positions of autonomous underwater vehicles (AUVs). A reinforcement learning based MAC protocol is designed for UASNs [28] to achieve a collision-free environment for data transmission. It shows high effectiveness during environmental changes and achieves low end-to-end delay compared to the terrestrial environment. Similarly, the authors in [29] achieve a high packet delivery ratio (PDR), low control overhead and low energy consumption by incorporating
reinforcement learning in the proposed model. The main objective of the proposed model is to optimize the network performance through a collision-free environment. The authors use a slotted carrier sense multiple access protocol and a reinforcement learning model in [30]. The proposed work tunes the parameters and adapts them to the underwater environment to improve the network performance.
2.1. Transmission and Delay
The high propagation delay and low communication range in UWSNs affect the communication capability of the nodes in the network [31]. Therefore, to reduce the delay, Di Valerio et al. propose a channel-aware reinforcement learning based multi-path adaptive routing protocol, which is specialized for the underwater environment. It adaptively selects the energy-efficient path to reduce the energy consumption of the sensor nodes, which increases the network lifetime. Aiming to overcome the interference in underwater communication, Wang et al. [32] use reinforcement learning to design an intelligent multi-agent network. It provides a distributed resource allocation mechanism that uses cooperative QL to reduce the overhead of resource allocation. The proposed mechanism uses the reward function value to maximize the network transmission rate, keeping the limited communication range of the nodes in mind. However, the optimum relay strategy used in the proposed mechanism is too slow in dynamic UWSNs. The authors in [33] propose a scheme to enhance the anti-jamming communication efficiency and maximize the relay power without depending on the channel and jamming model. To maximize the relay utility, the learning parameters are initialized, which allows the use of the previous anti-jamming relay. The scheme outperforms the previous schemes in terms of energy usage.
2.2. Energy
Inefficient energy consumption is an important issue in underwater scenarios as it hinders communication in the network. Xinge Li et al. [34] propose a reinforcement learning mechanism for establishing a reliable path towards the destination. It enables the agents to learn from the environment and make the routing policies. The authors present a distributed multi-agent reinforcement learning approach in which the distributed nature of the nodes helps to forward the data. Moreover, a reward function is presented, which consists of energy, link stability and multi-hop data delivery. Aiming to reduce the workload of the nodes, the authors propose an intelligent localization-free mechanism based on QL [35]. The objective of the Q-learning mechanism is to achieve a high network lifetime with a low end-to-end delay. The proposed scheme constructs optimal policies based on a dual reward value, which consists of the depth and energy of the nodes, to achieve the objectives. Moreover, to increase the efficiency of the network, a new holding time mechanism is designed that performs the transmission of data based on the priority level of the nodes. Besides, it reduces unnecessary data transmissions through suppression to ensure a high packet delivery ratio. An efficient underwater communication model is adopted in [36] that considers an intelligent reinforcement learning model to improve the BER and minimize the energy consumption of the network. The proposed scheme divides the network into two states: virtual learning and online learning. In the first state, several experiments are conducted in the sea environment to complete the Q-table. In the second state, the transmitter chooses its action to maximize the Q-value. The proposed scheme considers an underwater transmitter to jointly optimize the modulation and coding policy without knowing the underwater channel model. Thus, it increases the network performance as well as reduces the packet transmission time. In [37], a multi-objective routing protocol is proposed for UWSNs, inspired by biological evolution. Reliable and energy-aware data collection in UWSN based applications is achieved using the evolutionary mechanisms of a multi-objective genetic algorithm. The authors in [38] propose a data-gathering scheme based on the location prediction of an AUV to solve the unreliable energy consumption. Nodes nearer to the path of the AUV suffer from a "hot region" problem and utilize their energy faster. A mechanism is introduced to adjust the trajectory of the AUV regularly. Moreover, reliable data transmission is achieved using a reliable time mechanism. Nodes near the AUV have sufficient time to transmit data towards it. Magnetic induction facilitates 3D UWSNs by ensuring constant behavior of the transmission channel, small transmission delay and long communication range. However, a routing protocol that ensures prolonged network lifetime and promising transmission delay in 3D UWSNs has still not been considered in the previous studies. The authors in [39] propose a routing protocol based on reinforcement learning that enhances the lifetime and transmission delay of the network. Two performance metrics, energy and distance, are used to establish a relationship between the nodes in a 3D UWSN. Moreover, a regularity factor is used to make a relationship between the performance parameters. In this way, the lifetime of the network is prolonged by adjusting the performance parameters. However, the environment of UWSNs faces many challenges due to its complexity, lack of consistency, high energy consumption and dynamic nature. The authors in [40] propose a deep Q-network based energy and latency-aware routing protocol (DQELR), which takes the optimal decision about routing, resulting in prolonging the life of the overall network. It uses two methods, on-policy and off-policy, to make decisions about routing at various stages of communication. These communication stages depend upon different conditions of the network. The neural network is trained online and offline in the on-policy and off-policy methods, respectively. Each method calculates the Q-value of the nodes and selects suitable FNs based on this value. Moreover, the energy and latency of the selected forwarders are also continuously monitored. The proposed protocol ensures less energy consumption and low latency to prolong the network lifetime. The authors in [41] propose the deep learning-high dynamic biased track (DL-HDBT) scheme. On the one hand, it uses a deep learning technique to identify the best FNs in the network. On the other hand, a hybrid dynamic biased tracking algorithm is used to track the highly congested nodes in the network.
2.3. Security and Trust
The conventional security and trust approaches are not appropriate for UASN scenarios due to the harsh environment. In [42], the authors propose a mechanism to update the trust using reinforcement learning. A key degree mechanism is proposed to protect the most important nodes against malicious attacks by increasing their sensing capabilities. A trust score is used in the decision-making process of the trust updating mechanism [43]. Due to the dynamic nature and sometimes unknown topology of UASNs, the sensor nodes in the network are more prone to attacks and it is not easy to select the optimal FN. Moreover, these networks are not flexible due to the unknown topology. The authors in [44] propose a bandit learning model, which analyzes node attacks in real time. Moreover, the learning information of the attacks is improved by introducing a feedback mechanism. A virtual expert is also introduced with a bandit algorithm, which leads to the revelation of actual and virtual targets by the malicious nodes.
The comparison of the related work is shown in Table 3 for better understanding.
3. Proposed routing protocol
In this section, the proposed QL based routing protocol is described in detail. The strategy to maximize the network lifespan is also presented. Moreover, the QL based algorithm is utilized to alleviate the issue of a void hole in the network.
Table 3: State of the art work.

Problems addressed | Contributions | Simulation tool | Limitations / future work
Lack of secure and efficient data relaying [23]. | MARLIN-Q is proposed to find trustworthy relay nodes. | NS2 | ML and artificial intelligence (AI) for network performance.
Short-range wireless links and the harsh environment of the water resist the connectivity [24]. | Conjugate gradient technique for reliable connectivity. | Not specified | Extendable to 3D space.
Congestion in the network, multi-hop communication and energy constraints [25]. | RCAR protocol and virtual routing pipe for the convergence. | Not specified | Efficiency could be increased using different ML and AI.
Limited communication coverage and power scarcity [26]. | Reinforcement learning based relay selection strategy to enhance power control. | Real scenario implementation | High communication and computation overhead.
No anchor node detection, asynchronous clock, stratification effect [27]. | Reinforcement learning for optimizing localization of the network. | Matlab | High implementation cost.
Unreliable quality of communication and high level of interference [32]. | A QL framework is proposed to maximize the transmission rate of the network. | Matlab | No consideration of optimal resource allocation for each node.
Vulnerable to jamming attack [33]. | Reinforcement learning based anti-jamming scheme to enhance communication efficiency. | Real scenario implementation | Computational overhead is high.
Limited communication range, inefficient energy consumption [34]. | Selection of optimal paths using reinforcement learning. | Python | ML and DL can be used to make the results more efficient.
High energy consumption and latency [35]. | A QL model is proposed to increase the performance of the network. | Matlab | Network lifetime is slightly less in the sparse case.
Efficiency of the network is compromised due to high BER [36]. | A reinforcement learning model to increase the efficiency of the network. | Real scenario implementation | Higher cost is involved in network deployment in a real scenario.
Low quality data gathering and energy consumption [37]. | Reliable and energy aware data collection using evolutionary mechanisms. | MATLAB | The effect of mobile nodes in UWSNs is not considered.
Low transmission rate, high energy consumption and propagation delay, harsh environment [38]. | Reliable data transmission through efficient data gathering. | Python | High average delay.
Network lifetime and promising transmission delay [39]. | Routing protocol based on reinforcement learning is proposed. | Not specified | Adaptive networks to tackle emergencies.
Sparse deployment [41]. | DL-HDBT is proposed to improve the network performance. | NS2 | Apply proposed work to more dynamic and uncertain environments.
Security and trust approaches are not appropriate for UASN scenarios [42]. | A key degree mechanism to protect higher importance nodes from malicious attacks. | MATLAB | Low energy efficiency when the network is sparse.
Low communication range hinders communication [43]. | Reinforcement learning for reducing the communication failures. | Bellhop ray, WOSS interface | More route learning techniques for better results.
Unknown topology makes the network more prone to attack [44]. | Bandit learning model analyzes node attacks in real time. | MAB algorithms, VEEL and Exp3 | Network attacks and the communication of the network will be optimized.
3.1. Strategy for the proposed QL routing protocol
In the proposed strategy, we consider an aquatic environment where the distributed decisions of all agents optimize the network performance of QL. Besides, the agents' performance is measured based on their computed rewards, which means that the agents seek to find the best actions based on the immediate rewards. In this paper, QL has a set of states (nodes) S, a set of actions A, immediate expected rewards R^a_{s,s'}, action-state pair probabilities P^a_{s,s'} and a discount factor γ, which lies between 0 and 1. In this paper, each state represents the packet a node has, whereas each node's action defines the transition from one state to another while receiving the packet. For instance, if node s seeks to find the best action a to send a packet to the adjacent node s' along with P^a_{s,s'}, then R^a_{s,s'} is given to s. To that end, R^a_{s,s'} is important in influencing the optimal decisions of s [13], [22].
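The state-action-reward formulation above can be made concrete with a minimal tabular Q-learning sketch. This is a generic illustration of the update rule, not the paper's algorithm: the learning rate `alpha` is not stated in the text and all numeric values below are our own assumptions.

```python
# Minimal tabular Q-learning sketch for the forwarding decision, using the
# notation above: a state is the node holding the packet, an action is a
# transmission to a neighbor, gamma is the discount factor (0 < gamma < 1).

gamma = 0.9   # discount factor, as defined in the text
alpha = 0.5   # learning rate (assumed; not specified in the paper)

# Q[(state, action)]: expected return of `state` forwarding via `action`
Q = {}

def q_update(s, a, reward, s_next, actions_next):
    """One Bellman update after node s forwards a packet to s_next."""
    best_next = max((Q.get((s_next, an), 0.0) for an in actions_next),
                    default=0.0)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (reward + gamma * best_next - old)

# node "s" forwards to neighbor "s1" and receives reward -1 (transmission cost)
q_update("s", "to_s1", -1.0, "s1", ["to_s2", "to_sink"])
print(Q[("s", "to_s1")])  # 0 + 0.5 * (-1 + 0.9*0 - 0) = -0.5
```

Repeated updates of this form are what let each node learn which neighbor yields the highest long-term reward without knowing the full network topology.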
3.2. Evaluation of the reward function
In this subsection, the reward function is analyzed. Before the analysis, the configuration of the proposed protocol is discussed, as shown in Fig. 1.

Figure 1: Transmission range of a network and node (direct and multi-hop transmission).

As mentioned previously, R^{a_s}_{p_s,p_r} can also influence the behavior of each agent in QL routing. Moreover, the aim of using the QL algorithm in the QL routing protocol is to maximize R^{a_s}_{p_s,p_r}, which eventually maximizes the network lifespan via a balanced energy consumption. In this study, the reward is calculated by taking the sum over the data packets that are successfully transmitted and the data packets that are not successfully transmitted. For instance, if the data packet sent by the source, denoted by p_s, is received at the destination node, represented by p_r, with the action-state pair (p_s, a_s), then the reward R^{a_s}_{p_s,p_r} is defined as:

R^{a_s}_{p_s,p_r} = −x − α1 [c(p_s) + c(p_r)] + α2 [g(p_s) + g(p_r)].  (1)
Two rewards are provided for the transmission of data. The first reward in eq. 1 is used to check whether the nodes have enough energy to transfer the data towards the destination. In case of a failure, i.e., if p_s does not send the packet successfully to p_r, then the reward R^{a_s}_{p_s,p_s} is calculated as:

R^{a_s}_{p_s,p_s} = −x × η − β1 c(p_s) + β2 g(p_s).  (2)
The second reward in eq. 2 checks the failures and transfers the data if a maximum reward is achieved. If the acoustic channel is used for transmitting data, there are chances that the transmission is affected by environmental noise, network congestion, etc. Besides, these factors come with their respective costs. To that end, this paper considers the cost as a punishment, denoted by x, which has a value of -1. In eqs. 1 and 2, α1, α2, β1 and β2 are the weighting parameters used to tune the reward function of the QL algorithm, where α1 and β1 are 0.5, whereas α2 and β2 are 0.05. Also, η is used to identify failure and is given a value greater than 1 [22]. Besides, the energy related terms are represented by c and g, where c is the fraction of used energy and is defined as:
c(p_s) = 1 − E_r(p_s)/E_o(p_s).  (3)

Here, c(p_s) is the residual energy E_r normalized by the initial energy E_o, and it ranges from 0 to 1. Based
on eq. 3, the next node becomes useful for transmitting the packet to the destination node when the cost factor c(p_s) between the source and destination nodes is low. It means that the data packet is dropped if c(p_s) is high. Conversely, the reward of a group of adjacent FNs, denoted by g(p_s), has a direct influence on the available connection with p_s:
g(p_s) = (2/π) arctan(E_r(p_s) − Ē(p_s)).  (4)

In eq. 4, the group average residual energy is represented by Ē. Note that if the difference between E_r and Ē of a node is maximum, the node becomes a reliable FN. Also, the outputs of c(p_s) and g(p_s) lie between -1 and 1. To that end,
the weighting parameters for the number of hops that transmit a packet from source to destination are defined [13].
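The reward components above can be sketched in Python. This is a minimal illustration of the reconstructed eqs. 1-4, where the function and variable names are ours and η = 2 is an assumed value (the paper only requires η > 1):

```python
import math

# Weights from the text: alpha1 = beta1 = 0.5, alpha2 = beta2 = 0.05;
# x = -1 is the punishment and eta > 1 scales the failure case (eta = 2 assumed).
ALPHA1, ALPHA2, BETA1, BETA2 = 0.5, 0.05, 0.5, 0.05
X, ETA = -1.0, 2.0

def cost(e_residual, e_initial):
    """Eq. 3: fraction of consumed energy, in [0, 1]."""
    return 1.0 - e_residual / e_initial

def group_reward(e_residual, e_group_avg):
    """Eq. 4: (2/pi) * arctan(E_r - E_bar), in (-1, 1)."""
    return (2.0 / math.pi) * math.atan(e_residual - e_group_avg)

def reward_success(c_s, c_r, g_s, g_r):
    """Eq. 1: reward when p_s successfully delivers the packet to p_r."""
    return -X - ALPHA1 * (c_s + c_r) + ALPHA2 * (g_s + g_r)

def reward_failure(c_s, g_s):
    """Eq. 2: reward when the transmission from p_s fails."""
    return -X * ETA - BETA1 * c_s + BETA2 * g_s
```

For instance, a node with 0.25 J remaining out of the 0.5 J initial energy has cost(0.25, 0.5) = 0.5, so heavily drained nodes contribute a larger penalty to the reward.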
Besides, the total reward based on the proposed QL routing is defined as follows:
Reward = P_s × R^{a_s}_{p_s,p_r} + P_f × R^{a_s}_{p_s,p_s}.  (5)
In this protocol, the total number of reliable FNs is denoted by N and the probability P of each node is defined as 1/N. In eq. 5, the probability of a data packet being sent successfully is denoted by P_s, whereas the probability of a data packet not being sent successfully is defined as P_f = 1 − P_s. Similar to the existing QL algorithm, the proposed QL protocol
has a Q-value function Q(s, a) that finds the best action based on the information of its state. Moreover, the Q-value is updated through the interaction of both the source and subsequent relay nodes during the network operations. It means that the Q-value is updated upon a data packet reception, when the maximum reward of the next relay node is obtained. The updated Q-value is stored in a table and is later used for decision-making about data transmission and the maximum reward to prolong the network's lifespan.
Q(s, a) = Reward + γ[Q(s, a) + Max_a(Q(s′, :))].  (6)
From eq. 6, the 𝑄value is calculated when an action 𝑎is performed at a state 𝑠. The 𝑄value is based on the reward
associated with performing an action at a current state. The first reward is the actual reward provided by performing
an action while the second one is the future reward. Based on the selection process of the proposed QL protocol, data
transmitted or received within the network depends upon the autonomous decisions of all FNs. If proper decisions are
made adaptively, the network operations become automatic. It means that there is a balance between energy dissipation
and data load on intermediate nodes for all adjacent nodes (ADNs). Therefore, a balanced transmission load enables
the network to achieve a maximum lifespan via balanced energy consumption.
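Under the assumptions above, the expected reward of eq. 5 and the table-based value update of eq. 6 can be sketched as follows (a minimal illustration with our own names; γ = 0.9 is an assumed discount factor, and the Q-table is a plain dictionary keyed by (state, action) pairs):

```python
def total_reward(p_s, r_success, r_failure):
    """Eq. 5: expected reward, with P_f = 1 - P_s and P_s = 1/N for N reliable FNs."""
    return p_s * r_success + (1.0 - p_s) * r_failure

def update_q(q_table, state, action, reward, next_state, gamma=0.9):
    """Eq. 6 as stated in the text:
    Q(s, a) = Reward + gamma * (Q(s, a) + max_a' Q(s', a'))."""
    current = q_table.get((state, action), 0.0)
    # Maximum future reward over the actions available at the next relay node.
    future = max((q for (s, _), q in q_table.items() if s == next_state), default=0.0)
    q_table[(state, action)] = reward + gamma * (current + future)
    return q_table[(state, action)]
```

Each successful reception triggers one such update, so nodes with a high expected reward are gradually reinforced as forwarders.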
3.3. Transmission failure detection
In a dynamic acoustic environment, the successful transmission of data packet is not always possible due to signal
attenuation, environmental noise, multi-path fading, etc. The most feasible solution is to get an acknowledgment (𝐴𝑐𝑘)
from the receiver node that ensures the data packet's successful transmission. However, this solution could lead to a very high communication overhead, resulting in a drastic decrease in the algorithm's performance.
Let us consider the scenario depicted in Fig. 2, to understand the procedure of transmission failure detection in
the proposed work. When an SN wants to transmit data to the destination node, it initiates communication with its
neighbor nodes to relay its data. Initially, a reward is computed for making the transition from the SN to an FN using eq. 1 and eq. 4, which find the reward of the group of nodes. The SN chooses the FN that delivers the data with minimum energy based on the calculated reward, i.e., the FN with the maximum reward. Moreover, the rewards are stored for
the time being in the buffer of the SN for later use in case of transmission failure. The probability P_SN of successful transmission from the SN is calculated as P_SN = 1/FN_n, as given in Fig. 2. After receiving the data packet, the FN repeats the same procedure; if it has neighbor nodes, the Ack_{FN,SN} is sent to the SN. In case the sender node does not receive the Ack, a transmission failure is detected and the data packet is re-transmitted through an alternate FN.
In Fig. 2, a transmission mechanism is presented. After failing to receive the Ack from the FN, the SN selects FN1 with probability P^A_{SN,FN1}. FN1 finds a neighbor node and sends back the Ack_{FN1,SN} to the SN. The successful transmission is declared and the memory of the buffer is released. As is evident from Fig. 2, the process is repeated until the data is successfully delivered at the destination. FNs are always rotated based on the reward during the data transmission. The reward, which is calculated for the complete route between the sender and receiver, decreases after each transmission. Thus, immutable forwarder selection is also avoided during transmission failure detection and recovery.
3.4. Selecting adjacent neighbor nodes
The data transmission between distant nodes (source and destination) leads to increased energy consumption, as more energy and transmission power are required for the transmission of data packets. High energy consumption causes quick battery depletion of nodes, which further leads to nodes' sudden death. Moreover, the link between nodes may also fail due to nodes' death or a change in the position of nodes. Because of the water current, the frequent changes in the position of nodes cause void hole occurrence in the network. To overcome this issue, a mechanism is proposed in this work to select trained agents that are in the transmission range (T_range) of the source agent.
Figure 2: Transmission failure recovery
Figure 3: Selection of ADN (coronas 1-4 with subcoronas; eligible and non-eligible links towards the base station)
Fig. 3 shows the multi-hop transmission between the nodes of corona 3 and corona 2. As shown in the figure, the data packet transmission is done hierarchically. Corona 3 contains four zones: Z31−a, Z31−b, Z31−c and Z31−d. These zones have nodes a, b, c and d, whereas node f is located in zone Z21−d of corona 2. When node b intends to send data packets to the destination node, it has to select a relay node with the maximum reward. Moreover, node b sends the information about the FN to the upper-level corona nodes, such as p and q. After getting the data packets from nodes p and q, node b selects zone Z21−a to transmit the data, as this zone
Table 4
Simulation Parameters

Parameter                  Value
Number of nodes            100
Initial energy of a node   0.5 J
Network radius             100-1000 m
Number of sinks            1
Number of iterations       1000
shows a higher energy reward. However, if zone Z21−a is void, node b selects nodes in other zones, such as Z31−b and Z31−d, for successful data transmission to the sink node without data packet loss.
It is evident from Fig. 3 that node c in zone Z31−b does not have any FN to deliver the data. If node b selects node c, its reward is reduced. Thus, node b selects node f from zone Z31−d to increase its reward. Node f has more FNs nearest to the sink node. It means that zone Z31−d is optimal and its nodes can forward the data packets to nodes in zone Z21−d with the maximum reward. In this way, QL helps to increase the network lifetime by achieving a maximum reward.
Furthermore, automatic switching also occurs among relay nodes based on the reward. For instance, Z41−d has node t, which has two relay nodes, a and d. Both nodes compute their reward based on the residual energy using eq. 4. Initially, node a is preferred because its receiver reward value is higher, as it is less distant from the destination. However, as the network operations proceed, E_r(p_s) reduces, which also decreases the reward of node a. Since node d has more E_r(p_s) than node a after a few iterations, the immutable forwarder selection is avoided to stabilize the energy consumption and maximize the lifespan of the network. In case two nodes get the same reward, the decision is made by considering their distance from the destination to declare the best FN for gaining the maximum reward during data transmission. Still, if the receiver reward from eq. 4 is the same and the distance is also the same, the number of neighbor nodes is taken into consideration to decide the FN selection. The benefit of this feature is that it allows the transmitter node to select the receiver node from a zone that is not void. Moreover, the receiver node must have nodes in adjacent zones. In the worst scenario, if all conditions are fulfilled and still a void node occurs, then a node from the adjacent zones acts as a helper node (HN). This enables the SN to find an alternate route via adjacent zones that can successfully deliver the data packet to the destination node. For example, b selects d as HN because a and d give the same reward of -1. However, with the passage of time, b selects the HN from the ADNs which gives the maximum reward, and this reward is calculated with the help of eq. 5. The reward value of each node is maintained through eq. 6 to make more informed decisions regarding FN selection. It ensures balanced energy consumption and data load on FNs by avoiding immutable FN selection.
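The tie-breaking rules above (maximum reward, then shorter distance to the destination, then more neighbor nodes) can be captured in a single ordering. This is an illustrative sketch with hypothetical candidate records, not the paper's implementation:

```python
def select_forwarder(candidates):
    """Pick the FN with the highest reward; break reward ties by smaller
    distance to the destination, and distance ties by more neighbor nodes.
    Each candidate is a dict with 'reward', 'distance' and 'neighbors' keys."""
    return max(candidates,
               key=lambda n: (n['reward'], -n['distance'], n['neighbors']))
```

If all three criteria are equal and the zone is still void, a helper node from an adjacent zone would be used instead, as in the worst-case scenario described above.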
4. Performance evaluation
The performance of the proposed schemes, QL-EBDG, QL-EEBDG, QL-EBDG-ADN and QL-EEBDG-ADN, is compared with the standard protocols EBDG, EEBDG and QELAR. For the simulation, we use the same network configuration for all protocols, which is given in Table 4. At the center of the network region, a static sink is installed
to obtain the data from sensor nodes. Furthermore, to assess the behavior of the above-mentioned protocols, four
performance metrics are utilized: network lifetime, network stability period, energy tax and throughput.
4.1. Comparison between proposed and traditional routing algorithms
Effect of changing radii on the energy tax: In Fig. 4, the performance of the EBDG, QL-EBDG and QL-EBDG-ADN schemes is shown. Moreover, the effects of varying the network radius on the energy utilization of different network nodes are depicted. The energy consumption increases with increasing distance between source and destination. Thus, the energy of the nodes depletes more quickly at high radii as compared to a low radius network. Moreover, the results show that the energy tax increases linearly. This happens due to the reward computation of the node at each hop through eq. 1, which ensures the forwarder relays the data packet with a maximum reward value. Also, the immutable selection of the FN is prevented using eq. 3. Furthermore, we have employed the ADN selection mechanism for void node recovery. The occurrence of a void node in the network increases the energy consumption of the nodes, because the neighbors of the void node increase
Figure 4: Energy tax at various radii.
their transmission power for data transmission. The ADN selection mechanism finds an alternate route for data transmission after the occurrence of a void hole, thus reducing the energy consumption of the nodes. The alternate route is selected using the reward of all nodes in the network. This reward is dynamically calculated for all nodes to avoid the further occurrence of the void hole during the recovery of the data packet, thus ensuring low energy consumption while routing the data from source to destination. In the proposed QL-EEBDG-ADN scheme, the QL mechanism helps it to outperform the conventional baseline schemes. However, the proposed scheme chooses a longer route to transmit the data packet to the next hop, whereas the schemes without the QL mechanism have lower energy dissipation. In Fig. 4, it is depicted that the proposed QL-EEBDG-ADN scheme has a lower energy tax as compared to EEBDG and QL-EEBDG, respectively. The reason is that EEBDG is a conventional baseline protocol with the void hole problem. Notably, the void hole problem occurs in the conventional schemes due to the immutable selection of the forwarder, which causes node battery depletion, resulting in wastage of energy. At the radius of 700 m, the energy consumption of EBDG is higher than that of QL-EBDG and QL-EBDG-ADN because of the traditional routing, which lacks the element of adaptability in case of transmission failure. Moreover, the energy consumption of QL-EBDG-ADN is higher than that of QL-EBDG because, when a void hole occurs, QL-EBDG-ADN uses the alternate path, which takes extra hops in recovering from the void hole problem. For instance, when the SN sends a data packet to an ADN, it further finds an HN to relay the data or directly delivers the data towards the sink. The involvement of ADN and HN increases the number of hops, leading to more energy consumption compared to the other proposed schemes. Therefore, the energy tax of QL-EBDG is less than that of QL-EBDG-ADN. On the other hand, the energy tax of QL-EEBDG-ADN is lower than that of EEBDG and QL-EEBDG because of the better void node recovery procedure.
Effect of changing radii on lifetime of the network: Figs. 4 and 5 show that the network lifetime and energy consumption are inversely proportional to each other. As shown in Fig. 5, with the increase in the network radius, the lifetime of the network is reduced. Because a higher network radius increases the distance between nodes and the probability of a void hole, the energy consumption of nodes increases. Due to high energy consumption, the failure of nodes increases, which decreases the lifetime of the network. It is also observed that the lifespan of EBDG, QL-EBDG and QL-EBDG-ADN is minimal as compared to EEBDG, QL-EEBDG and QL-EEBDG-ADN, because direct transmission is performed at the larger radii. Hence, the use of large radii is restrained in EEBDG, QL-EEBDG and QL-EEBDG-ADN using the reward function given in eq. 5. In QL-EEBDG, avoiding the immutable selection of FNs by choosing those with the maximum reward increases the lifetime of the network. The reward calculated using eq. 5 is constantly updated by making autonomous forwarder decisions using eq. 6, as shown in Fig. 5. Moreover, the reward in QL-EEBDG-ADN is calculated according to the transmission mode, i.e., direct or multi-hop. If the energy consumption in direct transmission is high, then it is reduced using multi-hop. In the QL-EBDG-ADN and QL-EEBDG-ADN schemes, the network's performance is improved by the reward calculation and update processes. Therefore, it can be observed in QL-EEBDG-ADN that the network's operational time is maximized by dynamically exploiting the probability of mixed transmission.
Figure 5: Lifetime of the network at various radii.
Although the QL-EBDG-ADN and QL-EEBDG-ADN schemes improve the network's performance, their lifespan is shorter than that of EBDG and QL-EBDG. This is because direct transmission has to be performed when multi-hop transmission is not possible due to the absence of FNs. Moreover, the energy consumption is also high in direct transmission because the source nodes are too distant from the destination and die due to quick battery drainage. Thus, the network lifetime is minimal in QL-EBDG-ADN. Moreover, the network lifespan of QL-EEBDG-ADN is increased by integrating ADN, as it helps to overcome the void hole problem.
Effect of changing radii on the network stability period: In Fig. 6, the impact of varying radii on the stability of the network is shown. The nodes away from the sink deliver the data towards it. The stability of the network is defined in terms of energy consumption. If energy consumption is balanced, the stability of the network is high. On the other hand, unbalanced energy consumption affects the network stability due to the premature death of nodes. The death of nodes occurs due to the high data traffic of the farthest nodes on the sink nodes. According to Fig. 6, when the radius of the network is small, its stability is high in all protocols because less energy is consumed over shorter distances. The shorter distance between the nodes increases multi-hop transmission because nodes consume less energy in that phase. The transmission of data in a multi-hop fashion saves the energy of the nodes for a longer time. Therefore, the stability of the network is high when the communication mode is multi-hop. At large network radii, the stability period of QL-EEBDG-ADN is higher than that of the other schemes,
Figure 6: Network stability period at various radii.
as shown in Fig. 6. QL-EEBDG-ADN has one extra feature, namely void hole avoidance, that does not exist in EEBDG and QL-EEBDG. This feature helps to increase the stability period of the network
as compared to the other proposed protocols. In addition, by learning from the environment and through the selection of efficient FNs, the reward is measured through energy dissipation. Thus, QL-EEBDG and QL-EBDG outperform EEBDG and EBDG.
Effect of changing radii on total number of the data packets received: In Fig. 7, the number of data packets received at different network radii in the existing and proposed protocols is shown. The number of packets received in the network gradually decreases with the increase in radius. This is because of the low energy consumption at smaller radii, which allows the network nodes to perform more data transmissions. Therefore, the stability of the network is also high when the communication radius is low. Hence, nodes stay alive for the maximum period in the network. Additionally, they communicate with each other for a longer duration. As a result, the network throughput increases as compared to the larger radii. For larger radii, the reduction in packets received is due to the high probability of void hole occurrence in the network, which reduces the packets received at the destination. According to Fig. 7, the overall throughput of QL-EEBDG-ADN is better, as it uses an avoidance technique for void node recovery, which does not exist in EBDG, QL-EBDG, EEBDG and QL-EEBDG. The avoidance technique is also used in QL-EBDG-ADN; however, it uses more energy than QL-EEBDG-ADN because it performs direct transmissions between distant nodes. Fig. 7 shows that the number of data packets received by QL-EBDG-ADN
Figure 7: Throughput at various radii.
is much lower than that of EBDG and QL-EBDG. In QL-EEBDG-ADN, when a void hole occurs, the nodes have the option to send the data packet through an ADN. Although the routing path becomes longer because ADN and HN are involved in the process, which leads to high energy consumption, the data packet still reaches the destination successfully. In the other schemes, the data packet is dropped due to the absence of a recovery mechanism, which decreases the network throughput. As a result, the number of packets received in the proposed QL-EEBDG-ADN is maximized. Moreover, by applying the QL adaptive strategy for forwarder selection, QL-EBDG and QL-EEBDG are improved compared to the baseline schemes, because QL helps to select reliable FNs for the data transmission.
4.2. Comparison between QELAR and proposed schemes
Effect of changing radii on the energy tax: The change in energy consumption can be observed at various network radii in Fig. 8. The increase in energy tax is evident in the QL-EBDG-ADN and QL-EEBDG-ADN schemes because of the void node recovery mechanism. In recovery mode, at least one extra hop is traversed to deliver the data at the destination successfully. QELAR shows moderate energy consumption as compared to the proposed schemes. QELAR has a failure detection mechanism to avoid data loss; however, the data packet is not recovered if a void node occurs in the network. This is the reason QELAR consumes more energy than QL-EBDG, QL-EEBDG and QL-EBDG-ADN. Meanwhile, QL-EBDG and QL-EEBDG show high energy dissipation when the network radius ranges from 100 m to 500 m (Fig. 8). However, this behavior changes at higher radii because in QL-EBDG the transmission range is static, keeping the energy controlled, and QL-EEBDG computes the optimal transmission range at each hop to avoid data losses. Thus, it helps to improve the energy consumption throughout the lifespan of the network.
Figure 8: Energy tax at various radii.
Effect of changing radii on PDR: The variation in packet delivery with the change in network radius is illustrated in Fig. 9. The PDR of the QL-EBDG-ADN and QL-EEBDG-ADN schemes is higher as compared to QELAR because they have the ability to find an alternate route using the adjacent node. Although this requires an extra amount of energy for delivering data packets towards the destination node, the PDR is improved in the QL-EBDG-ADN and QL-EEBDG-ADN schemes. The PDR increases linearly as the network radius is changed
Figure 9: PDR at various radii.
from 100 m to 1000 m (Fig. 9). This is because of the larger number of nodes in the transmission range. It is seen that the success ratio of QL-EBDG is lower than that of QELAR due to the static transmission ranges. Although QL helps in balancing energy through the reward mechanism, if a void node occurs, the data packet is dropped, which reduces the PDR of QL-EBDG.
Effect of changing radii on the lifetime of the network: The total network lifespan of QELAR is less than that of all the proposed schemes, as can be seen in Fig. 10. The network lifetime depends on the amount of energy consumed. QL-EBDG-ADN and QL-EEBDG-ADN have low performance initially. The alternate path is used to deliver the data when a transmission failure occurs at the FN. However, with the increase in radii, the success probability of relaying data increases because more nodes are available for forwarding the data packets. Therefore, an HN is not needed to find an alternate route between the SN and the destination node. This enhances the network lifetime of QL-EBDG-ADN and QL-EEBDG-ADN as compared to QELAR. Moreover, it is evident from the results that the network lifetime is reduced when the network radius is high. With a high network radius, the transmission power needed for delivering the data packet to the respective destination nodes also increases.
Figure 10: Network lifetime at various radii.
Thus, from the results, we can conclude that the increase in radius is inversely proportional to the network lifetime.
5. Conclusion
In this work, a QL based routing protocol is proposed for autonomous routing. It refrains from selecting the same FN for packet transmission in each iteration. The residual energy is used as the FN selection parameter. In this way, the transmission load is distributed among all nodes and no single node is burdened. Besides, the packet transmission mode is selected automatically by considering the distance between the SN and the destination node: if the distance is short, the packet is transmitted directly; otherwise, multi-hop transmission is used. Moreover, a reward function is designed to increase the lifespan of the network. The goal of the protocol is to achieve the maximum reward, i.e., the probability-weighted sum of the rewards for packets successfully transmitted to the destination and for packets that are not sent. The void hole is avoided by finding alternate neighbor routes using ADN and HN during packet transmission. For the performance evaluation of the proposed protocol, it is compared with three baseline protocols: EBDG, EEBDG and QELAR. The results depict that the proposed protocol outperforms the baseline protocols in terms of network lifetime, throughput and energy consumption. A tradeoff between network throughput and energy consumption is observed from the simulation results, where an increase in network throughput increases the energy consumption. For future work, we will use the deep deterministic policy gradient method to increase the network performance, as it performs better than Q-learning. It will help to find more optimal paths for reliable data delivery. Moreover, AI techniques, including artificial neural networks and the Markov decision process, will be helpful to increase the data delivery towards the base station.
References

[1] Javaid, Nadeem, Mohsin Raza Jafri, Sheeraz Ahmed, Mohsin Jamil, Zahoor Ali Khan, Umar Qasim, and Saleh S. Al-Saleh. “Delay-sensitive routing schemes for underwater acoustic sensor networks." International Journal of Distributed Sensor Networks 11, no. 3 (2015): 532676.
[2] Akyildiz, Ian F., Dario Pompili and Tommaso Melodia. “Underwater acoustic sensor networks: research challenges." Ad hoc networks, vol.
3, 2005, pp. 257-279.
[3] T. Liu, Q. Li and P. Liang, “An Energy-Balancing Clustering Approach for Gradient-based Routing in Wireless Sensor Networks", Computer
Communications, vol. 35(17), 2012, pp. 2150-2161.
[4] Jan, Naeem, Nadeem Javaid, Qaisar Javaid, Nabil Alrajeh, Masoom Alam, Zahoor Ali Khan, and Iftikhar Azim Niaz. “A balanced energy-
consuming and hole-alleviating algorithm for wireless sensor networks." IEEE Access 5 (2017): 6134-6150.
[5] Akbar, Mariam, Nadeem Javaid, Ayesha Hussain Khan, Muhammad Imran, Muhammad Shoaib, and Athanasios Vasilakos. “Efficient data
gathering in 3D linear underwater wireless sensor networks using sink mobility." Sensors 16, no. 3 (2016): 404.
[6] Umar, Amara, Nadeem Javaid, Ashfaq Ahmad, Zahoor Ali Khan, Umar Qasim, Nabil Alrajeh, and Amir Hayat. “DEADS: Depth and energy
aware dominating set based algorithm for cooperative routing along with sink mobility in underwater WSNs." Sensors 15, no. 6 (2015):
[7] Javaid, Nadeem, Naveed Ilyas, Ashfaq Ahmad, Nabil Alrajeh, Umar Qasim, Zahoor Ali Khan, Tayyaba Liaqat, and Majid Iqbal Khan. “An
efficient data-gathering routing protocol for underwater wireless sensor networks." Sensors 15, no. 11 (2015): 29149-29181.
[8] Zhang, Haibo and Hong Shen. “Balancing energy consumption to maximize network lifetime in data-gathering sensor networks." IEEE
Transactions on Parallel and Distributed Systems, vol. 20, 2009, pp. 1526-1539.
[9] Abdul Karim, Obaida, Nadeem Javaid, Arshad Sher, Zahid Wadud and Sheeraz Ahmed. “QL-EEBDG: Q-learning based energy balanced routing in underwater sensor networks." EAI Endorsed Transactions on Energy Web 5, no. 17 (2018): e15. ISSN 2032-944X.
[10] Ghoreyshi, S.M., Shahrabi, A., Boutaleb, T., “Void-handling techniques for routing protocols in underwater sensor networks: survey and challenges." IEEE Commun. Surv. Tutorials 19(2), 2017, pp. 800-827.
[11] Xu, Xin, Lei Zuo and Zhenhua Huang. “Reinforcement learning algorithms with function approximation: Recent advances and applications."
Information Sciences, vol. 261, 2014, pp. 1-31.
[12] Nadeem Javaid, Obaida Abdul karim, Arshad Sher, Muhammad Imran, Ansar Haque Yasar and Mohsen Guizani, “Q-Learning for energy
balancing and avoiding the void hole routing protocol in underwater sensor networks", in 14th IEEE International Wireless Communications
and Mobile Computing Conference (IWCMC-2018).
[13] Hu, Tiansi and Yunsi Fei. “QELAR: a machine-learning-based adaptive routing protocol for energy-efficient and lifetime-extended underwater
sensor networks." IEEE Transactions on Mobile Computing, vol. 9, no. 6, 2010, pp. 796-809.
[14] Zidi, C., Bouabdallah, F. and Boutaba, R., 2016. “Routing design avoiding energy holes in underwater acoustic sensor networks." Wireless
Communications and Mobile Computing, 16(14), pp.2035-2051.
[15] Cao, Jiabao, Jinfeng Dou and Shunle Dong. “Balance transmission mechanism in underwater acoustic sensor networks." International Journal
of Distributed Sensor Networks, 2015, pp. 1-12.
[16] Chien-Fu Cheng and Lung-Hao Li. “Data gathering problem with the data importance consideration in Underwater Wireless Sensor Networks."
Journal of Network and Computer Applications 78, 2017, pp. 300-312.
[17] Li, Zonglin, Nianmin Yao, and Qin Gao. “Relative distance based forwarding protocol for underwater wireless networks." International Journal
of Distributed Sensor Networks 10, no. 2 (2014): 173089.
[18] Rodolfo W. L. Coutinho, Azzedine Boukerche, Luiz F. M. Vieira and Antonio A. F. Loureiro. “Geographic and opportunistic routing for
underwater sensor networks." IEEE Transactions on Computers 65, no. 2, 2016: 548-561.
[19] Chen, Y. D., Chen, Y. W., Lien, C. Y., and Shih, K. P. “A channel-aware depth-adaptive routing protocol for underwater acoustic sensor
networks." In OCEANS 2014-TAIPEI, IEEE, 2014: pp. 1-6.
[20] H. Yu, N. Yao and J. Liu. “An adaptive routing protocol in underwater sparse acoustic sensor networks." Ad Hoc Networks, vol. 34, 2015, pp. 121-143.
[21] J.A. Boyan & M.L. Littman. “Packet routing in dynamically changing networks: a reinforcement learning approach." Advances in Neural
Information Processing Systems 6, Morgan Kaufmann, San Mateo, California, 1994, pp. 671-678.
[22] Plate, Randall and Cherry Wakayama. “Utilizing kinematics and selective sweeping in reinforcement learning-based routing algorithms for
underwater networks." Ad Hoc Networks, vol. 34, 2015, pp. 105-120.
[23] Basagni, Stefano, Valerio Di Valerio, Petrika Gjanci and Chiara Petrioli. “MARLIN-Q: Multi-modal communications for reliable and low-
latency underwater data delivery." Ad Hoc Networks 82 (2019): 134-145.
[24] Saeed, Nasir, Abdulkadir Celik, Mohamed-Slim Alouini and Tareq Y. Al-Naffouri. “Performance analysis of connectivity and localization in
multi-hop underwater optical wireless sensor networks." IEEE Transactions on Mobile Computing 18, no. 11 (2018): 2604-2615.
[25] Jin, Zhigang, Qinyi Zhao and Yishan Su. “RCAR: A reinforcement-learning-based routing protocol for congestion-avoided underwater acoustic sensor networks." IEEE Sensors Journal 19, no. 22 (2019): 10881-10891.
[26] Su, Yuhan, Minghui LiWang, Zhibin Gao, Lianfen Huang, Xiaojiang Du and Mohsen Guizani. “Optimal cooperative relaying and power
control for IoUT networks with reinforcement learning." IEEE Internet of Things Journal (2020).
[27] Yan, Jing, Yadi Gong, Cailian Chen, Xiaoyuan Luo and Xinping Guan. “AUV-Aided Localization for Internet of Underwater Things: A
Reinforcement Learning-based Method." IEEE Internet of Things Journal (2020).
[28] Park, Sung Hyun, Paul Daniel Mitchell and David Grace. “Reinforcement learning based MAC protocol (UW-ALOHA-Q) for underwater
acoustic sensor networks." IEEE access 7 (2019): 165531-165542.
[29] Musaddiq, Arslan, Zulqar Nain, Yazdan Ahmad Qadri, Rashid Ali, and Sung Won Kim. “Reinforcement Learning-Enabled Cross-Layer
Optimization for Low-Power and Lossy Networks under Heterogeneous Traffic Patterns." Sensors 20, no. 15 (2020): 4158.
[30] Jin, Lu and Defeng David Huang. “A slotted CSMA based reinforcement learning approach for extending the lifetime of underwater acoustic
wireless sensor networks." Computer Communications 36.9 (2013): 1094-1099.
[31] Javaid, Nadeem, Mohsin Raza Jafri, Zahoor Ali Khan, Nabil Alrajeh, Muhammad Imran, and Athanasios Vasilakos. “Chain-based communi-
cation in cylindrical underwater wireless sensor networks." Sensors 15, no. 2 (2015): 3625-3649.
[32] Wang, Hui, Youming Li and Jiangbo Qian. “Self-Adaptive Resource Allocation in Underwater Acoustic Interference Channel: A Reinforce-
ment Learning Approach." IEEE Internet of Things Journal 7, no. 4 (2019): 2816-2827.
[33] Xiao, Liang, Donghua Jiang, Ye Chen, Wei Su, and Yuliang Tang. “Reinforcement-learning-based relay mobility and power allocation for
underwater sensor networks against jamming." IEEE Journal of Oceanic Engineering 45, no. 3 (2019): 1148-1156.
[34] Li, Xinge, Xiaoya Hu, Rongqing Zhang, and Liuqing Yang. “Routing Protocol Design for Underwater Optical Wireless Sensor Networks: A
Multiagent Reinforcement Learning Approach." IEEE Internet of Things Journal 7, no. 10 (2020): 9805-9818.
[35] Zhou, Yuan, Tao Cao and Wei Xiang. “Anypath Routing Protocol Design via Q-Learning for Underwater Sensor Networks." arXiv preprint
arXiv:2002.09623 (2020).
[36] Su, Wei, Jiamin Lin, Keyu Chen, Liang Xiao and Cheng En. “Reinforcement learning-based adaptive modulation and coding for efficient
underwater communications." IEEE Access 7 (2019): 67539-67550.
[37] Faheem, Muhammad, Md Asri Ngadi and Vehbi Cagri Gungor. “Energy efficient multi-objective evolutionary routing scheme for reliable data
gathering in Internet of underwater acoustic sensor networks." Ad Hoc Networks 93 (2019): 101912.
[38] Han, Guangjie, Xiaohan Long, Chuan Zhu, Mohsen Guizani, Yuanguo Bi and Wenbo Zhang. “An AUV location prediction-based data
collection scheme for underwater wireless sensor networks." IEEE Transactions on Vehicular Technology 68, no. 6 (2019): 6037-6049.
First Author et al.: Preprint submitted to Elsevier Page 17 of 18
ZAHOOR ALI KHAN (SM’15) holds academic positions with Dalhousie and Saint Mary’s Universities, Canada.
He is currently the Division Chair of the Computer Information Science (CIS) Division and the Applied Media Division,
Higher Colleges of Technology, United Arab Emirates. He has more than 19 years of research and development, academia,
and project management experience in IT and engineering fields. He has multidisciplinary research skills in emerging
wireless technologies. His current research interests include e-health pervasive wireless applications, the theoretical and
practical applications of wireless sensor networks, smart grids, and the Internet of Things. His research outcomes include
several journal articles, book chapters, and numerous conference proceedings, all peer-reviewed. The journal articles have
appeared in prestigious and leading journals. Most of his conference articles have been published by IEEE Xplore, Springer,
or Elsevier, and indexed by Scopus and Thomson Reuters’ Conference Proceeding Citation Index. He is an Editorial Board
Member of several prestigious journals. He also serves as a regular reviewer and organizer for numerous reputed ISI-indexed journals, IEEE
conferences, and workshops. He is a Senior Member of IAENG. Several conference articles have received the Best Paper Awards (BWCCA 2012,
IEEE ITT 2017, and EIDWT-2019).
OBAIDA ABDUL KARIM received her bachelor’s and master’s degrees in computer science from International Islamic
University, Islamabad. She completed her thesis with the Communication over Sensors (ComSens) Research Laboratory
under the supervision of Dr. Nadeem Javaid. Her research interests include wireless sensor networks.
SHAHID ABBAS received his bachelor’s degree in telecommunication and networking from the COMSATS Institute of
Information Technology, Attock Campus, in 2017. He is currently pursuing the MS degree in computer science with the
Communication over Sensors (ComSens) Research Laboratory under the supervision of Dr. Nadeem Javaid. His research
interests include blockchain in the Internet of Things, the Internet of Vehicles, and wireless sensor networks.
NADEEM JAVAID (S’08, M’11, SM’16) received the bachelor’s degree in computer science from Gomal University, Dera
Ismail Khan, Pakistan, in 1995, the master’s degree in electronics from Quaid-i-Azam University, Islamabad, Pakistan, in
1999, and the Ph.D. degree from the University of Paris-Est, France, in 2010. He is currently an Associate Professor and the
Founding Director of the Communications over Sensors (ComSens) Research Laboratory, Department of Computer Science,
COMSATS University Islamabad, Islamabad Campus. He has supervised 126 masters and 20 Ph.D. theses. He has authored
over 900 articles in technical journals and international conferences. His research interests include energy optimization in
smart/micro grids, wireless sensor networks, big data analytics in smart grids, blockchain in WSNs/smart grids, etc. He
was a recipient of the Best University Teacher Award from the Higher Education Commission of Pakistan, in 2016, and the
Research Productivity Award from the Pakistan Council for Science and Technology, in 2017. He is also Associate Editor
of IEEE Access and an Editor of the International Journal of Space-Based and Situated Computing and of Sustainable
Cities and Society.
YOUSAF BIN ZIKRIA (SM’17) received the Ph.D. degree from the Department of Information and Communication
Engineering, Yeungnam University, Gyeongsan, South Korea, in 2016. He is currently working as an Assistant Professor
with the Department of Information and Communication Engineering, College of Engineering, Yeungnam University. He
has more than ten years of experience in research, academia, and industry in the field of information and communication
engineering and computer science. He has authored more than 60 peer-reviewed journal articles, conference papers, patents,
and book chapters.
USMAN TARIQ received the Ph.D. degree from Ajou University, South Korea. He is currently an Associate Professor
with the College of Computer Engineering and Sciences, PSAU. He has led the design of a global data infrastructure
simulator to model and evaluate the impact of competing architectures on the performance, availability, and reliability of
the systems for Industrial IoT infrastructure. His international collaborations/collaborators include NYIT, Ajou University,
PSU, University of Sherbrooke, COMSATS, NUST, UET, National Security Research Institute (NSR), Embry-Riddle
Aeronautical University, Korea University, University of Bremen, and the Virginia Commonwealth University. His current
interests are in applied cybersecurity, advanced topics in the Internet of Things, and health informatics. His research interests
include large complex networks, which include network algorithms, stochastic networks, network information theory, and
large-scale statistical inference. As a Network Security Theorist, his contributions towards addressing these challenges
involve designing better wireless access networks, processing social data, and developing scalable algorithms that can operate in data-center-like facilities.
In this study, adversarial graph bandit theory is used to rapidly select the optimal attack node in underwater acoustic sensor networks (UASNs) with unknown topology. To ensure the flexibility and elusiveness of underwater attack, we propose a bandit-based hybrid attack mode that combines active jamming and passive eavesdropping. We also present a virtual expert-guided online learning algorithm to select the optimal node without priori topology information and complex calculation. The virtual expert mechanism is proposed to guide the algorithm learning. The expert establishes a virtual topology configuration, which addresses the blind exploration and energy consumption of attackers to a large extent. With the acoustic broadcast characteristic, we also put forward an expert self-updating method to follow the changes of real networks. This method enables the algorithm to commendably adapt to the dynamic environments. Simulation results verify the strong adaptability and robustness of the proposed algorithm.