Citation: Hisyam Ng, H.A.;
Mahmoodi, T. Machine
Learning-Driven Dynamic Traffic
Steering in 6G: A Novel Path Selection
Scheme. Big Data Cogn. Comput. 2024,
8, 172. https://doi.org/10.3390/
bdcc8120172
Academic Editor: Domenico Ursino
Received: 21 May 2024
Revised: 19 November 2024
Accepted: 25 November 2024
Published: 27 November 2024
Copyright: © 2024 by the authors.
Licensee MDPI, Basel, Switzerland.
This article is an open access article
distributed under the terms and
conditions of the Creative Commons
Attribution (CC BY) license (https://
creativecommons.org/licenses/by/
4.0/).
Article
Machine Learning-Driven Dynamic Traffic Steering in 6G:
A Novel Path Selection Scheme
Hibatul Azizi Hisyam Ng * and Toktam Mahmoodi *
Department of Engineering, King’s College London, London WC2R 2LS, UK
*Correspondence: hibatul.hisyam_ng@kcl.ac.uk (H.A.H.N.); toktam.mahmoodi@kcl.ac.uk (T.M.)
Abstract: Machine learning is taking on a significant role in realizing the vision of 6G. 6G
aspires to support more use cases, handle highly complex tasks, and improve on the current 5G and
beyond-5G infrastructure. Artificial intelligence (AI) and machine learning (ML) are well-suited
candidates to support and deliver these aspirations. Traffic steering functions offer many
opportunities to enable new use cases and improve overall performance. The emergence and
advancement of the non-terrestrial network is another driving factor for creating an intelligent
selection scheme that provides a dynamic traffic steering function. With their service-based
architecture, 5G and 6G are data-driven architectures whose massive transactional data open up
new approaches to handling highly complex processes. A highly complex process, a massive volume
of data, and a short timeframe call for a scheme that uses machine learning techniques. This paper
proposes a scheme that uses massive historical data to provide a decision scheme enabling dynamic
traffic steering functions, which addresses the future emergence of the heterogeneous transport
network and aligns with the Open Radio Access Network (O-RAN). The proposed scheme yields an
inference that can be programmed into the telecommunication nodes. It provides a novel scheme to
enable dynamic traffic steering functions for the 6G transport network. The study identifies an
appropriate data size for creating a high-performance multi-output classification model that
achieves more than 90% accuracy for traffic steering functions.
Keywords: machine learning; Open Radio Access Network; non-terrestrial network; QoS flow
identifier; mixed integer linear programming; optimization solver; service function chaining;
Virtual Network Function; multi-output classification; random forest algorithm
1. Introduction
Machine learning is gaining an important role in many industries. It is applied to regression,
classification, clustering, dimensionality reduction, and decision-making tasks, which fall under
supervised learning (regression and classification), unsupervised learning (clustering and
dimensionality reduction), and reinforcement learning (decision-making). Each type of machine
learning is driven by a distinctive functionality: predictive tasks, insight extraction from
unlabeled data, and responses to the current state, respectively. The adoption of machine
learning by industries, specifically in the telecommunication field, is focused on lowering
operating costs, creating highly automated processes, creating new revenue streams, and enabling
faster provisioning processes (fulfilment and assurance) [1]. Machine learning applications
are positioned to accelerate a more critical use of resources to enable sixth generation (6G)
networks [1].
This progression in the telecommunication industry spurred the growth of many applications that
carry out specific tasks in various 5G flows and activities, and similar improvements are
expected for 6G. Traffic steering functions optimize the overall flow of traffic traversing the
telecommunication infrastructure. These functions range from basic load balancing to dynamic
traffic management that re-routes traffic based on network conditions [2]. Ultimately, the
traffic steering function computes the end-to-end
Big Data Cogn. Comput. 2024,8, 172. https://doi.org/10.3390/bdcc8120172 https://www.mdpi.com/journal/bdcc
path of the traffic flow to obtain the optimum route. Beyond optimizing the flow, the study
in [3] adopts the traffic steering method to improve the reliability of data transmission over
heterogeneous access networks. This area is further investigated in [4,5] to improve overall
resource usage by using distinct machine learning methods to predict traffic demands and avoid
congestion.
6G aspires to have new use cases, more devices, and ubiquitous connectivity. The two essential
elements that distinguish 6G from earlier generations are diverse use cases and heterogeneous
transport networks. According to the study in [6], four pillars support the 6G aspirations:
(1) Enhanced Human Communication, (2) Enhanced Machine Communication, (3) Enabling Services, and
(4) Network Evolution. In particular, this paper focuses on the network evolution of 6G, which
aims to enable artificial intelligence (AI) and expand service ubiquity. AI and service ubiquity
open new opportunities to provide additional transport network resource options for overall
service delivery. The research seeks an approach to make the most of the enormous volume of data
generated in the infrastructure, which can give valuable insight for subsequent actions that
fulfil the service ubiquity aspiration by integrating a non-terrestrial network (NTN) into the
terrestrial network (TN). Satellite technology, through NTN integration, inherently provides wide
service coverage of an area. Hence, extending the TN with the NTN creates new resource
provisioning benefits. Nonetheless, the main challenge is incorporating the NTN into 5G and 6G
while upholding the existing critical requirements of 5G and future 6G.
The applications of machine learning in telecommunication processes are vast and diverse.
However, machine learning essentially works with large volumes of data, and handling data for
machine learning requires massive computation and more time. Incorporating a new machine
learning process into 5G and 6G infrastructure must account for time-critical applications
because 6G demands lower latency [7]. Based on the studies in [4,5], an Open Radio Access Network
(O-RAN) relies on non-real-time and near-real-time nodes to perform offline learning on the data
and push the programmed inference to the RAN component. Although machine learning offers new
options for the decision-making process, a comprehensive study is required to ensure that latency
is not impaired. In recent years, several studies have sought to improve latency, for example by
placing edge computing near the access node and by applying federated learning. Both methods,
which are not an exhaustive list, share the common objective of reducing the processing time.
This study focuses on applying a different machine learning technique to expedite the learning
process while preserving its optimum decision. In addition, the emergence of the non-terrestrial
network co-existing with the terrestrial transport network in the end-to-end infrastructure
necessitates automating the path selection process for optimum resource assignment.
This study focuses on the usability of offline data generated from a series of machine learning
and linear programming works in [8], where the clustering outcome groups mixtures of traffic
classifications that share similar attributes, and the path assignment is the outcome of the
optimization solver for the resource assignment. The results for each packet are stored as an
extensive list of training data for a classification task that labels the selected transport
network for the assigned traffic. Based on the findings from the study in [8], the overall
clustering and optimization activity consumes a significant amount of time and computational
resources. It is therefore unsuitable for real-time operations, which motivates an investigation
into adopting supervised learning that uses the labeled data from the clustering and optimization
solver processes for classification.
This paper classifies path selection based on traffic attributes and assigns traffic to
appropriate transport networks for traffic steering functions. The aim is to perform a
classification that indicates the selected transport network for each traffic flow. The
contributions of this study are as follows:
i. Introduced a traffic steering model that learns from operational data handled by the
5G nodes in a scenario of heterogeneous transport network types.
ii. Produced a novel traffic–transport network assignment scheme based on data generated
in an area to achieve optimum resource management.
iii. Conducted an analysis of the timeslot for optimum classification model performance.
An appropriate timeslot volume with sufficient diversity of UE traffic types helped create a
well-performing model.
The remainder of this study is organized as follows: Section 2 elaborates on the
related work on traffic steering approaches using various machine learning applications
for classification and predictive tasks. Section 3 discusses the framework for the traffic-
to-transport network assignment proposal. Section 4 highlights the findings and analysis
of the observations from the simulation scenario incorporated in the framework. Finally,
Section 5 presents the conclusion and future works.
2. Related Works
The use of machine learning for various objectives in 5G and 6G is substantial. Machine learning
improves on past communication systems that rely on mathematical models [9]. In [9], the role of
machine learning in 6G networks is classified by layer: physical, medium access control (MAC),
network, transport, and application. That work then elaborates on the challenges in the
respective domains and how various machine learning algorithms could resolve the limitations.
Similarly, this study is driven to resolve the resource management issue by introducing a
classification method from supervised learning.
The requirements in 6G are more complex and require highly efficient resource management.
Ref. [10] addressed the need to improve resource management for Virtual Network Function (VNF)
management in the Service Function Chain (SFC). Unpredictable traffic volumes and static resource
allocation configurations are the main contributors to the inefficient resource management faced
by telecommunication service providers. In particular, differences in service demand create huge
variations in resource allocation. Thus, the study in [10] adopts machine learning techniques to
close the gap, enabling dynamic resource allocation for chaining the VNFs by predicting the
resource requirement. The end-to-end VNF instances instantiated in an SFC showed improved VNF
resource allocation compared to the conventional method. The research in [5] shares a similar
mission: the machine learning technique is embedded to aid the traffic steering decision by
predicting congestion on the VNFs serving URLLC traffic.
The study in [4] addressed a similar problem of abrupt traffic demand by using machine learning
techniques to predict the traffic demand and enable dynamic resource management. Specifically,
the study in [4] aims to guarantee the latency requirement for URLLC and maximize throughput for
eMBB. It leverages the Open Radio Access Network (O-RAN) Alliance platform to enable two
optimization strategies, one short-term and one long-term. The short-term strategy is to resolve
congestion and optimize RAN resources using inferences from historical data collected from the
RAN, which are learned and modeled offline by the machine learning process. The long-term
strategy is to resolve the traffic steering process, which comprises the prediction of traffic
demand, bandwidth-split distribution, and flow-split variables. The findings from the study
in [4] indicate the workability of using two separate time-scale learning processes, an approach
adopted in this study. The data in this study undergo a series of machine learning and
optimization solver processes in offline mode, and the learning outcome can then be programmed as
an inference into the RAN/node for execution.
Kim et al. conducted a comparative study to demonstrate the improvement in traffic steering
performance of machine learning techniques over traditional methods [2]. Ref. [2] emphasizes the
advantages of Mobile Edge Cloud (MEC) in 5G architecture because MEC handles essential resource
management functions like computing, storage, and networking for last-mile connected nodes. Thus,
the MEC node hosts and caches enormous transactional data that are highly useful for the machine
learning process. In particular, ref. [2] focused on using machine learning techniques,
specifically deep learning networks, for traffic steering decision-making in radio access
technology (RAT), for scenarios with 3rd Generation Partnership Project (3GPP) radio access and
non-3GPP radio access connected to the MEC. The learning started by executing the traffic
steering algorithms to recognize the network conditions. The process of learning the network
conditions in [2] is used in this study to capture the conditions of each transport network
before the subsequent classification process. In addition, this study shifts the focus from RAT
to the heterogeneity of the transport networks, composed of the terrestrial network (TN) and the
non-terrestrial network (NTN); satellite and DOCSIS cable are the transport technology candidates
envisioned for the 6G infrastructure.
The study of NTN cooperation with the TN is progressing towards the definition of a workable
architecture between the two. The prominent NTN technology candidate is satellite technology, and
multiple integration types are discussed in [11]. The 3GPP in Release 17 (R17) specified enhanced
functions for the foundational technologies, including coverage and capacity. Specifically,
ref. [11] proposes a novel integrated satellite–terrestrial network (ISTN) architecture for
different scenarios. Ref. [11] also highlights the challenge of coordinating unified technical
standards to enable ISTN because of the uncertain timing of 6G commercialization and the
reconciliation efforts within the satellite industry. Ref. [12] surveys the challenges involved
in the ISTN, categorized by network architecture, technical performance, and optimization,
together with the key technology enablers needed to successfully realize ISTN for 6G.
3. Network Architecture, Learning Framework, and Methodology
The evolution of network generations from 4G to 5G and beyond enables new technologies to address
the requirements of multiple industry domains. The aspiration for 6G continues to expand, aiming
to provide extensive service coverage via feasible technologies, with the adoption of
non-terrestrial networks like satellite technology. Moreover, depending on the niche
characteristics of a specific locality, the footprint of such technologies is vast and reliable.
Ref. [13] mentioned that the service coverage of Data Over Cable Service Interface Specification
(DOCSIS) Cable TV services comprises 67% of total fixed broadband subscriptions. On the other
hand, the satellite is the optimal candidate for “Ubiquitous Services” because of its innate
capability to reach places that standard terrestrial network technologies cannot reach. Thus, the
study opted to analyze a scenario in which three different types of transport networks co-exist
in the overall network architecture, as shown in Figure 1. The mechanism to enable the traffic
steering function is then formulated as a sequential process of collecting raw data and
performing data cleaning and transformation. This is followed by the clustering of packets
flowing to the access node; each computed cluster is assigned to the appropriate transport
network that shares similar characteristics with the packets in the respective cluster.
The role of the AI plane in Figure 1 is to store the collected data from access and edge nodes,
execute a series of machine learning algorithms, and create and store ML models. Based on the
research work in [8], parameters such as download (DL), upload (UL), delay, and error rate are
collected from nodes. The aggregated packets then undergo unsupervised learning, which extracts
shared attributes and groups the packets into three clusters, matching the three transport
network selection options for traffic steering functions. The subsequent process is optimization,
which uses the mixed integer linear programming (MILP) method to identify the optimum
traffic–transport assignment for each cluster. The processes classify each packet using DL/UL,
delay, and error rate parameters. Then, the processes label the traffic with its cluster and the
assigned transport network type. The overall process is implemented offline, and each output is
stored for classification learning to produce a model that represents the niche characteristics
of traffic generated in a specific area. Ultimately, the classification model is envisioned to be
used in a node where the traffic is labeled and steered to the appropriate transport network. The
proposed concept adopts an approach similar to assigning a bearer using the QoS Flow Identifier
(QFI). The end-to-end algorithm is derived from the study in [4]. The overall process is
illustrated in Figure 2 with high-level information on the input and output of each activity.
Figure 1. Heterogeneous transport network in 5G network architecture.
[Figure 2 flowchart. Boxes: 1. Data Collection (raw UE and transport data); 2. Data
Transformation (data in t-SNE format); 3a. UE Traffic Clustering (Clusters #1 to #3, with traffic
attributes); 3b. Transport Attributes (Transports #1 to #3); 4. Matching Process (cosine
similarity matrix); 5. Optimization (cluster–transport assignment); 6. Storage (database);
7. Classification Model (model creation from train and test data); 8. Output (inference results).
Attribute labels shown: throughput, delay, error rate; total throughput, maximum delay, minimum
error rate.]
Figure 2. The high-level organization flow.
The framework of this study is to explore the classification works using the data from
the earlier works done in [8], represented by Steps 1 to 5, as elaborated below. Thus, the
scope of supervised learning begins from Step 6 onwards.
1. Data collection: raw data generated from the UE’s traffic and the transport network.
2. Data transformation: the data are cleaned and transformed from multivariate to
two-dimensional data using unsupervised learning, known as dimensionality reduction.
3. Extraction of information:
a. The UE-generated traffic undergoes a clustering process using an unsupervised learning
technique to form a defined number of clusters, with attribute information for each cluster.
b. The attribute information is extracted for every type of transport network.
4. Matching: the clusters and transport attributes are arranged in a matrix format for the
subsequent matching process, which runs matching algorithms to compute a matching value for
every cluster–transport pair.
5. Optimization: the optimization solver is executed with the objective of maximizing the
matching values between clusters and transport attributes. The outcome of the process yields a
decision for the best traffic–transport assignment.
6. Storage: the pertinent data from the outcome of the prior activities are stored.
7. Classification: a supervised learning process is executed on the historical data, using a
train and test process, to create a classification model. The vision of the machine learning
classification model is to provide an inference to be programmed in the nodes for traffic
steering decisions.
8. Output: the final output comprises the hyperparameters and the labels resulting from the
clustering and optimization solver activities.
Machine learning algorithms rely heavily on the volume of data. 5G and beyond are data-driven
architectures that utilize multiple data sources from different network functions and domains for
automation, optimization, and improvement to support critical requirements, specifically in
6G [9]. From the storage activity (Step 6) in the workflow, the UE-generated data are used, and
attributes like downlink, uplink, delay, and error rate values are captured and stored in every
instance. Table 1 lists the parameters involved in the study. Three different datasets are
captured, each over a defined duration in a different timeslot, to demonstrate that the
classification model holds independently across timeslots.
Table 1. The parameters and attributes of the traffic and transport networks.
No | Parameters | Remarks
1 | Total No. of UEs | Average per instance: 5379 UEs (1st instance = 5443; 60th instance = 5250).
2 | Traffic Classification | Based on the randomly generated percentage applied to the total number of UEs; sampled from the 1st instance. (1) Normal UEs: 2352. (2) UEs classified in specific 5G use cases: 3091, of which (i) eMBB UEs: 1241; (ii) URLLC UEs: 250; (iii) mMTC UEs: 1600.
3 | Types of Transport Network | (1) Optical fiber network; (2) satellite network; (3) DOCSIS coaxial cable TV.
4 | Size of Datasets (based on duration) | Three different datasets of 60, 300, and 600 instances, respectively.
5 | Machine Learning | Supervised learning: multi-output classification using the Random Forest algorithm. (i) Dataset, $I$, with the sample size given by duration, $t$, and users, $u$: $I \in \mathbb{R}^{t \times u}$. (ii) User, $u$, with features, $f_m$, where $m \in \{\mathrm{DL/UL}, pdb, per\}$. (iii) Input: $u \in I$. (iv) Output: multiple target variables, $v$, denoted by cluster number, $c_n$, and transport network, $t_n$. DL: downlink; UL: uplink; pdb: packet delay budget; per: packet error rate.
6 | Machine Learning Algorithm | 1. Create and validate machine learning models from various sample sizes. Step 1: Retrieve data from storage. Step 2: Create the model, $M_{ij}$, based on sample sizes of $I$ (1%, 10%, 50%, and 100%). Step 3: Obtain the predicted value of $v$ using model $M_{ij}$. Step 4: Compare the false prediction results. Step 5: Compile results. 2. Validate machine learning models across the datasets. Step 1: Apply the multi-output classification model, $M_{i4}$, to all datasets, $I$. Step 2: Capture error results. Step 3: Compare the false prediction results with the actual cluster-assignment process (Steps 4, 5, and 6) from Figure 2. Step 4: Compile results.
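The multi-output classification in Table 1 can be illustrated with a minimal sketch, assuming scikit-learn is available. The data here are synthetic stand-ins (random features and a made-up labeling rule), not the dataset of this study, and the names `X`, `Y`, and `model` are hypothetical. The sketch only shows that a single Random Forest can predict both targets (cluster number and transport network) jointly.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for the stored records (Step 6): one row per UE record with
# features [DL, UL, pdb, per]. Values are illustrative only.
n_samples = 600
X = rng.random((n_samples, 4))

# Two target columns per row: cluster number (0..2) and transport network (0..2).
# Both come from a made-up threshold rule so that the task is learnable.
cluster = np.digitize(X[:, 0], [1 / 3, 2 / 3])
transport = np.digitize(X[:, 2], [1 / 3, 2 / 3])
Y = np.column_stack([cluster, transport])

X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=0
)

# RandomForestClassifier accepts a 2-D target and predicts all outputs together.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, Y_train)
Y_pred = model.predict(X_test)

# Per-output accuracy: share of test rows where each label is predicted correctly.
accuracy_per_output = (Y_pred == Y_test).mean(axis=0)
print(accuracy_per_output)
```

In a deployment along the lines described here, the two synthetic label columns would be replaced by the stored cluster and transport labels from Steps 3 and 5.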
Two new columns are added based on the output from Step 3 and Step 5 of Figure 2, representing
the cluster group of the UEs and the type of transport assigned. Step 3 uses raw data as an
input, transforms the data into two-dimensional form, and executes a clustering technique to
identify the hidden pattern based on the density of the points plotted on the graph. The data are
grouped into three clusters so that the number of clusters equals the total number of transport
types. Subsequently, a matching score is computed for every possible pair of cluster, $c_n$, and
transport network, $t_n$. A higher matching score for a traffic–transport pair represents higher
similarity, and an optimization solver uses the scores to select the best matching
cluster–transport pairs for the traffic–transport assignment. Thus, the target variables are
$v \in \{c_n, t_n\}$, and $V$ is the predicted value of $v$ for each input user $u \in I$.
Table 2 lists the activities in the prior processes (from Step 1 to Step 5); this study
simplifies them to a minimum number of processes, reducing the handling time.
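One way to write the assignment step as an optimization problem is sketched below. The symbols are assumptions introduced for illustration: $s_{c,t}$ is the matching score between cluster $c$ and transport network $t$, and $x_{c,t}$ is a binary decision variable that equals 1 when cluster $c$ is assigned to transport $t$. The first constraint is the single assignment of a cluster to a transport network; the second constraint (each transport serves at most one cluster) is an additional assumption that the text does not state explicitly.

```latex
\begin{aligned}
\max_{x} \quad & \sum_{c=1}^{3} \sum_{t=1}^{3} s_{c,t}\, x_{c,t} \\
\text{s.t.} \quad & \sum_{t=1}^{3} x_{c,t} = 1 \quad \forall c \in \{1,2,3\}, \\
& \sum_{c=1}^{3} x_{c,t} \le 1 \quad \forall t \in \{1,2,3\}, \\
& x_{c,t} \in \{0,1\}.
\end{aligned}
```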
Table 2. The detailed activities in prior works.
No | Activity | Output | Source/Process | Remarks
1 | Data Simulation [14,15] | List of parameters: (i) unique ID; (ii) downlink, DL; (iii) uplink, UL; (iv) delay, pdb; (v) error rate, per | Simulation | Python-based simulator generating users. Input: generate UE and traffic dataset, $I$. Output: parameter values for clustering. Step 1: Define UE types (user, $u_p$, where $p$ denotes the traffic classification type). Step 2: Define the UE mobility pattern (dynamic and static types of users). Step 3: Generate UEs and the traffic in each UE. Step 4: Store the UE information.
2 | Data Collection and Transformation | t-SNE two-dimensional data format | Unsupervised learning: dimensionality reduction | User, $u$, with features, $f_m$, where $m \in \{\mathrm{DL/UL}, pdb, per\}$, transformed to $u_{t\text{-}sne} = [x_i]$.
3 | Clustering | Three clusters consisting of UE packets | Unsupervised learning: HDBSCAN clustering | Input: UE and traffic dataset, $I$. Output: UEs in three clusters, $C_n$. Step 1: Measure the distances between points, $d_{dist} = \lvert x_{core} - x_i \rvert$. Step 2: Define the HDBSCAN core, minimum samples, and cluster size. Step 3: Visualize the clusters. Step 4: Compute the attributes for each cluster.
4 | Matching Process | - | Finding the matching score for every possible pair of clusters and transport networks | Input: UEs in clustering format and transport network attributes. Output: the matching values between cluster and transport network. Step 1: Compute the attributes of the transport network, $T_n$: throughput capacity, $V_{T\text{-}dl/ul}$, round-trip time, $\beta_T$, and packet error rate, $\varepsilon_T$. Step 2: Perform cosine similarity, $\cos(\theta) = \frac{C_n \cdot T_n}{\lVert C_n \rVert \, \lVert T_n \rVert}$.
5 | Resource Assignment | Assigned transport network | Compute the MILP process to obtain the optimum cluster–transport assignment | Input: cosine similarity values between clusters and transport networks. Output: assignment of transport network. Step 1: Define the objective function: maximize the matching score between cluster and transport network attributes. Step 2: Define the constraints: single assignment of a cluster to a transport network.
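The matching and assignment activities in Table 2 can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the implementation used in the prior work: the attribute vectors are hypothetical normalized values, the function names are invented for this example, and an exhaustive search over permutations stands in for the MILP solver (feasible here because there are only three clusters and three transport networks). It also assumes a one-to-one assignment.

```python
from itertools import permutations
from math import sqrt


def cosine_similarity(a, b):
    """cos(theta) = (a . b) / (|a| |b|) for two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_assignment(clusters, transports):
    """Assign each cluster to one distinct transport, maximizing total similarity.

    Brute-force stand-in for the MILP solver: tries every one-to-one mapping.
    """
    cluster_names = list(clusters)
    transport_names = list(transports)
    scores = {
        (c, t): cosine_similarity(clusters[c], transports[t])
        for c in cluster_names
        for t in transport_names
    }
    best_map, best_total = None, float("-inf")
    for perm in permutations(transport_names, len(cluster_names)):
        total = sum(scores[(c, t)] for c, t in zip(cluster_names, perm))
        if total > best_total:
            best_map, best_total = dict(zip(cluster_names, perm)), total
    return best_map, best_total


# Hypothetical normalized attribute vectors: [throughput, delay, error rate].
clusters = {"C1": [0.9, 0.2, 0.1], "C2": [0.1, 0.9, 0.3], "C3": [0.3, 0.3, 0.9]}
transports = {
    "fiber": [1.0, 0.1, 0.1],
    "satellite": [0.2, 1.0, 0.2],
    "docsis": [0.2, 0.2, 1.0],
}

mapping, total = best_assignment(clusters, transports)
print(mapping, round(total, 3))
```

For larger numbers of clusters or transport networks, the permutation search grows factorially, which is where a proper MILP or assignment solver becomes necessary.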
A Python-based program simulated the UEs and the traffic they generated. It generated various
types of traffic and users scattered across, and served in, cells converging on an access node,
as in Figure 1. Within an area, the simulator emulated (1) the generation of static and mobile
users and (2) the classification of the traffic produced by each user. The traffic generated by
UEs followed the attributes of eMBB, URLLC, and mMTC, categorized by the size of DL and UL data,
packet error rate, and packet delay budget.
Supervised learning builds models from the training and testing of data. This study,
however, explored creating models by splitting the data according to the ratios in Table 3,
in order to examine the granularity of the data size. The volume of each dataset was
determined by the duration of the recorded data and the total number of UEs in each
instance. Referring to Figure 3, the volume of each instance is large because of the high
number of UEs (more than 5000 UEs) with diverse attribute values. The product of the
number of UEs and the number of instances for each duration in Table 3 therefore serves
as the measure of data volume. A model denoted by Mij is built from dataset i using
splitting percentage j, so the indices reflect the volume of data used to build the model.
Subsequently, the series of Mi4 models, each trained on a full dataset, was applied to the
different datasets.
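The splitting scheme can be sketched as follows. The mapping from index j to a fraction follows Table 3; drawing the subset by uniform random sampling is an assumption of this sketch, since the text does not state how the partial instances were selected, and `training_subset` is a hypothetical helper name.

```python
import random

# Index j -> fraction of the dataset used to build model Mij (percentages from Table 3).
SPLIT_FRACTIONS = {1: 0.01, 2: 0.10, 3: 0.50, 4: 1.00}


def training_subset(records, j, seed=0):
    """Return the portion of `records` used to train model Mij.

    Assumption: the subset is a uniform random sample without replacement.
    """
    fraction = SPLIT_FRACTIONS[j]
    size = max(1, round(len(records) * fraction))
    if size >= len(records):
        return list(records)  # j = 4 uses the full dataset
    return random.Random(seed).sample(records, size)


# Usage with a stand-in dataset of 1000 records.
records = list(range(1000))
subset_sizes = {j: len(training_subset(records, j)) for j in SPLIT_FRACTIONS}
```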
Next, the selected model Mi4 from Table 4 was used to classify the cluster number
and the assigned transport network. Each Mi4 is trained on the maximum volume of data
generated in a set and is then used as a classifier for the other datasets. The classification
performance is validated by cross-checking the multi-output classification results against
the actual values from the clustering process (Step 3) and the traffic–transport assignment
(Step 5).
Table 3. The creation of models based on the splitting ratio.
                                                         Dataset splitting percentage, j
No  Dataset, i (DL & UL)  Duration       Average #UEs *  1%    10%   50%   100%
1   Set 1.1               60 instances   5379            M11   M12   M13   M14
2   Set 5.1               300 instances  4183            M21   M22   M23   M24
3   Set 10.1              600 instances  2339            M31   M32   M33   M34
* Average number of UEs per dataset.
Figure 3. The summary of the total number of UEs generated from the simulation.
Table 4. The application of ML models on every dataset.
No  Dataset, i  ML Model, M    Actual                                       ML Classifier
1   Set 1.1     M14 M24 M34    Cluster Number, cn / Assigned Transport, tn  Cluster Number, cn / Assigned Transport, tn
2   Set 1.2
3   Set 1.3
4   Set 5.1
5   Set 5.2
6   Set 5.3
7   Set 10.1
8   Set 10.2
9   Set 10.3
4. Findings and Discussions
The first output (see Table 5) from the supervised learning process concerns the
feasibility of model creation based on the volume and the diversity of the UEs’ attribute
values. Partial instances from the entire dataset, selected according to the splitting
percentage, were used to train and create the classification model. The finding from this
stage is that a higher volume of data, in terms of both duration and number of UEs,
produced better model accuracy. The accuracy of the multi-output classification model was
measured by the percentage of classification outputs that did not match the actual values
obtained from the offline clustering and MILP-based resource assignment processes. Table 5
shows the resulting pattern: the model’s error percentage falls as the training dataset
grows.
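The error measure described above can be sketched as a per-target mismatch percentage. The function names and the toy labels are illustrative assumptions; each sample is taken to be a (cluster number, assigned transport) pair, with the "actual" pairs standing in for the offline clustering and MILP outputs.

```python
def unmatched_percentage(actual, predicted):
    """Percentage of predictions that differ from the actual labels."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one sample is required")
    wrong = sum(a != p for a, p in zip(actual, predicted))
    return 100.0 * wrong / len(actual)


def multi_output_errors(actual_pairs, predicted_pairs):
    """Return (cluster error %, transport error %) for (cluster, transport) pairs."""
    actual_cluster, actual_transport = zip(*actual_pairs)
    predicted_cluster, predicted_transport = zip(*predicted_pairs)
    return (
        unmatched_percentage(actual_cluster, predicted_cluster),
        unmatched_percentage(actual_transport, predicted_transport),
    )


# Toy labels (hypothetical): four traffic samples, each a (cluster, transport) pair.
actual = [(0, 1), (1, 2), (2, 0), (1, 2)]
predicted = [(0, 1), (1, 0), (2, 0), (1, 1)]
cluster_error, transport_error = multi_output_errors(actual, predicted)
```

In this toy case every cluster number matches while two of the four transport assignments do not, giving 0% and 50% error for the two targets.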
In both the downlink and uplink streams, the model performs better as the volume of
data increases. Subsequently, the study assesses the practicality of each created model for
classifying the target variables on the other datasets. Referring to Table 6, each model’s
performance varies across datasets. The classification model developed from Set 1.1, which
consists of only 60 instances, produces a significant number of errors (classification outputs
that do not match the actual values); for the second classification target, the assigned
transport network, the unmatched output exceeds 70% in the worst case (74.06% on Set 1.2,
downlink). Likewise, the classification model developed from the larger volume of data in
Set 10.1, consisting of 600 instances, produces close to 60% unmatched output in its worst
case (59.93% on Set 1.2, downlink).
Table 5. The performance of the model created based on the volume of UEs in the dataset.
                           Set 1.1             Set 5.1             Set 10.1
No  Split Ratio, j  Model  Cluster  Transport  Cluster  Transport  Cluster  Transport
Downlink
1   1%              Mi1    0.00%    33.56%     8.19%    35.45%     0.00%    25.48%
2   10%             Mi2    0.00%    22.12%     0.21%    1.72%      0.00%    3.33%
3   50%             Mi3    0.00%    9.18%      0.08%    0.17%      0.00%    0.02%
4   100%            Mi4    0.00%    0.00%      0.00%    0.00%      0.00%    0.00%
Uplink
5   1%              Mi1    30.58%   50.53%     1.39%    73.62%     44.90%   73.44%
6   10%             Mi2    45.94%   41.78%     0.00%    2.13%      0.00%    0.50%
7   50%             Mi3    0.00%    4.63%      0.00%    0.00%      0.00%    0.21%
8   100%            Mi4    0.00%    0.00%      0.00%    0.00%      0.00%    0.00%
% value: represents the number of errors (unmatched classification output against the actual values).
The second model, developed from the 300 instances of Set 5.1, sits in the middle range
between the models from Set 1.1 and Set 10.1 and shows good classification output. However,
its error percentages for the second classification target (assigned transport network) are
higher than for the first (cluster number). The model developed from Set 5.1 is fully
accurate when classifying the “cluster number” for both the downlink and uplink streams,
whilst its classification of the “assigned transport network” gives average errors of 7.37%
and 6.55%, respectively. Rounding up, this translates to a scenario where, in 100
traffic–transport assignment decisions, about eight traffic flows in the downlink stream and
seven in the uplink stream are wrongly assigned to a non-optimum transport network. The
output for the uplink stream shows low variation in errors across all models developed and
tested on the respective datasets. For the uplink stream, the average error percentage
across all datasets shows that the model developed from Set 10.1 achieves more than 95%
classification accuracy. The cells highlighted in red in Table 6 mark each model’s
performance on its own training dataset, where it produces no error.
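The 7.37% and 6.55% averages can be reproduced from the M24 transport columns of Table 6. The values below are copied from that table. Averaging over the eight datasets other than the training set (Set 5.1) is an inference from the numbers, not something the text states explicitly; including Set 5.1 would give lower averages.

```python
# Transport-assignment error (%) of model M24 on each dataset, copied from Table 6.
M24_TRANSPORT_ERROR = {
    "downlink": {
        "Set 1.1": 26.07, "Set 1.2": 0.00, "Set 1.3": 0.68,
        "Set 5.1": 0.00, "Set 5.2": 0.24, "Set 5.3": 0.15,
        "Set 10.1": 31.38, "Set 10.2": 0.30, "Set 10.3": 0.14,
    },
    "uplink": {
        "Set 1.1": 0.00, "Set 1.2": 18.63, "Set 1.3": 17.32,
        "Set 5.1": 0.00, "Set 5.2": 0.00, "Set 5.3": 0.34,
        "Set 10.1": 6.24, "Set 10.2": 9.74, "Set 10.3": 0.12,
    },
}
TRAINING_SET = "Set 5.1"  # M24 was built from this dataset


def mean_error_excluding_training(errors, training_set):
    """Average error over every dataset except the one used for training."""
    others = [value for name, value in errors.items() if name != training_set]
    return sum(others) / len(others)


downlink_mean = mean_error_excluding_training(M24_TRANSPORT_ERROR["downlink"], TRAINING_SET)
uplink_mean = mean_error_excluding_training(M24_TRANSPORT_ERROR["uplink"], TRAINING_SET)
print(f"downlink: {downlink_mean:.2f}%  uplink: {uplink_mean:.2f}%")
```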
The overall process in Figure 2 may appear to be a small task, but it plays a significant
role in producing a feasible classification model. The data collection process begins with
extensive cleansing and transformation of the raw data for subsequent machine learning
use. The UE data carry multiple attributes, which makes the data high-dimensional; hence,
a dimensionality reduction algorithm from unsupervised learning is required before the
clustering process. Running a series of algorithms over a massive volume of data and then
applying the MILP process requires significant handling time. Therefore, these prior
processes are performed offline, and their results are stored as an individual profile of a
node in the area. The classification technique from supervised learning then learns from
the offline data, reducing the handling time by providing instantaneous traffic labelling
for the steering functions.
Compared with the previous study in [5], the processes defined in Figure 2 produce a
base value that represents the time taken to collect data, transform, cluster, and perform
the resource assignment (Steps 2 to 5), as shown in Table 2. Most of the time needed to
execute the overall process is contributed by the clustering activities. Figure 4 illustrates
the share of time required to cluster data with the parameters specified in Table 1.
Table 6. The performance of the classification model on the datasets, i.
                Model M14 (Set 1.1)  Model M24 (Set 5.1)  Model M34 (Set 10.1)
No  Dataset, i  Cluster  Transport   Cluster  Transport   Cluster  Transport
Downlink
1   Set 1.1     0.00%    0.00%       0.00%    26.07%      0.00%    0.00%
2   Set 1.2     17.86%   74.06%      0.00%    0.00%       13.84%   59.93%
3   Set 1.3     20.59%   60.32%      0.00%    0.68%       0.00%    30.61%
4   Set 5.1     25.73%   50.92%      0.00%    0.00%       0.00%    18.04%
5   Set 5.2     1.60%    21.61%      0.00%    0.24%       0.14%    20.09%
6   Set 5.3     21.58%   32.74%      0.00%    0.15%       0.10%    21.02%
7   Set 10.1    0.72%    4.07%       0.00%    31.38%      0.00%    0.00%
8   Set 10.2    4.72%    37.43%      0.00%    0.30%       0.29%    13.71%
9   Set 10.3    4.75%    32.95%      0.00%    0.14%       0.26%    12.48%
Uplink
1   Set 1.1     0.00%    0.00%       0.00%    0.00%       0.00%    19.37%
2   Set 1.2     1.65%    1.68%       0.00%    18.63%      0.00%    18.63%
3   Set 1.3     0.00%    0.00%       0.00%    17.32%      0.00%    0.21%
4   Set 5.1     1.72%    2.00%       0.00%    0.00%       0.00%    0.00%
5   Set 5.2     0.72%    1.02%       0.00%    0.00%       0.00%    0.00%
6   Set 5.3     0.79%    1.37%       0.00%    0.34%       0.00%    0.14%
7   Set 10.1    2.40%    8.67%       0.00%    6.24%       0.00%    0.00%
8   Set 10.2    2.94%    12.69%      0.00%    9.74%       0.00%    0.05%
9   Set 10.3    2.99%    12.05%      0.00%    0.12%       0.00%    0.11%
% value: represents the number of errors (unmatched classification output against the actual values).
Figure 4. The distribution of time taken to execute both methods.
5. Conclusions and Future Works
Applying unsupervised and supervised learning techniques to user traffic data
facilitates the development of feasible classification functions that achieve the optimum
traffic–transport assignment. A network node programmed with the classification model
labels the traffic according to the classification output, which determines the transport
type selected for that traffic. The study thus helps to automate path selection for
pertinent traffic, in line with the 6G aspiration of extensive TN and NTN coverage for
optimal “service ubiquity”. The goal of highly efficient resource management, needed to
meet the higher performance requirements of 6G, could therefore be realized with low
classification errors, corresponding to a classification accuracy of more than 90%.
The study also opens an avenue for deploying classification models per timeslot, where
the choice of timeslot determines how frequently inference models are provisioned to the
pertinent nodes in an area. In addition, the study could lead to a new opportunity where
the traffic requirements in an area are handled by specific models that are dynamically
created based on local demand (traffic diversity) and capacity (bandwidth resources and
the heterogeneity of transport network provision in the area).
Future studies should extend the variables of this study to determine the end-to-end
flow of deploying timeslot-based models onto the programmed nodes using the O-RAN
platform. This requires regression steps on the models for the respective timeslots, so that
the performance of the classification model can be assessed in a time-series format and the
overall organization of the models established. Another variable is the model’s
compatibility: how to produce and assign a highly efficient model for an area, and whether
the models can be reused in other areas. Lastly, the organizational flow of this study could
be incorporated into a federated learning framework, and research on its workability and
improvement in the telecommunication infrastructure could build on a federated learning
approach.
Author Contributions: Conceptualization, H.A.H.N. and T.M.; methodology, H.A.H.N.; validation,
H.A.H.N. and T.M.; formal analysis, H.A.H.N.; investigation, H.A.H.N.; resources, H.A.H.N.; data
curation, H.A.H.N.; writing—original draft preparation, H.A.H.N.; writing—review and editing,
H.A.H.N.; visualization, T.M.; supervision, T.M. All authors have read and agreed to the published
version of the manuscript.
Funding: This research received no external funding.
Data Availability Statement: The data of this research will be provided to interested individuals
upon request to the corresponding author.
Conflicts of Interest: The authors declare no conflicts of interest, and the funders had no role in
the design of the study; in the collection, analyses, or interpretation of data; in the writing of the
manuscript; or in the decision to publish the results.
References
1. Patil, A.; Iyer, S.; Pandya, R.J. A Survey of Machine Learning Algorithms for 6G Wireless Networks. arXiv 2022, arXiv:2203.08429.
2. Kim, D.-Y.; Kim, S. Network-Aided Intelligent Traffic Steering in 5G Mobile Networks. Comput. Mater. Contin. 2020, 65, 243–261.
3. Choi, Y.; Kim, J.H. Reliable data transmission in 5G Network using Access Traffic Steering method. In Proceedings of the 2020 IEEE International Conference on Information and Communication Technology Convergence (ICTC), Jeju, Republic of Korea, 21–23 October 2020; pp. 1034–1038.
4. Kavehmadavani, F.; Nguyen, V.-D.; Vu, T.X.; Chatzinotas, S. Intelligent Traffic Steering in Beyond 5G Open RAN Based on LSTM Traffic Prediction. IEEE Trans. Wirel. Commun. 2023, 22, 7727–7742.
5. Tamim, I.; Aleyadeh, S.; Shami, A. Intelligent O-RAN Traffic Steering for URLLC Through Deep Reinforcement Learning. arXiv 2023. Available online: http://arxiv.org/abs/2303.01960 (accessed on 8 April 2024).
6. Erfanian, J.; Lister, D.; Zhao, Q.; Wikström, G.; Chen, Y. 6G Vision & Analysis of Potential Use Cases. IEEE Commun. Mag. 2023, 61, 12–14.
7. Salameh, A.I.; El Tarhuni, M. From 5G to 6G—Challenges, Technologies, and Applications. Future Internet 2022, 14, 117.
8. Ng, H.A.H.; Mahmoodi, T. Intelligent Traffic Engineering for 6G Heterogeneous Transport Networks. Computers 2024, 13, 74.
9. Ali, S.; Saad, W.; Rajatheva, N.; Chang, K.; Steinbach, D.; Sliwa, B.; Wietfeld, C.; Mei, K.; Shiri, H.; Zepernick, H.J.; et al. 6G White Paper on Machine Learning in Wireless Communication Networks. arXiv 2020. Available online: http://arxiv.org/abs/2004.13875 (accessed on 8 April 2024).
10. Basu, D.; Kal, S.; Ghosh, U.; Datta, R. SoftChain: Dynamic Resource Management and SFC Provisioning for 5G using Machine Learning. In Proceedings of the 2022 IEEE Globecom Workshops (GC Wkshps), Rio de Janeiro, Brazil, 4–8 December 2022; pp. 280–285.
11. Qi, W.; Wang, H.; Xia, X.; Mei, C.; Liu, Y.; Xing, Y. Research on Novel Type of Non Terrestrial Network Architecture for 6G. In Proceedings of the 2023 IEEE International Wireless Communications and Mobile Computing (IWCMC), Marrakesh, Morocco, 19–23 June 2023; pp. 1281–1285.
12. Tirmizi, S.B.R.; Chen, Y.; Lakshminarayana, S.; Feng, W.; Khuwaja, A.A. Hybrid Satellite–Terrestrial Networks toward 6G: Key Technologies and Open Issues. Sensors 2022, 22, 8544.
13. Schnitzer, J.; Prahladan, P.; Rahimzadeh, P.; Humble, C.; Lee, J.; Lee, J.; Lee, K.; Ha, S. Toward Programmable DOCSIS 4.0 Networks: Adaptive Modulation in OFDM Channels. IEEE Trans. Netw. Serv. Manag. 2021, 18, 441–455.
14. ETSI. 3rd Generation Partnership Project; Technical Specification Group Services and System Aspects. Service Aspects; Service Capabilities (Release 16); ETSI TS 122 261 V16.14.0. 2023. Available online: https://www.etsi.org/deliver/etsi_ts/122200_122299/122261/16.14.00_60/ts_122261v161400p.pdf (accessed on 12 September 2024).
15. Köksal, B.; Schmidt, R.; Vasilakos, X.; Nikaien, N. CRAWDAD eurecom/elasticmon5G2019. IEEE Dataport. 2022.