978-1-6654-9810-4/22/$31.00 ©2022 IEEE
AI-driven Human-centric Control Interfaces for
Industry 4.0 with Role-based Access
Raul Barbosa
Capgemini Engineering
Porto, Portugal
raulfilipe.diasbarbosa@capgemini.com
Marco Araujo
Capgemini Engineering
Porto, Portugal
marco.araujo@capgemini.com
Abstract—Nowadays, we are accustomed to using voice-based virtual
assistants in Smart Home contexts or to ask for directions. The
potential of this technology keeps growing; however, even with
the transition to Industry 4.0, these capabilities remain
significantly unexplored in industrial environments. In such
highly automated contexts, human operators will perform more
sophisticated interventions, supported by and collaborating
with intelligent systems to execute operations and receive
information dynamically and proactively. The goal of this work is
to combine areas such as Machine Learning, used to recognise
users by their voice, and Intent-Based Networking, which allows
the network to be orchestrated through intents, a type of policy
that expresses objectives without specifying how they are
implemented. The system uses Machine Learning models to
recognise users and grant them permissions; depending on those
permissions, users can then implement policies over the network.
Since specific knowledge about networks is not widespread, this
technology makes it easier for users to orchestrate the network
using voice commands.
Keywords—Virtual Voice Assistant, SDN, Intents, Speaker
Recognition Systems, Machine Learning
I. INTRODUCTION
The growth in the number of devices and ongoing technological
advances create a need to explore and improve new forms of
human-machine interaction. Research and development in
Artificial Intelligence (AI) have made more natural forms of
input possible, such as voice. Consequently, with advances in
speech processing and Natural Language Processing (NLP), systems
can now be interacted with through voice and can understand what
the user intends to do. These systems, called voice assistants,
have grown exponentially in use. Currently, their capabilities
remain significantly untapped in industrial environments. In
these industrial contexts, and with the advance of Industry 4.0,
the need emerges to raise automation to its maximum potential,
allowing, for example, voice-based virtual assistants to perform
increasingly complex functions.
In industrial contexts, people typically perform different tasks
and may therefore be associated with a role. Good data
management and control must grant the permissions for accessing
and handling the systems to the correct roles. This overall
process is called authorisation and is typically enforced by a
system that controls the permissions of the different users. To
perform proper authorisation, the user must first be correctly
identified, which is done through authentication.
Once recognised and authorised via Machine Learning (ML)
algorithms, the user expresses which rules they want to
implement in the network. These rules are applied through
intents, a type of policy that expresses objectives without
specifying how they are implemented. The virtual assistant
captures the user's intents and transmits them to the network
management system, where the intent state machine is started: a
cycle that covers every stage in the life of an intent, from its
capture to its installation.
II. RELATED WORK
Extensive work has been done on NLP for access control and on
intent-based state machines. This section provides an overview
of these two topics.
A. Speech recognition system
Natural language access control policies can be unstructured and
ambiguous, and consequently they cannot be directly implemented
in an access control mechanism. In [1], the authors propose a
methodology to tackle this issue by applying linguistic analysis
to parse natural language documents and annotate words,
identifying whether semantic arguments can be inferred from a
given sentence. Using this methodology, the authors claim to
identify access control policies with a precision of 79%.
Due to the massive policy scale and number of access control
entities in the open distributed information systems found in
Industry 4.0, existing access control permission decision
methods suffer from a performance bottleneck. To overcome this,
in [2] a permission decision engine based on the random forest
algorithm is proposed to construct a vector decision classifier,
for which the authors claim a permission decision accuracy of
around 92.6%.
In [3], a method for improving policy decision performance by
eliminating conflicts was proposed, but it achieved only a
modest performance improvement.
2022 International Conference on INnovations in Intelligent SysTems and Applications (INISTA) | 978-1-6654-9810-4/22/$31.00 ©2022 IEEE | DOI: 10.1109/INISTA55318.2022.9894247
Authorized licensed use limited to: b-on: UNIVERSIDADE DE AVEIRO. Downloaded on March 30,2023 at 21:41:27 UTC from IEEE Xplore. Restrictions apply.
In [4], k-means clustering is applied to the access control
policy set, concluding that the order of the policies in the
policy set has a significant impact on permission decision
efficiency.
In [5], a permission decision optimisation method based on two
tree structures, a match tree and a combination tree, is
proposed. The match tree uses a binary search algorithm to
rapidly find the policy matching an access request, and the
combination tree evaluates the access request on the basis of
the matching policy.
In [6], to overcome the constraints due to the lack of
appropriate data, the authors propose a methodology to generate
sets of realistic synthetic natural language access control
policies.
B. Intent based networks
The following table summarises the relevant related work
regarding intent-based networking:

Functions / Features                   References
Service model and orchestration        [7], [8], [9], [10]
Network orchestration                  [11], [12]
Monitoring and resource exposure       [13], [14]
Intent deployment and configuration    [15]

Table 1 - Description of IBN architecture scope and related works
III. SYSTEM OVERVIEW
The system is divided into two components: the first is
responsible for identifying the user, and the second for
implementing the user's intent when the user has the required
permissions.
A. Speech recognition system
The system can recognise the user by voice through pre-trained
models, thus enabling role-based access. The user recognition
model is divided into two phases: the training phase and the
test phase. Their main difference lies in the last step: in
training, the data is sent to the modelling algorithms, whereas
in the test phase it is sent to an already trained model for
classification, cf. Figure 1.
Figure 1-Speaker Recognition System: Training and Testing Phase
As Figure 1 shows, the training phase ends with a trained model
and the test phase with results relative to a model. It is also
important to mention that the Data Processing phase is a
sub-module of Build Dataset, i.e., Build Dataset also includes
Data Processing. Figure 2 therefore shows the workflow at a much
more detailed level, describing both phases and their common
parts.
Figure 2-Speaker Recognition System: Workflow
1. Data collection - This is the initial process, where
information from users is collected, namely the number of
speech samples each user has, as well as a list of the
corresponding speech directories.
2. Processing phase - After loading the speech file, four
steps are applied:
a. Trimming.
b. Optionally, and if necessary, Data Augmentation.
c. Feature Extraction.
d. Depending on the algorithm used, Multilayer
Perceptron or Linear Discriminant Analysis, the
dataset can optionally be split in two: Training
Data and Validation Data.
3. Modelling or testing - After that, we can follow two
"paths": modelling, where the model is trained for future
identification of users, or the pre-trained model, where
the data is classified, i.e., the training and testing
phases, respectively.
As far as data construction (processing) is concerned, we apply
Trim, Data Augmentation, Feature Extraction, Data Scaling, and
Split Data. For modelling, we use a Neural Network, namely the
Multilayer Perceptron (MLP), and Linear Discriminant Analysis
(LDA).
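The two modelling approaches can be sketched with scikit-learn. The blob data below is a hypothetical stand-in: in the paper, each sample would be a feature vector extracted from one speech recording, and each class one user.

```python
from sklearn.datasets import make_blobs
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Hypothetical stand-in for extracted voice features: 300 samples,
# 20 features, 5 "users" (classes).
X, y = make_blobs(n_samples=300, centers=5, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The paper's two modelling algorithms, with default-ish settings.
mlp = MLPClassifier(hidden_layer_sizes=(64,), max_iter=1000,
                    random_state=0).fit(X_train, y_train)
lda = LinearDiscriminantAnalysis().fit(X_train, y_train)

print(f"MLP accuracy: {mlp.score(X_test, y_test):.2f}")
print(f"LDA accuracy: {lda.score(X_test, y_test):.2f}")
```

The same fit/score interface serves both the training phase (fit on Training Data) and the test phase (score on held-out data).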
Trim is an approach that cuts silent portions of the signal,
i.e., segments without any speech, which reduces the total
speech time. Figure 3 shows the same speech signal in its
original form (Normal) and after applying this technique
(Trimmed). To identify the silent zones, a threshold is defined;
in this case, anything below 20 decibels is considered a silent
zone (without speech).
Figure 3-Normal Speech vs Trimmed Speech
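The paper does not name the trimming tool used, so here is a minimal NumPy sketch of the idea: frames whose energy falls more than 20 dB below the loudest frame are treated as silence and cut from the edges.

```python
import numpy as np

def trim_silence(signal, frame_len=512, top_db=20.0):
    """Cut leading/trailing frames whose RMS energy is more than
    `top_db` below the loudest frame (the paper's 20 dB threshold)."""
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt(np.mean(frames ** 2, axis=1) + 1e-12)
    db = 20.0 * np.log10(rms / (rms.max() + 1e-12) + 1e-12)
    voiced = np.where(db > -top_db)[0]
    if len(voiced) == 0:
        return signal[:0]                       # nothing but silence
    start = voiced[0] * frame_len
    end = (voiced[-1] + 1) * frame_len
    return signal[start:end]

sr = 16000
silence = np.zeros(sr)                          # 1 s of silence
tone = 0.5 * np.sin(2 * np.pi * 440 * np.arange(sr) / sr)  # 1 s of "speech"
speech = np.concatenate([silence, tone, silence])
trimmed = trim_silence(speech)
print(len(speech) / sr, "->", len(trimmed) / sr)  # total speech time drops
```

Libraries such as librosa expose the same operation (`librosa.effects.trim` with `top_db=20`), which would be the idiomatic choice in practice.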
A common problem is the lack of data for model training. When
working with small datasets, there are techniques (which vary
according to the type of data) that can artificially create new
data from the original data. This approach is called data
augmentation, and it is an optional process, used only when more
data is needed. Since we are dealing with voice signals, common
approaches include injecting noise, changing the pitch and
changing the speed. However, given the context of the problem,
changing the pitch or speed would change the characteristics of
the voice, so the most appropriate technique, and the one used
here, is noise injection. Given the original voice signal, its
speech time is calculated; then noise is generated and added to
the original signal, thus duplicating the data.
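Noise injection can be sketched as follows. The SNR value and the random dataset are hypothetical; the paper does not specify the noise level used.

```python
import numpy as np

rng = np.random.default_rng(0)

def inject_noise(signal, snr_db=20.0):
    """Return a noisy copy of a voice signal at the given
    signal-to-noise ratio; pitch and speed are untouched, so the
    speaker's voice characteristics are preserved."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

# Hypothetical dataset of 5 speech signals for one user (1 s each).
dataset = [rng.normal(0.0, 0.1, 16000) for _ in range(5)]
augmented = dataset + [inject_noise(x) for x in dataset]
print(len(dataset), "->", len(augmented))   # the data is duplicated
```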
Feature Extraction is an important step that extracts the unique
characteristics of the voice signal. There are two types of
features. Temporal (time-domain) features are simple to extract
and have an easy physical interpretation (signal energy,
zero-crossing rate, maximum amplitude, minimum energy). Spectral
(frequency-based) features are obtained by converting the
time-based signal into the frequency domain using the Fourier
Transform; examples are the fundamental frequency, frequency
components, spectral centroid, spectral flux, spectral density
and spectral roll-off. These features can be used to identify
notes, pitch, rhythm and melody. We apply seven feature
extraction methods: Chroma features, MFCC, Root-Mean-Square,
Zero Crossing Rate, Spectral Centroid, Spectral Contrast and
Spectral Rolloff.
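Three of the seven features are simple enough to sketch directly in NumPy: the Zero Crossing Rate and Root-Mean-Square (temporal), and the Spectral Centroid (spectral, via the Fourier Transform). The remaining four would typically come from an audio library such as librosa.

```python
import numpy as np

def extract_features(signal, sr):
    """Sketch of three of the seven features used in the paper."""
    # Temporal: zero-crossing rate and root-mean-square energy.
    zcr = np.mean(np.abs(np.diff(np.sign(signal))) > 0)
    rms = np.sqrt(np.mean(signal ** 2))
    # Spectral: centroid of the magnitude spectrum (Fourier Transform).
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    centroid = np.sum(freqs * spectrum) / (np.sum(spectrum) + 1e-12)
    return np.array([zcr, rms, centroid])

sr = 16000
t = np.arange(sr) / sr
low = np.sin(2 * np.pi * 220 * t)    # low-pitched signal
high = np.sin(2 * np.pi * 880 * t)   # high-pitched signal
f_low, f_high = extract_features(low, sr), extract_features(high, sr)
print(f_low, f_high)   # the higher tone has a higher ZCR and centroid
```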
Finally, data scaling uses the Standard Scaling approach, which
transforms the data so that its distribution has a mean of zero
and a standard deviation of one. All this voice processing and
model creation is integrated into a REST API server that stores
the machine learning model used to identify users.
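Standard Scaling amounts to a per-feature normalisation, sketched below on a tiny hypothetical feature matrix (scikit-learn's StandardScaler performs the same transform):

```python
import numpy as np

# Hypothetical feature matrix: one row per speech sample,
# one column per extracted feature.
X = np.array([[2.0, 100.0],
              [4.0, 300.0],
              [6.0, 500.0]])

# Subtract the per-feature mean, divide by the per-feature
# standard deviation: mean becomes 0, standard deviation becomes 1.
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)
print(X_scaled.mean(axis=0))   # ~[0, 0]
print(X_scaled.std(axis=0))    # [1, 1]
```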
All the techniques for building the model and processing the
voice have now been presented. However, it is still necessary to
capture the user's voice; for that, the open-source virtual
assistant Mycroft is used, which enables a human-centric
approach. See Figure 4.
Figure 4-Integration with Virtual Assistant
The user requests the creation of a service from Mycroft, which
then interprets the type of action required and sends it to the
recognition server. The server recognises the user by sending
the captured signal to the machine learning model. After
recognition, the database is queried to obtain the user's roles
and respective permissions. If the user has permission, the
service is implemented; otherwise, the user is informed that
they have no permissions.
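The role-based permission check after recognition can be sketched as below. The user names, roles and action names are hypothetical; in the paper they live in the database queried by the recognition server.

```python
# Hypothetical role and permission tables (stored in a database
# in the paper's system).
USER_ROLES = {"alice": "network_admin", "bob": "operator"}
ROLE_PERMISSIONS = {
    "network_admin": {"create_service", "delete_service", "limit_bandwidth"},
    "operator": {"create_service"},
}

def authorise(recognised_user, requested_action):
    """Return True if the recognised speaker's role grants the action."""
    role = USER_ROLES.get(recognised_user)
    return requested_action in ROLE_PERMISSIONS.get(role, set())

print(authorise("alice", "limit_bandwidth"))  # admin: allowed
print(authorise("bob", "limit_bandwidth"))    # operator: user is informed
                                              # they have no permission
```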
B. Intent Based Network
Once the user is identified and allowed to implement the
request, it goes to the intent-based networking system, where a
layer of intelligence is applied to the network, replacing
manual network configuration processes and reducing the
complexity of creating, managing and enforcing network policies.
To understand how this part of the system works, the
architecture is shown in Figure 5.
Figure 5- Intent Based System Architecture
The system architecture is designed to operate in a business
context, where the user interacts with the assistant using their
own equipment (e.g. a computer). The architecture modules are
explained below.
To capture the user's commands, the virtual assistant is the
component responsible for maintaining communication between the
system and the user. After being activated, the virtual
assistant saves the user's intent and forwards it to the system
to be validated and implemented. The assistant is in charge of
associating the policy with the user's intent and converting the
high-level language into a machine-readable language, sending it
to the component where it will be analysed.
The policy repository is the component responsible for
storing all existing policies in the system as a database.
Responsible for communication between the virtual assistant and
the policy repository, the Policy Input Stage is the component
in charge of analysing conflicts, so that the user's policy does
not clash with a rule already implemented in the network. In
case of success, this component sends on the rules to be
implemented; if there is a conflict, it initiates a negotiation
with the user. Furthermore, the thresholds used to validate the
policies are part of this component, which sends them to the
Policy Runtime Enforcer for analysis.
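The conflict analysis performed by the Policy Input Stage can be sketched as a comparison of each new policy against the installed set. The policy fields and repository contents below are hypothetical.

```python
# Hypothetical in-memory policy repository; the paper's system
# keeps installed policies in a database.
installed = [
    {"service": "s0", "hosts": {"h2", "h3"}},
]

def find_conflict(new_policy, repository):
    """Flag a redundant service name or hosts already active in
    another service; return None when there is no conflict."""
    for policy in repository:
        if policy["service"] == new_policy["service"]:
            return f"service name {new_policy['service']} already exists"
        if policy["hosts"] & new_policy["hosts"]:
            return "host already active in another service"
    return None

print(find_conflict({"service": "s1", "hosts": {"h1", "h6"}}, installed))
print(find_conflict({"service": "s1", "hosts": {"h2", "h6"}}, installed))
```

A `None` result lets the policy proceed; any other result triggers the negotiation with the user described above.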
The Policy Runtime Enforcer is responsible for constantly
monitoring the network and creating the installation request. In
this component, information from the network is collected,
analysed, planned over and finally executed. It understands the
problem and how to realise the policy implementation, in terms
of operations and resources. Planning is an important factor: it
plays a critical role in accepting policies, and it is the
element that governs a possible negotiation with users. In
addition to planning, the data analysis part is responsible for
checking the thresholds from the previous component in order to
act if a policy fails.
The Testbed is a REST server in charge of the communication
between the system and the SDN controller. This server receives
the installation request from the Policy Runtime Enforcer and
tries to install it on the controller. All the installation
logic is set in this component, which knows which requests the
controller supports. Besides installation requests, there are
data collection requests that are useful for applying monitoring
algorithms and computing variables of interest for the study.
The SDN controller is in charge of both obtaining data from the
network and implementing any rules. In addition, the controller
communicates with the infrastructure, obtaining the information
the system needs to make the changes it wishes. The controller
used was OpenDaylight (ODL).
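As a hedged sketch of how the Testbed server could talk to ODL: classic OpenDaylight releases expose flow programming through RESTCONF under the opendaylight-inventory model. The host, port, node id and flow id below are hypothetical examples, and the flow body is deliberately partial (match and instruction fields omitted).

```python
import json

# Hypothetical controller address; ODL's RESTCONF listens on 8181
# by default.
ODL = "http://localhost:8181/restconf"

def flow_install_request(node_id, table_id, flow_id):
    """Build the URL and (partial) JSON body for installing one flow."""
    url = (f"{ODL}/config/opendaylight-inventory:nodes/"
           f"node/{node_id}/table/{table_id}/flow/{flow_id}")
    body = {"flow": [{"id": flow_id, "table_id": table_id,
                      "priority": 100}]}
    # The Testbed server would PUT this body to `url` (e.g. with
    # the requests library); no request is sent in this sketch.
    return url, json.dumps(body)

url, payload = flow_install_request("openflow:1", 0, "s1-h1-h6")
print(url)
```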
After presenting the architecture, we describe the life cycle of
an intent by presenting the intent state machine. This state
machine lets us follow an intent from its capture by the virtual
assistant until its installation, without compromising network
operation. As Figure 6 shows, the state machine is divided into
six sections. The validation phase and conflict analysis are
part of the Policy Input Stage; the compilation and monitoring
phases are processes of the Policy Runtime Enforcer module; and
finally the installation phase is the responsibility of the
Testbed REST server.
After the user's request in the first phase, the validation
phase verifies that the request contains all the information
necessary to implement the desired action. If the user has
forgotten to mention information necessary for the request to be
implemented, or if the information provided is wrong, this phase
assigns an invalid status, which will require user interaction.
On the other hand, if the user provides all the necessary
information, the phase assigns a valid status and the request
continues to the next phase.
Figure 6-Intent State Machine
The conflict phase is the second checkpoint of this state
machine. Here, the pre-validated intents are compared with the
requests already implemented and stored in the database in order
to detect conflicts. If the request is redundant, or if the new
request contains information contrary to what is already
implemented, the conflict phase assigns a conflict state, which
is resolved with the help of the user. If the information does
not conflict, we move on to the compilation section.
When the request reaches the compilation phase, the system tries
to convert the policies that the user wants to implement into
rules, i.e., the system converts the high-level language into a
language that can be interpreted by the controller, so that the
rules can then be installed. Sometimes the desired information
may not be supported by the controller, or the desired
operations may not be operational; in this case, the compilation
phase raises a compilation error that can be solved with the
help of the user. If the rules to be installed are obtained, we
move on to the installation phase.
Once the rules are obtained, the next step is to install them on
the controller. If the targets are supported by the controller
and the rules are installed, we move on to the monitoring phase.
If the targets are not reachable, because they are offline or
non-existent, the user is alerted that the application was not
installed.
In the final phase, monitoring consists of a cycle that
constantly checks whether the existing requests in the database
are being fulfilled or whether any violation has occurred. If a
non-compliance occurs, the system automatically tries to
identify how to solve the problem without human intervention. If
the system does not find a solution that corrects the problem,
the user is alerted. In summary, after the user is recognised
and has permissions, if the intent passes all these stages, the
application is installed on the network and monitoring begins.
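The life cycle described above can be condensed into a small state machine sketch. The state names follow the phases in the text; the failure handling (dropping to a single FAILED state that requires user interaction) is a simplification of the invalid/conflict/compilation-error states.

```python
from enum import Enum, auto

class State(Enum):
    VALIDATION = auto()
    CONFLICT = auto()
    COMPILATION = auto()
    INSTALLATION = auto()
    MONITORING = auto()
    FAILED = auto()          # invalid / conflict / error: needs the user

ORDER = [State.VALIDATION, State.CONFLICT, State.COMPILATION,
         State.INSTALLATION, State.MONITORING]

def step(state, check_passed):
    """Advance the intent one phase; any failed check requires
    user interaction, and monitoring loops until a violation."""
    if not check_passed:
        return State.FAILED
    if state is State.MONITORING:
        return State.MONITORING
    return ORDER[ORDER.index(state) + 1]

s = State.VALIDATION
for check in (True, True, True, True):   # all four phases succeed
    s = step(s, check)
print(s)  # State.MONITORING
```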
IV. EXPERIMENTAL RESULTS
As in Section III, the results are presented in two parts: the
first concerns the speech recognition system results, and the
second the intent system results.
For the speech recognition system, it was necessary to obtain
datasets with user (i.e., person) speech samples.
Dataset             Size (users)   Number of speeches   Average speech time (seconds)
Noizeus [16]        6              5                    2.6680
TIMIT [17]          462            8                    2.6726
LibriSpeech [18]    156            77                   6.8771

Table 2 - Dataset properties
As the previous table shows, these characteristics allow us to
study how the two chosen algorithms scale with the growth in the
number of users. Additionally, we can analyse the performance of
the approaches with respect to the amount of data per user. The
chosen approaches are MLP and LDA.
The primary study measures the performance of the chosen
algorithms as a function of the growth in the number of users.
Their performance is also analysed with and without the use of
data augmentation. The results are presented in Table 3, as
percentage success rates.
Dataset       Model                          With Data Augmentation   Without Data Augmentation
Noizeus       Multilayer Perceptron          100%                     100%
              Linear Discriminant Analysis   100%                     100%
TIMIT         Multilayer Perceptron          100%                     87.88%
              Linear Discriminant Analysis   92.21%                   95.67%
LibriSpeech   Multilayer Perceptron          100%                     98.63%
              Linear Discriminant Analysis   93.15%                   93.84%

Table 3 - Dataset results
We can conclude that both approaches perform well, and that data
augmentation is unnecessary in a system with a minimal number of
users and speeches per user. Regarding the experiments performed
with the TIMIT dataset, both algorithms scale well with the
increase in users, and the use of data augmentation positively
influences the performance of the neural network. Finally, to
analyse the effect of the number of speeches per user, a final
test on the LibriSpeech dataset was carried out. It has a total
of 156 users, but each user has an average of 77 speeches. It
should also be noted that each speech has, on average,
approximately twice the duration of the speeches in the other
datasets. From these results, we observed that data augmentation
affects the LDA negatively and the neural network positively.
This indicates that the increase in the amount of data per user
influences LDA performance negatively and neural network
performance positively.
To present the results for the intent system, a use case based
on the creation of a service was defined. This required a
topology with 7 hosts and 3 switches, shown in Figure 7.
Figure 7-Test environment
Three switches were created: three hosts were associated with
each of SW1 and SW2, and one host with SW3, which represents
access to the Internet. Once created, there are no active
services on the network. To prove this, a test is performed with
the iperf tool between arbitrary hosts. Figure 8 shows a test
using the iperf tool between host 1 and host 6.
Figure 8-Connection test between Host 1 and 6
Once the network is created, we start the use case of creating a
service. The creation of a service depends on 4 parameters: the
name of the service, who accesses the service, whether the
service needs an Internet connection and, finally, whether the
bandwidth of the service needs to be limited. To understand how
the user communicates these actions to the virtual assistant,
Mycroft, consider the following dialogue:
• User: “Hey Mycroft, create a service please.”
• Mycroft: “Sure, what is the name of the service?”
• User: “Call it s1.”
• Mycroft: “Ok, please name the machines that will have
access to the service.”
• User: “Provide access to users 1 and 6.”
• Mycroft: “Regarding internet access, does the service
need internet access?”
• User: “Yes please.”
• Mycroft: “One last question, do you want to define the
performance of the service?”
• User: “Yes, 4000 Mbit/s.”
• Mycroft: “Thanks for the information, wait a bit while I
implement the service.”
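The four answers collected in the dialogue can be assembled into a service request for the rest of the system. The function and field names below are hypothetical illustrations, not the paper's actual interface.

```python
# Hypothetical assembly of the slots Mycroft collects in the
# dialogue above into one service request.
def build_service_request(name, hosts, internet, bandwidth_mbps=None):
    request = {"service": name, "hosts": set(hosts), "internet": internet}
    if bandwidth_mbps is not None:        # bandwidth limit is optional
        request["bandwidth_mbps"] = bandwidth_mbps
    return request

# The example dialogue: service s1, hosts 1 and 6, Internet access,
# 4000 Mbit/s limit.
req = build_service_request("s1", ["h1", "h6"],
                            internet=True, bandwidth_mbps=4000)
print(req["service"], sorted(req["hosts"]), req["bandwidth_mbps"])
```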
After the user communicates this information to the virtual
assistant, and provided the user has the permissions, the state
machine presented above starts, in order to validate the intent,
analyse conflicts, compile the intent and install it in the
network. For the example presented, the validation phase checks
whether all the information needed to create a service exists,
verifying that all the mandatory fields are filled in with valid
information.
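The mandatory-field check of the validation phase can be sketched as below; the field names are the hypothetical ones used in the service request sketch, and the paper does not specify its exact validation rules.

```python
REQUIRED = ("service", "hosts", "internet")

def validate(request):
    """Validation phase: assign a valid/invalid status depending on
    whether all mandatory fields are present and non-empty."""
    missing = [field for field in REQUIRED if field not in request]
    if missing:
        return "invalid", missing
    if not request["hosts"]:              # a service needs at least one host
        return "invalid", ["hosts"]
    return "valid", []

print(validate({"service": "s1", "hosts": {"h1", "h6"}, "internet": True}))
print(validate({"service": "s1"}))        # missing fields -> invalid status
```

An "invalid" status is what triggers the user interaction described above; a "valid" status lets the intent proceed to conflict analysis.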
Proceeding to the conflict analysis phase, the system checks
whether any information in this new service conflicts with a
service already implemented. To do so, it checks whether the
service name already exists and whether the hosts involved are
already active in other services. If either case occurs, the
user is alerted.
Once validated and free of conflicts, the request goes to the
compilation phase. In this phase, several requests are made to
both the Testbed server and the database, in order to convert
the policies coming from the user into rules. If any of the
requests fails, the compilation phase fails, and the user must
be alerted.
When the rules are compiled, we move on to the installation
phase. In this phase, the rules are sent to the Testbed server,
which in turn communicates with ODL to install them. After they
are installed, they are stored in the database, and the user is
notified that the application has been installed.
To test whether the service has been created between users 1 and
6, with Internet access and a bandwidth limit set, we again use
the iperf tool, in the same context mentioned above:
Figure 9 - Connection test between Host 1 and 6 (after service created)
After the service is created, the iperf tool shows that
communication is possible between users 1 and 6, and between
them and the Internet, with performance limited to 4000 Mbit/s.
ACKNOWLEDGMENT
The authors would like to acknowledge the support and
funding from the 6G-Brains project.
REFERENCES
[1] M. Narouei and H. Takabi, “Automatic Top-Down Role Engineering
Framework Using Natural Language Processing Techniques”, in 9th
Workshop on Information Security Theory and Practice (WISTP), 2015,
Heraklion, Crete, Greece.
[2] A. Liu et al, “Efficient Access Control Permission Decision Engine Based
on Machine Learning”. Security and Communications Networks, 2021
[3] F. Deng and L.-Y. Zhang, “Elimination of policy conflict to improve the
PDP evaluation performance,” Journal of Network and Computer
Applications, vol. 80, pp. 45–57, 2017.
[4] S. Marouf, M. Shehab, A. Squicciarini, and S. Sundareswaran, “Adaptive
reordering and clustering-based framework for efficient XACML policy
evaluation,” IEEE Transactions on Services Computing, vol. 4, no. 4, pp.
300–313, 2011.
[5] S. P. Ros and M. Lischka, “Graph-based XACML evaluation,” in
Proceedings of the 17th ACM Symposium on Access Control Models and
Technologies, pp. 83–92, Newark, NJ, USA, 2012.
[6] M. Alohaly, H. Takabi and E. Blanco, “Automated extraction of attributes
from natural language attribute-based access control (ABAC) Policies”.
Cybersecur 2, 2 (2019)
[7] A. Rafiq, A. Mehmood, W. Song, Intent-based slicing between containers
in sdn overlay network, J. Commun. 15 (2020)
[8] F. Paganelli, F. Paradiso, M. Gherardelli, G. Galletti, Network service
description model for vnf orchestration leveraging intent-based sdn
interfaces, in: 2017 IEEE Conference on Network Softwarization
(NetSoft), 2017
[9] W. Cerroni et al, Intent-based management and orchestration of
heterogeneous open-flow/iot sdn domains, in: 2017 IEEE Conference on
Network Softwarization (NetSoft), 2017
[10] A. Rafiq et al, Intent-based end-to-end network service orchestration
system for multi-platforms, Sustainability 12 (2020).
[11] K. Abbas et al, Slicing the core network and radio access network
domains through intent-based networking for 5g networks, Electronics 9
(2020).
[12] R. A. Addad et al, Benchmarking the onos intent interfaces to ease 5g
service management, in: 2018 IEEE Global Communications Conference
(GLOBECOM), 2018
[13] Y. Tsuzaki, Y. Okabe, Reactive configuration updating for intent-based
networking, in: 2017 International Conference on Information
Networking (ICOIN), 2017
[14] A. S. Jacobs, R. J. Pfitscher, R. A. Ferreira, L. Z. Granville, Refining
network intents for self-driving networks, in: Proceedings of the
Afternoon Workshop on Self-Driving Networks, SelfDN 2018,
Association for Computing Machinery, New York, NY, USA, 2018
[15] F. Aklamanu et al, Intent-based real-time 5g cloud service provisioning,
in: 2018 IEEE Globecom Workshops
[16] P. Loizou, “NOIZEUS: A noisy speech corpus for evaluation of speech
enhancement algorithm.”
[17] J. S. Garofolo, L. F. Lamel, W. M. Fisher, J. G. Fiscus, D. S. Pallett,
N. L. Dahlgren, and V. Zue, “TIMIT Acoustic-Phonetic Continuous Speech
Corpus.” [Online]. Available: https://catalog.ldc.upenn.edu/LDC93S1
[18] V. Panayotov, G. Chen, D. Povey, and S. Khudanpur, “Librispeech: An
ASR corpus based on public domain audio books,” in ICASSP, IEEE
International Conference on Acoustics, Speech and Signal Processing,
2015, pp. 5206–5210, doi: 10.1109/ICASSP.2015.7178964.