Application of Artificial Intelligence Technology in Computer Network Security Communication
Fulin Li
Guangdong University of Science and Technology, Dongguan, Guangdong 523000, China
Received 19 May 2022; Revised 22 June 2022; Accepted 3 July 2022; Published 21 July 2022
To cope with the increasingly frequent challenges posed by network security issues, a method of applying artificial intelligence technology to secure computer network communication is proposed. First, within the framework of computer network communication, an intelligent protocol reverse analysis method is proposed. By converting protocol data into images and building a convolutional neural network model, artificial intelligence technology is used to map the data to a protocol identification result. Finally, the model is evaluated on test data, and its parameters are adjusted to optimize it as far as possible. The experimental results show that, compared with the baseline (DNN) model, the deep convolutional neural network model trained in this paper increases accuracy by 2.4%, reduces loss by 38.2%, and reduces running time by a factor of 42. These results verify the correctness and superiority of the algorithm and model.
1. Introduction
With the development of 5G, research on 6G technology has also begun. The Internet has spread all over the world and has become part of contemporary life. As one of its future development directions, the Internet of Things (IoT), including devices such as smart homes, is developing even faster. Communication between different IoT devices [1], collaborative processing, and information transmission are all realized by sending data packets over the network.
In recent years, botnets, darknets, illegal transactions, and network intrusions have become increasingly frequent. Because network protocols are the communication channel underlying all of these activities, protocol analysis goes to the heart of network security and helps to safeguard it.
Network protocols can be divided into two categories according to whether their format, processes, and other details are public: public protocols and nonpublic protocols. Public protocols disclose their format and content and are generally in wide use; common examples are TCP, UDP, DNS, and SMTP. A nonpublic protocol uses a format defined for some specific need; it is usually unique and undocumented, so it is also referred to as a private network protocol or an unknown protocol format. However, current research shows that traditional protocol reverse analysis methods process the captured binary bit-stream data inefficiently. These methods are relatively simple and have clear limitations, so they cannot meet the secure-communication needs of today's network systems. In addition, common protocol reverse analysis tools can essentially parse only common protocol types. Unknown and unrecognized data packets are very difficult to analyze because the corresponding prior knowledge is lacking.
Although reverse analysis techniques for known protocol formats [2] already exist, work on the reverse analysis of unknown protocol formats remains scarce or has significant limitations. Therefore, the main goal of this paper is to apply artificial intelligence technology to extract features from unknown network protocols and then perform intelligent reverse analysis. Figure 1 lists the basic applications of artificial intelligence technology in the field of computer network information security.
2. Literature Review
Netzob is a semiautomatic method proposed by Wang et al. to automate part of the process of inferring protocol structure. Netzob focuses on automating this inference process so as to minimize the manual work of experts, and a detailed lexical model and method are designed for this purpose. Netzob clusters messages with the unweighted pair group method with arithmetic mean (UPGMA). Each resulting cluster of messages is defined as a symbol: a group of messages that have the same format and role from the protocol's perspective [3]. Alireza et al. applied the Needleman–Wunsch algorithm to the messages of each symbol to align their common strings. The common strings are defined as static fields, and the remaining parts of the message as variable fields. A field is a set of tokens that share a common meaning from the protocol's perspective. A symbol consists of several fields, each of which can take one or more values [4].
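To make the alignment step concrete, the following is a minimal Python sketch of textbook Needleman–Wunsch global alignment on two byte strings; it is not Netzob's own implementation. The scoring values (match +1, mismatch -1, gap -1) and the helper `field_mask`, which marks aligned positions whose bytes agree as static (`S`) and all other positions as variable (`V`), are illustrative assumptions.

```python
from typing import List, Optional, Tuple

GAP = None  # placeholder for a gap inserted by the alignment


def needleman_wunsch(a: bytes, b: bytes, match: int = 1, mismatch: int = -1,
                     gap: int = -1) -> Tuple[List[Optional[int]], List[Optional[int]]]:
    """Global alignment of two byte strings with the classic dynamic program."""
    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment
    out_a: List[Optional[int]] = []
    out_b: List[Optional[int]] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append(GAP); i -= 1
        else:
            out_a.append(GAP); out_b.append(b[j - 1]); j -= 1
    return out_a[::-1], out_b[::-1]


def field_mask(a: bytes, b: bytes) -> str:
    """Hypothetical helper: 'S' where aligned bytes agree (static), else 'V' (variable)."""
    x, y = needleman_wunsch(a, b)
    return "".join("S" if p is not None and p == q else "V" for p, q in zip(x, y))


print(field_mask(b"ABCD", b"ABD"))
```

Aligning `b"ABCD"` with `b"ABD"` places a gap opposite `C`, so that position is classified as variable. For a whole symbol, the pairwise step would have to be extended to all messages in the cluster.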
AutoReEngine is a method proposed by Dinh et al. that takes the network traffic of a single protocol as input. It consists of four main steps: data preprocessing, protocol keyword extraction, message format extraction, and state machine inference. In the preprocessing step, the input traffic is divided into flows, and the packets in each flow are reassembled into messages. Protocol keyword extraction is carried out in two steps [5]. As described by Liu and Yangjun, the first step, frequent string extraction, applies the Apriori algorithm to the message sequences to extract candidate keywords for the field format. Here, a length-1 item in the Apriori algorithm is a single byte, each message sequence is a transaction, and the support measures are the session support rate (Rssr) and the site-specific session set support rate (Rset) [6]. According to Bistron and Piotrowski, Rssr is the proportion of sessions that contain a candidate sequence, and Rset is the proportion of site-specific sessions that contain it, where a site-specific session is a group of flows with the same server. In other words, Rssr and Rset are computed for the item sets and candidate item sets that grow step by step from length 1 to length K, and frequent item sets are not extracted by the default Apriori criterion alone: only sequences that simultaneously satisfy the threshold session support rate (Tssr) and the threshold site-specific session set support rate (Tsets) are kept. The byte sequences containing the final set of frequent items are then extracted, and the enclosing strings of these byte sequences are determined [7].
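The level-wise search described above can be sketched as follows. This is a simplified illustration under stated assumptions: candidates are contiguous byte sequences, only the session support rate (an Rssr-like measure) is checked against a single threshold `t_ssr`, and the site-specific rate Rset with its threshold is omitted. All function names are hypothetical.

```python
from typing import Dict, List

Session = List[bytes]  # the messages of one session (flow)


def session_support(seq: bytes, sessions: List[Session]) -> float:
    """Share of sessions with at least one message containing seq (an Rssr-like rate)."""
    hits = sum(1 for msgs in sessions if any(seq in m for m in msgs))
    return hits / len(sessions)


def frequent_sequences(sessions: List[Session], t_ssr: float,
                       max_len: int = 16) -> Dict[bytes, float]:
    """Grow contiguous byte sequences level by level, Apriori style."""
    alphabet = {bytes([x]) for msgs in sessions for m in msgs for x in m}
    # Level 1: single bytes whose support reaches the threshold
    level: Dict[bytes, float] = {}
    for c in alphabet:
        s = session_support(c, sessions)
        if s >= t_ssr:
            level[c] = s
    singles = set(level)
    result = dict(level)
    for _ in range(2, max_len + 1):
        # A frequent k-sequence has a frequent (k-1)-prefix and a frequent last
        # byte, so candidates are built only from the previous level (pruning).
        nxt: Dict[bytes, float] = {}
        for prefix in level:
            for c in singles:
                cand = prefix + c
                s = session_support(cand, sessions)
                if s >= t_ssr:
                    nxt[cand] = s
        if not nxt:
            break
        result.update(nxt)
        level = nxt
    return result


def maximal_sequences(freq: Dict[bytes, float]) -> List[bytes]:
    """Keep only frequent sequences that are not contained in a longer frequent one."""
    return sorted(s for s in freq if not any(s != t and s in t for t in freq))


sessions = [[b"GET /x", b"GET /y"], [b"GET /z"], [b"POST /a"]]
print(maximal_sequences(frequent_sequences(sessions, t_ssr=0.6)))
```

With the toy sessions above, the keyword-like prefix shared by two of the three sessions survives as a maximal frequent sequence, while bytes that occur in only one session are pruned at length 1.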
FieldHunter is a method proposed by Mathew that takes the network traffic of a single protocol as input. FieldHunter first divides the network flows into network messages, using the TCP PUSH flag to delimit messages in TCP and treating each UDP packet as one message. The syntax inference step first determines whether the protocol is text-based or binary, and the message tokenization module tokenizes messages differently in each case. A key step of FieldHunter is semantic inference [8]. As Misra describes, the semantic inference step heuristically finds fields that correspond to predefined semantic types; six such types are used: message type, message length, host identifier, session identifier, transaction identifier, and accumulator [9]. Vollertsen et al. note that the main way to decide whether a field corresponds to each semantic type is to apply a different criterion for each field type in vertical analysis; that is, each field type has characteristic statistical properties across different traces. For example, to find fields that correspond to host identifiers, the system looks for a field that always carries the same unique value for each source IP address across different traces [10].
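A simplified version of the host-identifier test can be written as a one-to-one check between source IP addresses and the values of a candidate field. The fixed `offset`/`length` field candidate and the function `is_host_identifier` are hypothetical; the heuristic described above relies on statistics over several traces, while this sketch only checks the one-to-one property on a single set of messages.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def is_host_identifier(messages: Iterable[Tuple[str, bytes]],
                       offset: int, length: int) -> bool:
    """True if the candidate field maps source IPs to values one-to-one."""
    ip_to_vals: Dict[str, Set[bytes]] = defaultdict(set)
    val_to_ips: Dict[bytes, Set[str]] = defaultdict(set)
    for src_ip, payload in messages:
        if len(payload) < offset + length:
            return False  # the field must be present in every message
        val = payload[offset:offset + length]
        ip_to_vals[src_ip].add(val)
        val_to_ips[val].add(src_ip)
    if len(ip_to_vals) < 2:
        return False  # a single host gives no evidence of correlation
    one_value_per_ip = all(len(v) == 1 for v in ip_to_vals.values())
    one_ip_per_value = all(len(v) == 1 for v in val_to_ips.values())
    return one_value_per_ip and one_ip_per_value


msgs = [("10.0.0.1", b"\x01\xaaXX"), ("10.0.0.1", b"\x02\xaaYY"), ("10.0.0.2", b"\x03\xbbZZ")]
print(is_host_identifier(msgs, offset=1, length=1))
```

Here the byte at offset 1 is constant per source address and differs between addresses, so it qualifies; the byte at offset 0 changes between messages from the same host and does not.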
In recent years, protocol reverse engineering has produced fruitful results in various fields. In network security in particular, the emergence of automatic protocol reverse engineering has opened new possibilities for network analysis. In their study of network protocols, Dou et al. took reverse engineering as the entry point for protocol analysis and approached protocol reverse analysis from the perspectives of traffic syntax analysis and instruction timing analysis; however, because the research area was broad, the protocol was not studied in sufficient depth [11]. Iwendi et al. proposed a protocol reverse analysis technique based on network traffic and carried out state machine analysis of the protocol by analyzing the characteristics of traffic syntax and instruction execution timing, but neither of these two works analyzed the protocol systematically [12]. This paper differs from the above. It focuses on the grammar of the protocol, performing reverse analysis of the feature information in the protocol grammar, and it approaches the grammar from different angles with different algorithms that verify each other, so as to analyze the protocol grammar systematically.
Figure 1: Application of artificial intelligence technology in the field of computer network information security (diagram labels: artificial intelligence security check; risk assessment of sensitive data; data desensitization; data breach assisted judgment; machine learning / knowledge map; cognitive computing / semantic computing; data mining).
3. Research Method
3.1. Feature Extraction Algorithm Based on Neural Network.
A convolutional neural network (CNN) is a deep learning architecture loosely inspired by the way the human visual system perceives images. CNNs have great potential in image classification, natural language processing, image caption generation, and other applications. In the past, CNNs could not be applied to complex problems because of a lack of computing power [13]. With the advent of graphics processing units (GPUs) and their use in machine learning, however, CNNs have re-emerged and surpassed other architectures in computer vision tasks. CNNs have attracted attention in many fields, and medical diagnosis is no exception. Image classification plays a key role in computer vision; it includes preprocessing the image data, segmenting the image, extracting key features, and assigning images to the corresponding classes. Because CNNs classify images effectively and accurately, the technology can be applied to medical diagnosis, face recognition, security, and other fields [14].
Convolutional neural networks perform well on image processing, and their convolution operation can only be applied to matrix-shaped data such as images. Therefore, the input data must first be converted into images. After data preprocessing, the protocol data is grouped into 8-bit units, and each unit is converted into a pixel value between 0 and 255, so that each protocol data frame yields 40 pixel values in the range 0–255 [15]. Each stage of the convolutional neural network consists of a convolutional layer followed by a max-pooling layer. The convolutional layer is configured by its kernel size (kernel_size), the number of filters (filters), and the stride (strides); the first convolutional layer takes an input of shape input_ranges, and the max-pooling layer uses a pooling window of size pool_size.
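The byte-to-image conversion can be sketched in a few lines. The sketch assumes that each frame contributes 40 bytes, i.e. 320 bits grouped 8 bits per pixel (matching n_image = n_bytes/8 = 40 in Section 4.1), that shorter frames are zero-padded, and that the 40 pixel values are laid out as a 5 × 8 grid, a layout inferred from input_ranges = 5 × 8 × 240000. None of these details is stated explicitly, and `frame_to_image` is a hypothetical name.

```python
from typing import List

N_BITS = 320             # bits taken from each frame (assumption, see lead-in)
N_PIXELS = N_BITS // 8   # 8 bits per pixel -> 40 values in 0..255
ROWS, COLS = 5, 8        # assumed image layout (5 x 8 = 40)


def frame_to_image(frame: bytes) -> List[List[int]]:
    """Turn one protocol frame into a 5 x 8 grid of pixel values in 0..255."""
    # A byte already packs 8 bits, so each byte becomes one pixel.
    # Long frames are truncated; short frames are zero-padded.
    pixels = list(frame[:N_PIXELS]) + [0] * max(0, N_PIXELS - len(frame))
    return [pixels[r * COLS:(r + 1) * COLS] for r in range(ROWS)]


for row in frame_to_image(bytes(range(50))):
    print(row)
```

Zero-padding keeps every sample the same shape, which the convolutional layer requires; whether the original preprocessing pads, drops, or otherwise handles short frames is not described.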
When the training data set is small, a neural network easily overfits: it fits the training data closely but generalizes poorly, which reduces the effectiveness of training. This article uses dropout to prevent this [16]. To make training faster and to better model complex functions, the ReLU activation function is used. The ReLU activation function is shown in the following formula:
$$\mathrm{ReLU}(x)=\begin{cases} x, & x>0,\\ 0, & x\le 0.\end{cases} \quad (1)$$
The advantages of the ReLU activation function are as follows. (1) During backpropagation, the vanishing-gradient problem is avoided for positive inputs. (2) Because ReLU outputs 0 for every input on the left side of the x-axis, some neurons become inactive, which makes the network sparser and alleviates overfitting. (3) Compared with the sigmoid and tanh activation functions, its derivative is simple to compute.
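ReLU and dropout, the two mechanisms named above, can be sketched in plain Python. The dropout shown is the common "inverted" form, which zeroes each unit with probability `rate` during training and rescales the surviving units by 1/(1 - rate), so nothing needs to change at inference time; the paper does not say which variant it uses, and the rate of 0.25 is taken from Section 4.1.

```python
import random
from typing import List


def relu(x: float) -> float:
    """Formula (1): x for x > 0, otherwise 0."""
    return x if x > 0 else 0.0


def relu_grad(x: float) -> float:
    """Derivative used in backpropagation (taken as 0 at x = 0 by convention)."""
    return 1.0 if x > 0 else 0.0


def dropout(values: List[float], rate: float, rng: random.Random,
            training: bool = True) -> List[float]:
    """Inverted dropout: drop each unit with probability `rate`, rescale the rest."""
    if not training or rate == 0.0:
        return list(values)  # identity at inference time
    keep = 1.0 - rate
    # rng.random() < rate happens with probability `rate`, so that unit is dropped
    return [v / keep if rng.random() >= rate else 0.0 for v in values]


activations = [relu(v) for v in (-1.5, 0.0, 2.0, 3.0)]
print(activations, dropout(activations, 0.25, random.Random(0)))
```

The derivative of ReLU is a constant 1 or 0, whereas the sigmoid derivative requires evaluating an exponential, which is the cost difference discussed next.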
The sigmoid function involves an exponential, and its derivative must be evaluated during backpropagation, which is comparatively expensive; using the ReLU function costs less. The second stage of the convolutional neural network is similar to the first: the convolution kernel is still 3 × 3, but the number of convolution kernels is increased to 128. Dropout regularization then randomly discards some redundant information to prevent the model from overfitting, thereby improving its generalization ability [17]. The result is then fed into the flatten layer, which "flattens" the input data, that is, maps the multidimensional input to one dimension. The fully connected layers then apply a series of functions to all of the extracted features and map each sample to the corresponding class label, so that the predicted results are as close as possible to the actual results. The fully connected layers thus act as the classifier of the network. Each fully connected layer performs a matrix multiplication, which amounts to a spatial transformation of the features and a statistical extraction and integration of the preceding information [18]. An activation function then performs a nonlinear mapping so that each class of data corresponds to its result. A fully connected layer can also change the dimensionality, turning high-dimensional information into low-dimensional information while retaining the useful information; the last fully connected layer gives the explicit expression of the classification category. The fully connected part works as follows: the output of the previous layer is flattened (Flatten) and fed into a fully connected network with two layers. The first layer has 128 nodes and again uses the ReLU activation function; the last layer has 8 nodes and uses the Softmax activation function [19].
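The layer sequence described in this section can be checked with a small shape and parameter-count walk-through. It rests on assumptions the text does not state: a single-channel 5 × 8 input, stride-1 convolutions with "same" padding, and a 2 × 2 max-pooling and 0.25 dropout in each stage. "Same" padding is assumed because, with unpadded ("valid") 3 × 3 convolutions, the first stage would already shrink the 5 × 8 input to 1 × 3, leaving no room for a second 3 × 3 convolution.

```python
from typing import List, Tuple

Shape = Tuple[int, ...]


def conv_same(shape: Shape, filters: int, k: int = 3) -> Tuple[Shape, int]:
    """Stride-1 convolution with 'same' padding: height and width are unchanged."""
    h, w, c_in = shape
    params = k * k * c_in * filters + filters  # weights plus one bias per filter
    return (h, w, filters), params


def max_pool(shape: Shape, pool: int = 2) -> Shape:
    """Non-overlapping max pooling; odd sizes are floored, as with 'valid' pooling."""
    h, w, c = shape
    return (h // pool, w // pool, c)


def dense(n_in: int, n_out: int) -> int:
    """Parameter count of a fully connected layer (weights plus biases)."""
    return n_in * n_out + n_out


def summarize(input_shape: Shape = (5, 8, 1)) -> List[Tuple[str, Shape, int]]:
    rows: List[Tuple[str, Shape, int]] = []
    shape = input_shape
    for i, filters in enumerate((64, 128), start=1):  # filters1, filters2
        shape, p = conv_same(shape, filters)
        rows.append((f"conv{i} 3x3, {filters} filters, ReLU", shape, p))
        shape = max_pool(shape)
        rows.append((f"maxpool{i} 2x2", shape, 0))
        rows.append((f"dropout{i} 0.25", shape, 0))
    flat = shape[0] * shape[1] * shape[2]
    rows.append(("flatten", (flat,), 0))
    rows.append(("dense 128, ReLU", (128,), dense(flat, 128)))
    rows.append(("dense 8, softmax", (8,), dense(128, 8)))
    return rows


for name, shape, params in summarize():
    print(f"{name:<30}{str(shape):<14}{params}")
```

Under these assumptions the network has 108,424 trainable parameters, most of them in the second convolutional layer; different padding or pooling choices would change both the shapes and the count.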
3.2. Model Training and Prediction. e process of model
training using a neural network based on artificial intelli-
gence technology is shown in Figure 2.
e cross-entropy loss function is used as the loss
function for model training, and the stochastic gradient
descent method is used to optimize the model. e initial
learning rate is set to 0.1, and the indicator to measure the
model is selected as accuracy. e amount of data selected
during training is 64, and the loop is 100 times [20]. Use
tensorboard as a callback function. Cross-entropy loss
function e cross-entropy loss function is to reflect the
effect of model training by calculating the difference between
the actual output and the expected output of the model, and
by continuously adjusting parameters and calculations, the
value of the loss function is reduced, so that the actual value
is closer to the expected value. e cross-entropy loss
function is shown in the following formula[21]:
$$C=-\frac{1}{n}\sum_{x}\left[y\ln a+(1-y)\ln(1-a)\right]. \quad (2)$$
In the formula, y is the expected output of the model, a is the actual output of the model, x is the input of the model, and n is the number of inputs x over which the sum is taken. The neural network algorithm is mainly used for classification and identification, and each piece of data belongs to one and only one category. Activation functions such as the sigmoid function are mainly used for binary classification. The Softmax function extends the sigmoid function: it can perform multiclass classification and is not limited in the number of categories. The sigmoid function is defined by the following formula [22]:
$$S(t)=\frac{1}{1+e^{-t}}. \quad (3)$$
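Formulas (2) and (3) translate directly into code. In this sketch the sigmoid is evaluated in a numerically stable way, and predictions are clipped away from 0 and 1 before taking logarithms; the clipping constant `eps` is an implementation assumption, not part of formula (2).

```python
import math
from typing import Sequence


def sigmoid(t: float) -> float:
    """Formula (3), arranged so that large |t| never overflows math.exp."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    z = math.exp(t)
    return z / (1.0 + z)


def sigmoid_grad(t: float) -> float:
    """Derivative S'(t) = S(t) * (1 - S(t))."""
    s = sigmoid(t)
    return s * (1.0 - s)


def binary_cross_entropy(y: Sequence[float], a: Sequence[float], eps: float = 1e-12) -> float:
    """Formula (2): C = -(1/n) * sum over x of [y ln a + (1 - y) ln(1 - a)]."""
    n = len(y)
    total = 0.0
    for yi, ai in zip(y, a):
        ai = min(max(ai, eps), 1.0 - eps)  # keep ln() away from 0
        total += yi * math.log(ai) + (1.0 - yi) * math.log(1.0 - ai)
    return -total / n


print(sigmoid(0.0), binary_cross_entropy([1, 0], [0.9, 0.1]))
```

Confident correct predictions (0.9 for a positive, 0.1 for a negative) give a small loss of -ln 0.9, while confident wrong ones would make the loss grow without bound, which is what drives the parameter updates.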
The graph of the sigmoid function is similar to that of softmax: it also maps the input data into (0, 1). In addition, the sigmoid function is monotonically increasing and its derivative has the simple form S'(t) = S(t)(1 - S(t)), which makes it a convenient function. However, the sigmoid function can only perform binary classification, and softmax is its extension. Softmax maps a k-dimensional input vector x to a probability-like distribution, and the index with the largest output probability is then selected; the corresponding label is the most likely data category. The softmax function is shown in the following formula [23]:
$$\mathrm{softmax}(x)_i=\frac{e^{x_i}}{\sum_{j=1}^{k}e^{x_j}},\quad i=1,\dots,k. \quad (4)$$
Because softmax is built on the exponential function, large input values are amplified exponentially and negative inputs are strongly suppressed; this increases the separation between classes and improves classification. Softmax is also continuously differentiable, so it works well with gradient descent [24].
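The softmax step and the label lookup used at prediction time can be sketched as follows. Subtracting the largest input before exponentiating is a standard stability trick that does not change the result of formula (4). The order of the names in `PROTOCOLS` is an assumption that simply follows the list in Section 4.1; the real mapping depends on how the labels were encoded.

```python
import math
from typing import List, Sequence

# Assumed label order, following the protocol list in Section 4.1
PROTOCOLS = ["ARP", "DNS", "HTTP", "ICMP", "OICQ", "SSDP", "TCP", "UDP"]


def softmax(x: Sequence[float]) -> List[float]:
    """Formula (4); shifting by max(x) leaves the output unchanged but avoids overflow."""
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(true_index: int, probs: Sequence[float],
                              eps: float = 1e-12) -> float:
    """Multiclass cross-entropy for a one-hot target: -ln p(true class)."""
    return -math.log(max(probs[true_index], eps))


def predict_label(logits: Sequence[float]) -> str:
    """Take the class with the highest probability and map it to a protocol name."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return PROTOCOLS[best]


print(predict_label([0.1, 0.3, 4.2, 0.0, -1.0, 0.5, 1.1, 0.9]))
```

Only the position of the largest output matters for the label, so the argmax can be taken on either the raw outputs or the softmax probabilities; the probabilities are still useful as the reported similarity score.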
4. Result Analysis
4.1. Experiment Environment. In this article, n_bytes = 320, so n_image = n_bytes/8 = 40. Eight protocols are selected for identification, so N = 8. The test dataset D contains 8 labels, corresponding to ARP-like, DNS-like, HTTP-like, ICMP-like, OICQ-like, SSDP-like, TCP-like, and UDP-like protocols. In the convolutional neural network module, kernel_size = 3, filters1 = 64, strides = 1, input_ranges = 5 × 8 × 240000, and pool_size = 2. Dropout regularization is necessary; dropout = 0.25. The number of filters in the second convolutional stage is filters2 = 128. The first layer of the fully connected network has node1 = 128 nodes, and the last layer has node2 = 8 nodes. The learning rate is learn_rate = 0.1, and the number of training epochs is 100.
The total amount of data used in this paper is shown in Table 1. All protocol samples are combined into one training data set whose order is randomly shuffled; the first 78,000 shuffled samples are used for training and the remaining 2000 for testing [25].
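The shuffle-and-split procedure can be sketched as below. The fixed seed is an assumption added only to make the split reproducible, and `shuffle_split` is a hypothetical helper.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def shuffle_split(samples: Sequence[T], n_train: int = 78_000,
                  seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle all samples, keep the first n_train for training and the rest for testing."""
    order = list(samples)
    random.Random(seed).shuffle(order)  # seeded only for reproducibility
    return order[:n_train], order[n_train:]


train, test = shuffle_split(list(range(80_000)))
print(len(train), len(test))
```

A plain shuffle does not guarantee that every protocol is equally represented among the 2000 test samples; if balanced test classes matter, a split stratified by protocol label would be a reasonable alternative.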
4.2. Experiment Result. After the model was trained on the protocol data and tested on 1029 unknown protocols, accuracy, loss, and running time were compared. The experimental results are shown in Figures 3–5. The convolutional neural network method recognizes unknown protocols very well, with a recognition rate above 99%.
The analysis is as follows. In the experiment, the CNN deep neural network algorithm was applied to the training set and the transfer learning algorithm (DNN) to the test set; the comparison results are shown in Figures 3–5. The performance of CNN and DNN differs considerably: the accuracy of CNN is about 2.4% higher than that of DNN, the loss is 38.2% lower, and the running time is 42 times shorter. The accuracy of transfer learning is clearly lower than that of CNN in the early stages, and it is less stable than CNN in later tests. This is because the convolutional neural network in this paper aims to make the distribution of the predicted data as close as possible to that of the real data, and the cross-entropy loss function is used to measure the difference between these two distributions.
When a neural network is used for protocol syntax analysis, a large training set is usually required. With batch gradient descent, the amount of computation would be very large and would require considerable resources, so stochastic gradient descent is used instead. Stochastic gradient descent repeatedly draws a random mini-batch from the sample data, computes the loss and its gradient on that batch, and updates the parameters, continuing until the loss falls below a threshold. A satisfactory model can therefore be obtained without computing over the entire data set at every step, so results can be obtained quickly even when the sample size is large. The time complexity of stochastic gradient descent is O(knp), where k is the number of iterations, n is the number of training samples, and p is the average number of nonzero features per sample. Although the individual updates of stochastic gradient descent are noisier and may take detours, the overall trend is toward the minimum loss, which saves a great deal of time and makes the algorithm faster. At prediction time, the remaining 2000 samples form the test set, and the CNN outputs their predicted labels and the accuracy. To convert the output into a label: the model's prediction holds a similarity score for each class, and the class with the highest score is taken as the protocol label, so this section only needs to find the position of the highest score and the protocol type it represents. The CNN deep neural network algorithm can thus identify unknown protocols quickly and efficiently and output the predicted protocol type and its similarity. In the comparison experiment with DNN, it achieved significantly better performance indicators.
Figure 2: Neural network feature extraction process.
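To illustrate the mini-batch update rule without a deep learning framework, the following sketch swaps the CNN for a simple logistic-regression model trained with binary cross-entropy and the hyperparameters from Section 3.2 (batch size 64, learning rate 0.1, 100 epochs). The model, the synthetic data, and all names are illustrative assumptions; the point is only the loop structure: shuffle, draw a batch, average the gradient, update.

```python
import math
import random
from typing import List, Tuple

Sample = Tuple[List[float], int]  # (feature vector, label in {0, 1})


def sigmoid(z: float) -> float:
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w: List[float], b: float, x: List[float]) -> float:
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def log_loss(w: List[float], b: float, data: List[Sample]) -> float:
    """Mean binary cross-entropy over the data set."""
    eps = 1e-12
    total = 0.0
    for x, y in data:
        p = min(max(predict(w, b, x), eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(data)


def sgd_train(data: List[Sample], lr: float = 0.1, batch_size: int = 64,
              epochs: int = 100, seed: int = 0) -> Tuple[List[float], float]:
    """Mini-batch SGD: one parameter update per randomly drawn batch."""
    rng = random.Random(seed)
    dim = len(data[0][0])
    w, b = [0.0] * dim, 0.0
    order = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(order)  # new random batch composition every epoch
        for start in range(0, len(order), batch_size):
            batch = [data[i] for i in order[start:start + batch_size]]
            grad_w, grad_b = [0.0] * dim, 0.0
            for x, y in batch:
                err = predict(w, b, x) - y  # dLoss/dz for sigmoid + cross-entropy
                for j in range(dim):
                    grad_w[j] += err * x[j]
                grad_b += err
            # Step against the batch-averaged gradient
            w = [wj - lr * g / len(batch) for wj, g in zip(w, grad_w)]
            b -= lr * grad_b / len(batch)
    return w, b


# Synthetic, linearly separable toy data (hypothetical, not the paper's dataset)
data_rng = random.Random(1)
points = [[data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)] for _ in range(512)]
toy = [(p, 1 if p[0] + p[1] > 0 else 0) for p in points]
initial = log_loss([0.0, 0.0], 0.0, toy)  # ln 2 for the all-zero model
w, b = sgd_train(toy)
final = log_loss(w, b, toy)
print(f"loss before: {initial:.3f}, after: {final:.3f}")
```

Each epoch touches every sample once, but the parameters change after every batch of 64, which is why SGD reaches a useful model long before the cost of one full-batch pass per step would allow.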
5. Conclusion
This paper mainly studies bitstream protocols, covering both an intelligent reverse analysis method and a feature extraction method for bitstream protocols. The approach converts each protocol data frame into an image, trains a deep neural network from artificial intelligence technology on the image data, and uses the trained model to identify the protocol type of an unknown protocol frame, so that the characteristic strings in network protocol frames can be extracted to help ensure the security of computer network communication.
This paper takes bitstream protocol data frames as the research object and multiprotocol identification as the goal, and it focuses on network communication security supported by artificial intelligence technology. However, owing to the limitations of the experimental environment and conditions, the experimental data set in this paper was mainly captured in real time with the Wireshark tool. In follow-up research, this work can be improved and deepened in the following respects:
1. This paper focuses on feature mining and automatic identification of bitstream protocol data, that is, on the analysis of protocol syntax. The next step is to analyze the semantics and timing of the protocol to provide a more comprehensive analysis of bitstream protocol data.
2. The system designed in this paper is a protocol identification system based on the B/S architecture, and its cross-platform compatibility is relatively poor. In the future, a protocol identification system based on the C/S architecture should be studied to improve platform compatibility.
Table 1: Protocol dataset.
Protocol type | Number of data frames | Total size of data frames (KB)
ARP | 10000 | 880
DNS | 10000 | 854
HTTP | 10000 | 867
ICMP | 10000 | 856
OICQ | 10000 | 848
TCP | 10000 | 865
UDP | 10000 | 855
Train | 80000 | 7096
Figure 3: Accuracy comparison chart of the training set and test set.
Figure 4: Comparison chart of training set and test set loss.
Figure 5: Comparison chart of the running time of a training set and a test set (x-axis: file size (KB); y-axis: operation time (s)).
