Journal of Physics: Conference Series
PAPER • OPEN ACCESS
Log filtering method based on user behaviors
To cite this article: Nan Wu et al 2022 J. Phys.: Conf. Ser. 2253 012014
Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution
of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
Published under licence by IOP Publishing Ltd
EECT-2022
Journal of Physics: Conference Series 2253 (2022) 012014
IOP Publishing
doi:10.1088/1742-6596/2253/1/012014
1
Log filtering method based on user behaviors
Nan Wu, Xueming Tang and Ying Pan*
School of computer and information engineering, Nanning Normal University,
Nanning, Guangxi, People’s Republic of China
* Corresponding author’s e-mail: panying@nnnu.edu.cn
Abstract. In the big data environment, websites on the Internet generate more and more user
behaviors, and designing a universal log filtering method based on user behaviors is a current
research trend. However, current log filtering technology suffers from low filtering accuracy and
low efficiency. In this paper, we propose a log filtering method based on user behaviors. First,
user behaviors are divided into multiple sub-behaviors, each assigned a corresponding weight. Log
information of user behaviors is obtained and stored through distributed log collection tools, and
the log information of sub-behaviors whose weight is below the weight threshold is filtered out.
Then, the log information of the retained sub-behaviors is processed in parallel through utility
functions, which establish a mapping between the user interest degree and the sub-behavior
indicators. Log information of sub-behaviors below the user interest degree threshold is deleted,
and the log information of the user's preferred sub-behaviors is retained, forming an optimized
data source for recommendation results that is stored in the data cluster. This method performs
secondary filtering of massive log information and responds promptly to users' current
requirements and interests, improving processing efficiency.
Keywords. log filtering; user behaviors; information query; information recommendation;
distributed environment
1. Introduction
With the rapid development of the Internet, users generate massive amounts of log information while
using the Internet. When faced with massive amounts of Internet information, it is difficult for users to
obtain the information they are interested in, resulting in information overload problems [1,2]. Therefore,
various recommendation methods have become research hotspots, enabling user groups to obtain real-
time and effective information that they are interested in (such as microblog recommendation, product
recommendation, movie recommendation, etc.).
Log filtering is an essential part of a recommendation system. In the big data environment, websites
on the Internet produce more and more types of user behaviors. Here, user behaviors refer to a user's
browsing behaviors while using the network, and log information is the information recorded during
those browsing behaviors. Designing a universal log filtering method based on user behaviors is a
current research trend [3,4].
Al-Duwairi et al. [5] proposed the LogDos method, which can filter GET-based message log records
and remove data packets from malicious hosts. Wei et al. [6] used unsupervised multi-autoencoders to
analyze system log files, filter abnormal data in log records, and detect threatening data. Vidgof
et al. [7] developed and evaluated an interactive log-delta analysis technique in which analysts
interactively define the filtering range for log filtering. This method can explore logs and manually
separate typical behaviors from atypical behaviors. Load data sets are generally unstable, so evaluating
their reliability is a necessary preprocessing step for a data filtering method. Therefore, Cao et al. [8] proposed
a novel statistical data filtering method that considers the reliability of the data set by analyzing a wide
range of predefined confidence levels.
However, current log filtering technology has many shortcomings. Log data are often incomplete
(e.g., lacking ID, time, product ID, etc.). In addition, existing methods filter only data containing
noise and missing values, and different recommendation systems use different filtering methods, so
universality cannot be achieved [9,10].
To solve the above problems, this paper provides a log filtering method based on user behaviors,
which performs secondary filtering of massive log information and responds promptly to users' current
requirements and interests, improving processing efficiency. In addition, the method is easy to
expand and has some fault tolerance.
2. Implementation of log filtering based on user behaviors
2.1. Basic ideas
The basic idea of the filtering method in the paper is shown in figure 1.
[Figure 1: flow diagram. User behaviors are divided into sub-behaviors 1-4. The behavior function filters out log information whose weight is below the threshold. Utility functions 1-3 process the log information (1-3) of each retained sub-behavior and filter out log information whose user interest degree is below the threshold.]
Figure 1. The basic idea of log filtering based on user behaviors.
Users generate a huge number of behaviors in each business system (e.g., client applications or
pages for online shopping, microblog browsing, news recommendation, etc.). For each business system,
page developers divide user behaviors in advance into multiple sub-behaviors and assign corresponding
weights in the back end. For example, in an online shopping system, user behaviors are divided into
sub-behaviors such as browsing, clicking, and purchasing, while in a microblog browsing system they are
divided into sub-behaviors such as browsing, clicking, and searching. The online shopping system is
used below as a reference example. The page developer first enumerates a wide range of sub-behaviors in
advance, based on the shopping habits of most consumers, and assigns weights to the sub-behaviors
according to the user's purchase probability. The system then accesses the log tables of the database
through an existing distributed log collection tool, parses the log tasks, and extracts the user's log
information. After acquiring the log information of user behaviors, the system saves it to the data
cluster, which can hold a huge amount of such log information and provides reliable information
transmission for the
subsequent log filtering stage. Finally, the log information of sub-behaviors whose weight is below
the weight threshold is filtered out, i.e., the log information of sub-behaviors with little
reference value is removed. This completes the first filtering of the behavior log.
The log information of the retained sub-behaviors is then processed in parallel by utility
functions, with each sub-behavior processed separately. A utility function targeted at each sub-behavior
is established, and the part of that sub-behavior's log information that has no reference value is
filtered out. Each sub-behavior includes attribute information and indicators; the indicators include
multiple sub-indicators with parameters, and the numerical values of the sub-indicators can be
meaningfully compared. The utility function establishes a mapping between the user interest degree and
the indicators of at least one sub-behavior. The user interest degree is calculated separately for each
type of utility function, and an interest degree threshold is preset for each. Log information of a
sub-behavior that falls below the interest degree threshold is filtered out, and the remaining log
information corresponds to the user's preferred sub-behaviors. Finally, the log information of the
user's preferred sub-behaviors is retained to form an optimized data source for recommendation results,
which is stored in the data cluster as a widely applicable data source for each recommendation end.
This completes the second filtering of the behavior log.
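The two filtering passes described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the record layout, the `sub_behavior` field name, the function names, and the use of a thread pool to stand in for parallel processing are all assumptions, and the sketch assumes that a sub-behavior whose weight equals the threshold is retained and that every retained sub-behavior has a utility function and an interest threshold.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Record = Dict[str, object]  # one log entry; "sub_behavior" is a hypothetical field name


def two_stage_filter(
    records: List[Record],
    weights: Dict[str, float],
    weight_threshold: float,
    utilities: Dict[str, Callable[[Record], float]],
    interest_thresholds: Dict[str, float],
) -> List[Record]:
    # First filtering: drop every record whose sub-behavior weight is below the threshold.
    grouped: Dict[str, List[Record]] = defaultdict(list)
    for record in records:
        name = str(record["sub_behavior"])
        if weights.get(name, 0.0) >= weight_threshold:
            grouped[name].append(record)

    # Second filtering: each retained sub-behavior is scored by its own utility function.
    def filter_group(name: str) -> List[Record]:
        utility, threshold = utilities[name], interest_thresholds[name]
        return [r for r in grouped[name] if utility(r) >= threshold]

    # Sub-behaviors are independent of each other, so the groups can be handled concurrently.
    with ThreadPoolExecutor() as pool:
        kept_groups = list(pool.map(filter_group, list(grouped)))
    return [record for group in kept_groups for record in group]
```

In a real deployment the thread pool would presumably be replaced by whatever parallel facility the data cluster provides; the point of the sketch is only the order of the two passes and the per-sub-behavior thresholds.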
2.2. The creation of behavior functions
In the first filtering stage, the behavior function of user behaviors is created and multiple
sub-behaviors are defined, such as browsing behaviors (a single click on a viewed page records multiple
browsing data items, such as user information, time, address, product ID, current mouse dwell time,
current page scroll count, etc.), clicking behaviors (clicking on searched products or on recommended
products in a list, recording user information, time, address, clicked product ID, etc.), buying
behaviors (adding products to the cart and paying or not paying, recording user information, product ID,
payment time, order time, address, etc.), and comparing behaviors (adding multiple products to the
comparison column to compare each parameter, recording user information, product ID, comparison time,
address, etc.). The behaviors overlap to some extent; for example, the browsing process includes
clicking behavior. In such cases the log information is extracted and considered separately for the two
sub-behaviors. The weights of the various sub-behaviors are adjusted according to the needs of the
user. The behavior function is
$$F_\alpha = f(x_1, x_2, \ldots, x_m) = \sum_{i=1}^{m} w_i x_i \tag{1}$$
where $w_i$ is the weight of each sub-behavior of user α, $0 < w_i < 1$, and $x_1, x_2, \ldots, x_m$ are
the corresponding m kinds of sub-behaviors of user α. In the online shopping system, the weights of
browsing, clicking, and purchasing behaviors are higher than the threshold, and the weight of comparison
behaviors is lower than the threshold. All log information of comparison behaviors is therefore filtered
out, and the log information of browsing, clicking, and purchasing behaviors is retained.
2.3. Calculation of utility functions for different sub-behaviors
Each sub-behavior includes user information (user ID, account registration time), the user's current
page access time, the current page address, and sub-behavior indicators. The indicators are not exactly
the same for different sub-behaviors.
2.3.1. Indicators of sub-behaviors with multiple independent parameters. When the indicators of a
sub-behavior are multiple independent parameters, the parameters have no relative or complementary
relationship with each other, and all of them are worth considering. For example, the indicators of
the browsing sub-behavior are mouse dwell time, current page scroll count, etc., where mouse dwell
time and current page scroll count are independent parameters. The utility function is
$$G(\beta) = f(y_1, y_2, \ldots, y_n) = \sum_{i=1}^{n} w_i y_i \tag{2}$$
The weight of each parameter of the sub-behavior is adjusted and assigned according to the user's
needs, and the user interest degree of the current page for the sub-behavior is obtained by calculation.
Here $w_i$ is the weight of each parameter of sub-behavior β, $0 < w_i < 1$, and $y_1, y_2, \ldots, y_n$
are the corresponding n parameters of sub-behavior β. For example, $w_{\text{mouse dwell time}}$ is
preset to 0.8 and $w_{\text{current page scroll count}}$ is preset to 0.2, i.e., mouse dwell time is
regarded as the stronger indication of user interest. For a given page, $y_{\text{mouse dwell time}}$ is
5 seconds and $y_{\text{current page scroll count}}$ is 1, so G(β) = 0.8 × 5 + 0.2 × 1 = 4.2. The page
developer uses 4.2 as the interest threshold, i.e., when G(β) ≥ 4.2, the corresponding log information
of the page is retained, and log information that does not satisfy this condition is filtered out.
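The arithmetic of this example can be checked with a short sketch. The function and key names below are illustrative assumptions; only the weights (0.8 and 0.2), the parameter values (5 seconds and 1 scroll), and the threshold of 4.2 come from the text.

```python
from typing import Dict


def weighted_interest(weights: Dict[str, float], values: Dict[str, float]) -> float:
    """Equation (2): G(beta) = sum_i w_i * y_i over the parameters of one sub-behavior."""
    return sum(weights[name] * values[name] for name in weights)


def retain(interest: float, threshold: float) -> bool:
    """Keep the page's log information when the interest degree reaches the threshold."""
    return interest >= threshold


# Values taken from the text: weights 0.8 and 0.2, dwell time 5 s, one scroll.
weights = {"mouse_dwell_time": 0.8, "scroll_count": 0.2}
page = {"mouse_dwell_time": 5, "scroll_count": 1}
interest = weighted_interest(weights, page)  # 0.8 * 5 + 0.2 * 1
print(round(interest, 2))  # 4.2
```

Because the weighted sum mixes seconds with scroll counts, the threshold only makes sense for a fixed choice of units, which is presumably why the text leaves it to the page developer.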
2.3.2. Indicators of sub-behaviors with two options of “executed” and “unexecuted”. When the
indicators of a sub-behavior are two options of “executed” and “unexecuted”, the two options are either
relative or complementary. For example, when the indicators of the purchasing sub-behavior are the two
options “purchased” and “unpurchased”, the two options are relative. Then the utility function
is
$$G(\beta) = \begin{cases} 0, & \text{unexecuted} \\ 1, & \text{executed} \end{cases} \tag{3}$$
The log information of the sub-behavior whose option takes the value 1 is retained (i.e., the user
interest degree is 1 and the interest degree threshold is 1). For example, the log information of the
sub-behavior that generates order information is retained, or the log information of the sub-behavior
in which the user clicked on a searched product is retained.
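A minimal sketch of this two-option case follows. It uses the retention rule stated in the text, where the option that generates an order takes the value 1 and the interest threshold is 1. The `executed` field and the function names are hypothetical.

```python
from typing import Dict, List

INTEREST_THRESHOLD = 1  # from the text: interest degree 1, threshold 1


def binary_interest(executed: bool) -> int:
    """Interest degree of a two-option indicator: 1 if executed, 0 otherwise."""
    return 1 if executed else 0


def filter_executed(records: List[Dict[str, object]]) -> List[Dict[str, object]]:
    """Keep only records whose (hypothetical) 'executed' flag yields interest >= threshold."""
    return [
        r for r in records
        if binary_interest(bool(r["executed"])) >= INTEREST_THRESHOLD
    ]
```

For purchasing behavior, for instance, a record for a product that was added to the cart and paid for would carry `executed=True` and be kept, while an abandoned cart would be dropped.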
2.3.3. Indicators of the searching sub-behaviors. When the sub-behavior is a searching behavior, the
user inputs keywords to query, and the system records user information, product ID, searched keywords,
address, etc. The system reads the keywords searched by the user (for example, the user enters the
keyword "movie ticket" in the search box) and uses a semantic model to obtain the words associated
with the keywords. The semantic model is an existing technology and contains a semantic extension
query interface, a semantic support system, an inference system, and an ontology system. The semantic
extension query interface analyzes user requests, determines the user's semantics, and binds them to
relevant concepts. The semantic support system provides support for semantic analysis. The inference
system serves semantic analysis and knowledge processing, and the ontology system is used for
knowledge representation and knowledge processing. The associated words are inferred from the user's
keywords through the semantic model to obtain information on the associated objects. For example, if a
user's historical orders include "movie ticket" and "diaper", the associated words can be "movie
channel", "baby diapers", etc. The indicator of the sub-behavior is the similarity between the
keywords and the associated words. When an associated word appears in the same historical order, its
user interest degree is defined as 1. When an associated word does not appear in the historical order,
its user interest degree is calculated by a similarity method. Then the utility function is
$$G(\beta) = \begin{cases} 0, & \text{irrelevant} \\ x, & \text{associated} \\ 1, & \text{identical} \end{cases} \tag{4}$$
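The three-way rule of equation (4) can be sketched as below. The paper does not specify the similarity method or the semantic model, so this sketch substitutes the character-level ratio from Python's standard `difflib` module as a stand-in. The `relevance_floor` cutoff that separates "associated" from "irrelevant" is likewise an assumption, as are all names.

```python
from difflib import SequenceMatcher
from typing import Set


def search_interest(
    keyword: str,
    candidate: str,
    history: Set[str],
    relevance_floor: float = 0.3,  # assumed cutoff; the paper gives no value
) -> float:
    """Interest degree of a candidate word for a searched keyword, following equation (4)."""
    # Identical word, or a word found in the user's historical orders: interest degree 1.
    if candidate == keyword or candidate in history:
        return 1.0
    # Otherwise fall back to a similarity score x (stand-in for the paper's similarity method).
    similarity = SequenceMatcher(None, keyword, candidate).ratio()
    # Words below the assumed cutoff are treated as irrelevant and get interest degree 0.
    return similarity if similarity >= relevance_floor else 0.0
```

A character-level ratio is only a placeholder here: it would not link "diaper" to a semantically related but differently spelled word, which is the job the text assigns to the semantic model.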
3. Conclusions
The log filtering method in this paper adopts a distributed mode to collect log information from
various business systems in the network and obtain the log information of user behaviors. The optimized
result is obtained through secondary filtering with self-defined functions. This method can quickly and
efficiently process small batches of data, ensuring the efficiency and practicability of log filtering.
At the same time, the method is easy to expand, and a fault-tolerant recovery mechanism is easy to
implement.
Acknowledgments
This research was supported by the National Natural Science Foundation of China under Grant No.
61862010, Guangxi Collaborative Innovation Center of Multi-source Information Integration and
Intelligent Processing, and Innovation Project of Guangxi Graduate Education No. YCSW2021283.
References
[1] Tang Y, Spektor A, Khatchadourian R and Bagherzadeh M 2021 A Tool for Rejuvenating Feature
Logging Levels via Git Histories and Degree of Interest Preprint arXiv/2112.02758
[2] Wu W, Zhang R and Liu L 2019 A personalized network-based recommendation approach via
distinguishing user's preference International Journal of Modern Physics B 33 1950029
[3] Wan H and Ismail N F 2021 Recommender System for Multiple Databases Based on Web Log
Mining Annals of Emerging Technologies in Computing 5 187-93
[4] Tanaka T, Niibori H, Shiyingxue L I, Nomura S and Tsuda K 2020 Bot Detection Model using
User Agent and User Behavior for Web Log Analysis Procedia Computer Science 176 1621-
25
[5] Al-Duwairi B, Oozkasap O, Uysal A, Kocaogullar C and Yildirim K 2020 LogDos: A Novel
Logging-based DDoS Prevention Mechanism in Path Identifier-Based Information Centric
Networks Computers & Security 99 102071
[6] Wei Y, Chow K P and Yiu S M 2020 Insider Threat Detection Using Multi-autoencoder Filtering
and Unsupervised Learning IFIP Advances in Information and Communication Technology
vol 589 ed Peterson G and Shenoi S (Springer, Cham: New Delhi, India) pp 273-90
[7] Vidgof M, Djurica D, Bala S and Mendling J 2021 Interactive log-delta analysis using multi-
range filtering Software and Systems Modeling 4 1-22
[8] Cao M T, Pham T T, Kuo T C, Bui D M and Nguyen T H 2020 Short-Term Load Forecasting
Enhanced With Statistical Data-Filtering Method 2020. IEEE. Int. Conf. on Power Electronics,
Smart Grid and Renewable Energy (PESGRE2020) (IEEE: Cochin, India) pp 1-8
[9] Jiang M, Zhang Z, Jiang J, Wang Q and Pei Z 2019 A collaborative filtering recommendation
algorithm based on information theory and bi-clustering Neural Computing and Applications
31 8279–87
[10] Feng L, Cai Y, Wei E and Li J 2022 Graph Neural Networks with Global Noise Filtering for
Session-based Recommendation Neurocomputing 472 113-23