
Content-Based Filtering Technique using Clustering Method for Music Recommender Systems


Abstract

The constant advancement of web development trends and technology has resulted in a large number of web systems that are visited regularly. Among them are systems that allow users to listen to music online without downloading it to their devices. With the increasing popularity of music streaming, music recommender systems have become important instruments for increasing digital music consumption. Machine learning (ML) is a branch of artificial intelligence that enables systems to learn from data, gradually improving their accuracy in predicting future outcomes. The objective of this study is to explore how music recommender systems are implemented and to design and develop one using a single ML technique, content-based filtering. The unsupervised k-means clustering algorithm and two similarity measures, Euclidean distance and cosine similarity, were implemented. These methods identify hidden patterns or groupings in data without human assistance, which makes them well suited to exploratory data analysis because they can reveal similarities and differences between items. The system determines song feature values by analysing the music the user listens to, which allows the algorithm to select the songs in the database that best match the user's current interests. K-means clustering grouped the songs according to their similarity, separating them into different clusters. The cosine similarity method computed the cosine distance between the input song and the other songs and recommended those with the shortest distance. The Euclidean distance method computed the straight-line distance between two feature vectors and likewise recommended the songs with the shortest distance. The results were then generated and presented to the user.
Based on the findings, the results produced by the three methods were accurate and similar to one another.
Journal of Advanced Research in Applied Sciences and Engineering Technology 56, Issue 2 (2026) 206-218
Journal homepage: https://semarakilmu.com.my/journals/index.php/applied_sciences_eng_tech/index
ISSN: 2462-1943
Foong Kin Hong¹, Mohd Arfian Ismail¹,²,*, Nur Farahaina Idris¹, Ashraf Osman Ibrahim³, Shermina Jeba⁴
¹ Faculty of Computing, Universiti Malaysia Pahang Al-Sultan Abdullah, 26600 Pekan, Pahang, Malaysia
² Center of Excellence for Artificial Intelligence & Data Science, Universiti Malaysia Pahang Al-Sultan Abdullah, Kampung Melayu Gambang, 26300 Kuantan, Pahang, Malaysia
³ Creative Advanced Machine Intelligence Research Centre, Faculty of Computing and Informatics, Universiti Malaysia Sabah, 88400 Kota Kinabalu, Sabah, Malaysia
⁴ Department of Computing, Muscat College, Bousher, Muscat 112, Oman
Keywords: Recommender systems; K-means clustering; Cosine similarity; Euclidean distance
* Corresponding author.
E-mail address: arfian@ump.edu.my
https://doi.org/10.37934/araset.56.2.206218
1. Introduction
Over the years, the ever-escalating development of the Internet has changed society in many ways, particularly in communication and lifestyle. The constant advancement of web development trends and technology has resulted in a large number of web systems that are visited regularly. Among them are systems that allow users to listen to music online without having to download it to their devices. This technology solves several issues that arise from peer-to-peer software, one of which is the large amount of storage space needed to download a wide variety of music. It also relates to music copyright issues, as large music distribution companies began legal battles against some peer-to-peer software owners. Although some peer-to-peer software still operates today, web music services have become a major channel for music sharing.
Software and websites that provide music listening services offer large music collections to the public. These services handle the copyright issues of each country, adapting their musical catalogue to the copy and reproduction rights of the record label associated with each music distribution company or individual artist. Most music service companies charge for their services, while some offer free access to their musical collection for listening but not for reproduction. Many music streaming systems have evolved and improved significantly over the years.
Recommender systems aim to estimate and predict users’ interests or content preferences and recommend items related to those preferences [1-3]. Music recommenders give users access not only to common and popular music but also to new emerging groups, rare niche music, and independent label productions. Recommendations speed up the user’s search process, leading them to content of interest that they might never have searched for. There are several significant reasons for music listening service providers to implement a recommender system. One of them is the promotion of artists whose musical works are of noteworthy quality.
Implementing a recommender system in a music listening service is significant for several reasons [4,5]. First and foremost, the recommender system takes on the task of filtering and choosing new music for new listeners. Whether users want to explore new music in general or simply listen to a genre they have never listened to before, the recommender system suggests music based on their selections. Conversely, for experienced listeners who want to continue exploring their favourite artists or the music of similar artists, the recommender system also suggests that music. In short, a music recommender system is important for improving the user’s listening experience.
Although many music recommender systems have been created and implemented for websites, they are still far from perfect and sometimes generate unsatisfactory suggestions [6,7]. This is because users’ preferences for musical works are influenced by a variety of factors that current music recommender approaches do not sufficiently consider. Current music recommender systems are mainly centred on the interaction between users and items, as well as on content-based item descriptors.
One of the problems faced by music recommender systems is the ‘cold start problem’, which affects most of them. When a new user signs up or a new song is added to the music library, the system does not have enough information linked to that user or song, and therefore cannot recommend other relevant musical pieces.
Another problem with music recommender systems is the sparsity problem. Sparsity refers to situations in which transactional or feedback data are sparse, that is, the number of ratings actually given is significantly smaller than the number of possible ratings, which usually happens when there are many users and items. In this situation, the system has insufficient data to identify and recommend suitable musical pieces to users.
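To make the notion of sparsity concrete, the short Python sketch below measures it as the fraction of empty cells in a user-item rating matrix. The toy matrix and the convention that 0 means "no rating" are assumptions made for illustration only; they are not taken from the study's data.

```python
import numpy as np

def sparsity(ratings: np.ndarray) -> float:
    """Fraction of user-item cells that hold no rating (stored here as 0)."""
    n_cells = ratings.size
    n_observed = np.count_nonzero(ratings)
    return 1.0 - n_observed / n_cells

# Hypothetical toy matrix: 4 users x 5 songs, 0 = no rating or feedback.
ratings = np.array([
    [5, 0, 0, 0, 1],
    [0, 0, 4, 0, 0],
    [0, 3, 0, 0, 0],
    [0, 0, 0, 0, 2],
])

print(f"sparsity = {sparsity(ratings):.2f}")  # 15 of 20 cells are empty
```

Real listening data is typically far sparser than this toy example, because each user interacts with only a tiny fraction of a large catalogue.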
The next problem is automatic playlist continuation. A playlist is a collection of songs that has been prepared to be listened to by users. In this context, different approaches, such as Markov chain and log-likelihood methods, have been tried and are continuously being improved. As a variation of automatic playlist generation, automatic playlist continuation adds one or more tracks to a playlist in a way that matches the attributes of the original playlist. This helps users listen to a more compelling playlist without extensive musical familiarity. In short, the main challenge of automatic playlist continuation is to correctly estimate the intended purpose of a specific playlist and recommend music with similar properties. This is difficult because of the wide range of intended purposes and the diversity of the underlying features.
Machine learning (ML) is a form of artificial intelligence that enables systems to learn from data in a way that imitates human learning [8]. ML allows a system to learn gradually and improve its accuracy in predicting future outcomes. Generally, ML systems have three main parts: a decision-making procedure that produces an estimate from the input data, an error function that assesses the model's predictions, and a model optimization procedure that improves the accuracy of the system. ML is widely implemented in fraud detection, spam filtering, and recommendation systems.
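The three parts can be illustrated with a deliberately tiny Python example. This is a sketch for illustration, not part of the study's system: a linear model acts as the decision procedure, mean squared error is the error function, and gradient descent is the optimisation procedure. The data points, learning rate, and number of iterations are hypothetical choices.

```python
def predict(w, b, x):
    # Decision procedure: turn an input into an estimate.
    return w * x + b

def mse(w, b, xs, ys):
    # Error function: mean squared error of the estimates.
    return sum((predict(w, b, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def step(w, b, xs, ys, lr):
    # Optimisation procedure: one gradient-descent update of w and b.
    n = len(xs)
    grad_w = sum(2 * (predict(w, b, x) - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum(2 * (predict(w, b, x) - y) for x, y in zip(xs, ys)) / n
    return w - lr * grad_w, b - lr * grad_b

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # toy data generated by y = 2x + 1
w, b = 0.0, 0.0
for _ in range(2000):
    w, b = step(w, b, xs, ys, lr=0.05)

print(round(w, 3), round(b, 3), mse(w, b, xs, ys))
```

Each pass through the loop lowers the error, which is the sense in which the system "learns gradually".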
In this study, ML is applied to the music recommender [9,10] using the content-based filtering technique. Content-based filtering focuses on the relationship between items and recommends items based on their similarity to other items. Here, similarity is based on the values of each song's sound features, such as danceability, energy, and loudness. Based on the user’s music preferences, the music recommender system then recommends similar music.
2. Literature Review
This section describes the three main techniques used by recommender systems, namely content-based filtering, collaborative filtering, and hybrid filtering, followed by a comparison of the three.
2.1 Content-Based Filtering
Content-based filtering is a method that uses item features, such as keywords and attributes, to recommend items similar to those the user likes, based on the user’s previous actions and feedback [11,12]. For music recommendations, a recommender system considers whether a song belongs to a specific genre and analyses its lyrics, artists, and other attributes before recommending it to users according to their profiles.
One of the most straightforward ways of developing a content-based music recommender system is keyword matching [13-16]. The idea is to extract essential keywords from a song description. Based on the user’s likes and search history, the system finds other music with the same keywords, calculates the similarity between songs, and suggests musical pieces to the user. In short, a content-based recommendation algorithm involves two steps. The first step is to extract characteristics from the song descriptions to generate an object representation. The next step is to define a similarity measure between these object representations that mimics how humans judge item-item similarity.
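The two steps of keyword matching can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the study's code: keywords are taken to be lower-cased words minus a small hypothetical stop-word list, and the Jaccard index (shared keywords divided by all distinct keywords) is swapped in as one simple similarity measure. The song descriptions are invented.

```python
import re

# Hypothetical stop-word list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "and", "of", "with", "for", "to", "in", "on"}

def keywords(description: str) -> set[str]:
    # Step 1: extract characteristics (lower-cased words minus stop words).
    words = re.findall(r"[a-z']+", description.lower())
    return {w for w in words if w not in STOPWORDS}

def jaccard(a: set[str], b: set[str]) -> float:
    # Step 2: similarity between two keyword sets (shared / all distinct keywords).
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(liked: str, catalogue: dict[str, str], top_n: int = 2) -> list[str]:
    liked_kw = keywords(liked)
    scores = {title: jaccard(liked_kw, keywords(desc)) for title, desc in catalogue.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

catalogue = {
    "Song A": "upbeat electronic dance track with heavy bass",
    "Song B": "slow acoustic folk ballad with guitar",
    "Song C": "electronic dance anthem with bass drops",
}
print(recommend("energetic electronic dance music with bass", catalogue))
```

Song C ranks above Song A here because it shares the same three keywords with the liked description but has fewer unrelated ones.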
The key idea of content-based filtering is to recommend items with similar attributes. Similarity can be derived from item descriptions using the term frequency-inverse document frequency (TF-IDF) technique. This method counts the number of times each word appears in a document, weighs the word's relevance, and generates a score for that item [17-19].
The term frequency (TF) of a word is the number of times it appears in a document divided by the total number of words in that document. For example, the word “music” appears once in the ten-word text “I love music because it helps me to release stress”, so its term frequency is 1/10 = 0.1. Inverse document frequency (IDF) measures how important a term is across the whole database [20,21]. It is defined as the ratio of the total number of documents to the number of documents containing the word, usually taken on a logarithmic scale. The fewer documents contain a term, the higher its IDF, indicating that the term is rare. A TF-IDF vector is then obtained by multiplying the TF and IDF values of each term. To use these vectors for recommendation, the similarity of one item to another needs to be calculated. Similarity between items can be computed with metrics such as cosine similarity and Euclidean distance, and similar items can also be grouped with k-means clustering.
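The TF and IDF definitions above can be written out directly in Python, with TF = count / total words and IDF = log(N / number of documents containing the word). The sketch below assumes whitespace tokenisation and uses cosine similarity to compare the resulting vectors; the three descriptions are hypothetical, and a production system would more likely use a library vectoriser.

```python
import math
from collections import Counter

def tf(doc_words):
    # Term frequency: occurrences of each word / total words in the document.
    counts = Counter(doc_words)
    total = len(doc_words)
    return {w: c / total for w, c in counts.items()}

def idf(corpus):
    # Inverse document frequency: log(N / number of documents containing the word).
    n_docs = len(corpus)
    df = Counter(w for doc in corpus for w in set(doc))
    return {w: math.log(n_docs / d) for w, d in df.items()}

def tfidf_vectors(corpus):
    idf_w = idf(corpus)
    vocab = sorted(idf_w)
    vectors = []
    for doc in corpus:
        tf_w = tf(doc)
        vectors.append([tf_w.get(w, 0.0) * idf_w[w] for w in vocab])
    return vocab, vectors

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

descriptions = [
    "i love music because it helps me to release stress",
    "relaxing music to release stress",
    "loud rock anthem for the gym",
]
corpus = [d.split() for d in descriptions]
vocab, vecs = tfidf_vectors(corpus)
print(tf(corpus[0])["music"])                       # 1 occurrence out of 10 words
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

The first two descriptions share weighted terms and therefore get a positive similarity, while the third shares none with the first and scores zero.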
2.2 Collaborative Filtering
Collaborative filtering is the most common and widely used method for generating recommendations in music streaming services [22]. This algorithm relies on the set of songs that users preferred in the past to predict which songs they would like to listen to. Users’ ratings are collected in two ways to identify their preferences. The first is explicit rating, whereby the system asks users to rate the recommended songs directly. The second is implicit rating, whereby the system uses how long and how often songs are played to infer whether the users like them. These ratings are then converted to binary values to build a user-item interaction matrix.
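The conversion of ratings to binary values can be sketched with NumPy as below. The thresholds (at least 3 plays for implicit feedback, at least 4 stars for explicit ratings) and the toy matrices are hypothetical choices for illustration, not values reported in the study.

```python
import numpy as np

def implicit_to_binary(play_counts: np.ndarray, min_plays: int = 3) -> np.ndarray:
    """1 where a user played a song at least `min_plays` times, else 0."""
    return (play_counts >= min_plays).astype(int)

def explicit_to_binary(ratings: np.ndarray, min_rating: int = 4) -> np.ndarray:
    """1 where an explicit 1-5 star rating is at least `min_rating`, else 0 (0 = unrated)."""
    return (ratings >= min_rating).astype(int)

# Rows are users, columns are songs (hypothetical data).
play_counts = np.array([[10, 0, 1],
                        [0, 4, 3]])
ratings = np.array([[5, 0, 2],
                    [0, 4, 3]])

print(implicit_to_binary(play_counts))
print(explicit_to_binary(ratings))
```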
Once the interaction matrix has been created, the system can start recommending songs to a particular user. Collaborative filtering consists of two approaches, namely the user-based and item-based approaches [23]. In the user-based approach, the system finds users with similar interests and behaviour, considering which songs they frequently listen to, and makes recommendations accordingly. In the item-based approach, recommendations are made based on the songs the user has listened to in the past. The main idea of collaborative filtering is to recommend new songs based on the closeness in behaviour of similar users [24]. For example, if user A likes song X, and many users who like song X also like song Y, then song Y will be recommended to user A. Therefore, collaborative filtering primarily focuses on the relationships between items and users, whereby the similarity between items is determined by the users who rated them. Some ML algorithms, such as k-nearest neighbours, clustering, and matrix factorization, can also be implemented in collaborative filtering.
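The song X / song Y example can be reproduced with a minimal item-based sketch that counts how often two songs are liked by the same user (co-occurrence) and scores each unseen song by its co-occurrence with the songs the target user already likes. Co-occurrence counting is only one simple way to realise the item-based idea, and the users and songs below are hypothetical.

```python
from collections import Counter
from itertools import permutations

def co_occurrence(likes):
    # Number of users who like both song a and song b, for every ordered pair.
    counts = Counter()
    for songs in likes.values():
        for a, b in permutations(songs, 2):
            counts[(a, b)] += 1
    return counts

def recommend_for(user, likes, top_n=1):
    cooc = co_occurrence(likes)
    own = likes[user]
    scores = Counter()
    for (a, b), n in cooc.items():
        if a in own and b not in own:  # score only songs the user has not liked yet
            scores[b] += n
    return [song for song, _ in scores.most_common(top_n)]

likes = {
    "A": {"X"},
    "B": {"X", "Y"},
    "C": {"X", "Y", "Z"},
    "D": {"W"},
}
print(recommend_for("A", likes))
```

Song Y wins because two other users like it together with song X, while song W is never liked alongside X and receives no score.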
2.3 Hybrid Filtering
The hybrid filtering approach combines content-based and collaborative filtering to make recommendations [25-27]. In this type of recommender system, both user-to-item and user-to-user relations are important. Data collection is similar to that of content-based and collaborative filtering, in that data are collected explicitly or implicitly. The collected data are processed by both methods, and their results are combined. A hybrid system tends to achieve higher recommendation accuracy because each method covers what the other lacks [28,29]. For example, content-based filtering does not consider other people’s interests, whereas hybrid filtering does. When the two approaches work together, they can uncover significant underlying patterns. By implementing both methods, the hybrid system overcomes most of the weaknesses of each algorithm and improves system performance. Classification and clustering techniques can also be included to obtain better recommendations.
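One common way to combine the two methods is a weighted hybrid, sketched below in Python. The blending weight `alpha` and the two score dictionaries are hypothetical; in practice the content-based scores might come from feature similarity and the collaborative scores from normalised co-occurrence counts. Other hybrid designs (switching, cascading, feature combination) are equally valid.

```python
def hybrid_scores(content, collaborative, alpha=0.5):
    """Weighted hybrid: alpha * content score + (1 - alpha) * collaborative score.
    A song missing from one source receives 0 from that source."""
    songs = set(content) | set(collaborative)
    return {
        s: alpha * content.get(s, 0.0) + (1 - alpha) * collaborative.get(s, 0.0)
        for s in songs
    }

content = {"Y": 0.9, "Z": 0.2, "W": 0.6}   # hypothetical feature-similarity scores
collaborative = {"Y": 0.4, "Z": 0.8}       # hypothetical; W is new and has no history
blended = hybrid_scores(content, collaborative, alpha=0.5)
print(sorted(blended, key=blended.get, reverse=True))
```

Song W, which has no listening history, still receives a score from its content, which illustrates how a hybrid can soften the cold start problem.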
2.4 Comparative Analysis of Techniques in Music Recommender System
The three music recommender techniques were compared in terms of the pros and cons of each approach.
The first advantage of the content-based filtering approach is that the model does not need any data about other users, since its recommendations are based on items specific to one user. Once a user has searched for a few items, a content-based filtering system can begin making relevant recommendations. This makes it ideal for businesses with a small pool of users to sample. The next advantage is that the recommendations are highly relevant to the user: the method is tailored to the user’s interests, including recommendations for niche items. The main disadvantage of content-based filtering is that the model only makes recommendations based on the user’s existing interests. In other words, the model has limited ability to expand on those interests.
The advantage of collaborative filtering is that the model can help users discover new interests. The music recommender system may not know that the user is interested in a given item, but the model might still recommend it because similar users are interested in that item. As for its disadvantages, collaborative filtering suffers from the cold start problem. The system has difficulty making recommendations for new users because collaborative filtering relies on historical data of interactions between users and items, and new users and items do not have enough historical data (data sparsity) for it to work. Another disadvantage of collaborative filtering is limited scalability. As the number of users and the amount of data grow, the performance of collaborative algorithms begins to decline simply because of the sheer volume of data.
The advantage of hybrid filtering is that it generally achieves higher performance and accuracy than the other recommender techniques, because it combines two or more recommendation techniques to gain performance with fewer drawbacks. The disadvantage of hybrid filtering is that it is difficult to implement: combining different methods increases complexity, which makes implementation difficult.
This study used the content-based filtering technique to build a music recommender system. Because its recommendations are based on items specific to one person, content-based filtering does not require any information about other users. The approach only needs to make recommendations from the songs in the dataset, using song features to calculate the similarity between them.
3. Material and Methods
This section describes the implementation of the music recommender system. Three methods were used: k-means clustering and two similarity measures, cosine similarity and Euclidean distance. The music recommender system was developed using Jupyter Notebook. The code for the music recommender system is available at https://github.com/unpiye/music-recommender-system. The seven main steps involved are shown in Figure 1 and detailed as follows:
Fig. 1. Steps involved in building the music recommender system
i. Step 1: Data preparation. The Spotify Million Playlist Dataset Challenge was used [30]. The dataset contains 1,000,000 playlists, including playlist titles and track titles, created by users on the Spotify platform between January 2010 and October 2017. This dataset can be accessed at https://www.recsyschallenge.com/2018/.
ii. Step 2: Mutual features calculation. Pearson's correlation coefficient and the mutual information between each feature and the dependent variable were computed and visualised using a visualiser. The visualisation can be used to select features that have a high correlation or significant mutual information with the dependent variable.
iii. Step 3: Sound feature analysis. The music data were read and analysed based on sound features, such as acousticness, danceability, energy, instrumentalness, liveness, and valence. Songs were also categorised based on their popularity and release year.
iv. Step 4: Perform cluster method. This step involved the clustering algorithm process. The
three methods applied were k-means clustering, cosine similarity, and Euclidean distance
methods.
v. Step 5: Build recommendation engine. In this phase, the recommendation engine was built on the learned data. A mean feature vector was calculated from the input songs and compared with the existing songs in the dataset, and the engine then recommended songs with similar attributes to the user.
vi. Step 6 & Step 7: Users could request recommendations based either on a list of songs or on a single song, and the recommended songs were returned as output.
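Steps 2 and 5 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the repository's code: the feature table is synthetic, the dependent variable is a hypothetical "popularity" score driven mostly by energy, mutual information is estimated with scikit-learn's `mutual_info_regression`, and the mean-vector step ranks songs by Euclidean distance (cosine distance would be an equally reasonable choice).

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

# Hypothetical Spotify-style feature names and synthetic values in [0, 1].
FEATURES = ["acousticness", "danceability", "energy",
            "instrumentalness", "liveness", "valence"]
rng = np.random.default_rng(0)
X = rng.random((300, len(FEATURES)))
# Hypothetical dependent variable driven mostly by energy (column 2).
popularity = 0.8 * X[:, 2] + 0.2 * rng.random(300)

# Step 2: Pearson correlation and mutual information of each feature with the target.
pearson = np.array([np.corrcoef(X[:, j], popularity)[0, 1] for j in range(X.shape[1])])
mutual_info = mutual_info_regression(X, popularity, random_state=0)
for name, r, mi in zip(FEATURES, pearson, mutual_info):
    print(f"{name:<17} r = {r:+.2f}   MI = {mi:.2f}")

# Step 5: average the input songs' features into a mean vector and rank the
# remaining songs by their Euclidean distance to it.
def recommend_from_mean(X, input_ids, top_n=5):
    mean_vector = X[input_ids].mean(axis=0)
    distances = np.linalg.norm(X - mean_vector, axis=1)
    distances[input_ids] = np.inf  # never recommend the input songs themselves
    return np.argsort(distances)[:top_n].tolist()

recommended = recommend_from_mean(X, input_ids=[0, 1, 2])
print(recommended)
```

Passing a single index in `input_ids` covers the single-song case of Steps 6 and 7, while several indices cover the list-of-songs case.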
3.1 K-Means Clustering Algorithm
This section describes Step 4, the clustering step, using the k-means clustering algorithm. The code is available in K-MeansClustering.ipynb at https://github.com/unpiye/music-recommender-system. Based on the numerical audio attributes of each genre, the dataset genres are divided into 10 clusters using the standard k-means clustering technique. The resulting clusters were visualised using the t-distributed stochastic neighbour embedding (t-SNE) approach. t-SNE assesses how similar nearby or local points are by comparing the distances between them; similar points are those that are close to one another. t-SNE then converts these distances into a probability for each pair of points: in the high-dimensional space, two points have a high probability value if they are close to one another and a low value if they are far apart. In this way, the probability of selecting a point as a neighbour reflects how similar it is. The perplexity parameter, which roughly corresponds to the number of neighbours each point considers, determines how local or global the similarities captured by t-SNE are. With a low perplexity (around 2 or less), t-SNE considers only a couple of neighbouring points and produces a plot with numerous dispersed clusters. With a perplexity of 10, t-SNE treats about 10 neighbouring points as similar, leading to larger clusters of points. A perplexity of 30 was used in this study.
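The division of labour described above, where k-means does the grouping and t-SNE only produces a 2-D picture, can be sketched with scikit-learn. The data below is a random stand-in for the per-genre audio attributes, and standardising the features first is an assumption rather than a documented step of the study; plotting (for example with matplotlib, coloured by `labels`) is left out to keep the sketch self-contained.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
# Hypothetical stand-in for numeric audio attributes: 200 genres x 6 features.
features = rng.random((200, 6))

# Put all features on a comparable scale before measuring distances (assumption).
scaled = StandardScaler().fit_transform(features)

# k-means performs the grouping: 10 clusters, as in the text.
kmeans = KMeans(n_clusters=10, n_init=10, random_state=0)
labels = kmeans.fit_predict(scaled)

# t-SNE only projects the points to 2-D for plotting; perplexity=30 as in the text.
embedding = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(scaled)

print(embedding.shape, np.bincount(labels, minlength=10))
```

Note that scikit-learn requires the perplexity to be smaller than the number of samples, so a perplexity of 30 only makes sense for datasets well above 30 rows.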
3.2 Cosine Similarity
This section also describes Step 4, here using the cosine similarity method. The code is in CosineSimilarity.ipynb, which can be downloaded from https://github.com/unpiye/music-recommender-system. The method uses a function that finds similar tracks based on the user’s input, with cosine distance used to measure the distance between songs in the dataset. Cosine similarity measures how similar two vectors in an inner product space are by computing the cosine of the angle between them, which establishes whether the two vectors point in roughly the same direction; cosine distance is one minus this value. The method then recommends to the user the songs with the highest similarity.
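A minimal NumPy sketch of the cosine-distance recommendation is shown below, computing 1 - (u · v) / (‖u‖ ‖v‖) between the input song and every song. The three-feature vectors and song titles are hypothetical and only illustrate the calculation.

```python
import numpy as np

def cosine_distances(query: np.ndarray, songs: np.ndarray) -> np.ndarray:
    # Cosine distance = 1 - cos(angle) between the query and each song vector.
    q = query / np.linalg.norm(query)
    s = songs / np.linalg.norm(songs, axis=1, keepdims=True)
    return 1.0 - s @ q

def recommend_cosine(titles, songs, query_idx, top_n=2):
    dist = cosine_distances(songs[query_idx], songs)
    dist[query_idx] = np.inf  # skip the input song itself
    return [titles[i] for i in np.argsort(dist)[:top_n]]

titles = ["Song A", "Song B", "Song C", "Song D"]
# Hypothetical [danceability, energy, acousticness] values.
songs = np.array([
    [0.8, 0.7, 0.1],
    [0.4, 0.35, 0.05],  # same direction as Song A, half the magnitude
    [0.1, 0.2, 0.9],
    [0.7, 0.8, 0.2],
])
print(recommend_cosine(titles, songs, query_idx=0))
```

Because cosine distance ignores vector length, Song B, which has exactly the same feature proportions as Song A but half the magnitude, is treated as a perfect match.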
3.3 Euclidean Distance Method
This section also describes Step 4, covering the use of the Euclidean distance method in the music recommender system. The code for this method is in EuclideanDistance.ipynb, which can be obtained from https://github.com/unpiye/music-recommender-system. The Euclidean distance is the length of the straight line between two points. In data science and ML, this measure is widely used to assess the similarity between two data points, with a shorter distance indicating greater similarity.
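The same kind of recommendation can be sketched with Euclidean distance, sqrt(Σ (xᵢ - yᵢ)²), between the input song's feature vector and every other song. The feature values and titles are hypothetical illustration data.

```python
import numpy as np

def recommend_euclidean(titles, songs, query_idx, top_n=2):
    # Straight-line distance from the input song to every song.
    dist = np.sqrt(((songs - songs[query_idx]) ** 2).sum(axis=1))
    dist[query_idx] = np.inf  # skip the input song itself
    return [titles[i] for i in np.argsort(dist)[:top_n]]

titles = ["Song A", "Song B", "Song C", "Song D"]
# Hypothetical [danceability, energy, acousticness] values.
songs = np.array([
    [0.8, 0.7, 0.1],
    [0.4, 0.35, 0.05],
    [0.1, 0.2, 0.9],
    [0.7, 0.8, 0.2],
])
print(recommend_euclidean(titles, songs, query_idx=0))
```

Unlike cosine distance, Euclidean distance is sensitive to magnitude, so Song D, whose values are numerically closest to Song A, ranks ahead of Song B even though Song B has the same feature proportions as Song A.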
4. Results and Discussions
4.1 K-Means Clustering Algorithm Result
Among clustering visualisation methods, t-SNE was used in this case instead of principal component analysis (PCA). Unlike PCA, which is a linear projection, t-SNE handles nonlinear data well on small datasets and can capture complex relationships between features comparatively better. K-means clustering requires the number of clusters to be fixed in advance, and there is no single best number, so 10 clusters were used. In other words, the dataset was divided into 10 clusters with similar attributes according to their feature values (e.g., energy, liveness, and loudness).
The perplexity was set to 30, so that each point's similarities were computed over roughly its 30 nearest neighbours, producing denser clusters. The default number of iterations, 1000, was used, meaning that the algorithm refined its solution up to 1000 times before the clusters were finalised. Figure 2 shows the clustering results obtained from four separate runs of the code.
Fig. 2. Clustering Result
Figure 3(a) displays the results for the input song ‘Let Her Go’, whereas Figure 3(b) displays the results for the input song ‘fOoL fOr YoU’. As shown in Figure 3, the recommended songs for the two inputs overlapped (highlighted in green). Just as ‘fOoL fOr YoU’ was the first song recommended for the input ‘Let Her Go’, ‘Let Her Go’ was recommended when the input was changed to ‘fOoL fOr YoU’. The same applies to the other overlapping songs. This means that the recommendation engine placed these songs in the same cluster, and the recommendations were drawn from that cluster.
(a) Recommendation based on ‘Let Her Go’
(b) Recommendation based on ‘fOoL fOr YoU’
Fig. 3. Recommended song titles using the k-means clustering algorithm
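The behaviour seen in Figure 3, where recommendations are drawn from the input song's own cluster, can be sketched as follows: find the songs sharing the input's k-means label, then rank them by distance to the input. The two well-separated synthetic groups and the choice of Euclidean distance for ranking within the cluster are assumptions for illustration, not details confirmed by the study.

```python
import numpy as np
from sklearn.cluster import KMeans

def recommend_in_cluster(features, labels, query_idx, top_n=3):
    # Candidate songs: same cluster as the input, excluding the input itself.
    same = np.flatnonzero(labels == labels[query_idx])
    same = same[same != query_idx]
    # Rank candidates by Euclidean distance to the input song.
    dist = np.linalg.norm(features[same] - features[query_idx], axis=1)
    return same[np.argsort(dist)[:top_n]].tolist()

rng = np.random.default_rng(0)
# Two hypothetical, well-separated groups of songs in a 2-feature space.
features = np.vstack([rng.normal(0.2, 0.05, (20, 2)),
                      rng.normal(0.8, 0.05, (20, 2))])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)

recs = recommend_in_cluster(features, labels, query_idx=0)
print(recs, all(labels[i] == labels[0] for i in recs))
```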
4.2 Cosine Similarity Result
Figure 4 shows the recommendations made using cosine distance. Figure 4(a) uses the song ‘Hey Brother’ by Avicii as input, while Figure 4(b) shows the recommendations based on Zayn’s song ‘Pillowtalk’. As shown, each song appeared in the other’s recommendation list, which means that their cosine distance to each other was short. Moreover, there were other overlapping songs (highlighted in green), indicating that all the highlighted songs were close to each other as measured by cosine distance.
(a) Recommendation based on ‘Hey Brother’
(b) Recommendation based on ‘Pillowtalk’
Fig. 4. Recommended song titles using cosine similarity
4.3 Euclidean Distance Method Result
Figure 5 shows the recommendations made using the Euclidean distance method. Figure 5(a) shows the recommendations based on the song ‘Applause’, while Figure 5(b) shows those based on the song ‘Telephone’. As indicated, each song appeared in the other’s recommendation list. Furthermore, other recommended songs also overlapped, which means that these songs were close to each other as measured by Euclidean distance.
(a) Recommendation based on ‘Applause’
(b) Recommendation based on ‘Telephone’
Fig. 5. Recommended song titles using the Euclidean distance method
5. Conclusion
K-means clustering, Euclidean distance, and cosine similarity are widely used in building recommender systems. K-means is a popular unsupervised learning algorithm for clustering and analysing data, and Euclidean distance and cosine similarity are common similarity measures used alongside such methods. Together, they locate hidden patterns or data clusters without human assistance. In this study, all three methods (k-means clustering, Euclidean distance, and cosine similarity) were explored, and a separate music recommender system was built on each. The work process of the music recommender system involved calculating mutual features, analysing sound features, collecting the data of each song, and comparing the songs with each other in preparation for the clustering process. The data were then clustered by the algorithm according to the similarity of each song, separating them into different groups. The recommendation engine was built on these clusters. The system took the user’s input songs, found the cluster to which each song belonged, and recommended songs close to it within the same cluster. The results were then generated and presented to the user. Based on the findings, the results produced by the three methods were accurate and similar to one another. Finally, it is hoped that the methods used in this study can be applied in other domains, such as communication [8], cloud computing [31], cyber-security [32,33], medicine [34,35], data privacy [36,37], the Internet of Things [38-40], education [41-43], and tourism [44,45].
Acknowledgement
Fundamental Research Grant (RDU) with Vot No. RDU220304 from the Universiti Malaysia Pahang
Al-Sultan Abdullah provided funding for this study.
References
[1] Roy, Deepjyoti, and Mala Dutta. "A systematic review and research perspective on recommender systems." Journal
of Big Data 9, no. 1 (2022): 59. https://doi.org/10.1186/s40537-022-00592-5
[2] Jannach, Dietmar, Ahtsham Manzoor, Wanling Cai, and Li Chen. "A survey on conversational recommender
systems." ACM Computing Surveys (CSUR) 54, no. 5 (2021): 1-36. https://doi.org/10.1145/3453154
[3] Milano, Silvia, Mariarosaria Taddeo, and Luciano Floridi. "Recommender systems and their ethical challenges." AI
& Society 35 (2020): 957-967. https://doi.org/10.1007/s00146-020-00950-y
[4] Jin, Yucheng, Nava Tintarev, Nyi Nyi Htun, and Katrien Verbert. "Effects of personal characteristics in control-
oriented user interfaces for music recommender systems." User Modeling and User-Adapted Interaction 30, no. 2
(2020): 199-249. https://doi.org/10.1007/s11257-019-09247-2
[5] Dinnissen, Karlijn, and Christine Bauer. "Fairness in music recommender systems: A stakeholder-centered mini
review." Frontiers in Big Data 5 (2022): 913608. https://doi.org/10.3389/fdata.2022.913608
[6] Wang, Yifan, Weizhi Ma, Min Zhang, Yiqun Liu, and Shaoping Ma. "A survey on the fairness of recommender
systems." ACM Transactions on Information Systems 41, no. 3 (2023): 1-43. https://doi.org/10.1145/3547333
[7] Chen, Jiawei, Hande Dong, Xiang Wang, Fuli Feng, Meng Wang, and Xiangnan He. "Bias and debias in recommender
system: A survey and future directions." ACM Transactions on Information Systems 41, no. 3 (2023): 1-39.
https://doi.org/10.1145/3564284
[8] Majid, Mazlina Binti Abdul, Jasni Binti Mohamad Zain, and Arief Hermawan. "Recognition of Malaysian sign
language using skeleton data with neural network." In 2015 International Conference on Science in Information
Technology (ICSITech), pp. 231-236. IEEE, 2015.
[9] Zhang, Qian, Jie Lu, and Yaochu Jin. "Artificial intelligence in recommender systems." Complex & Intelligent
Systems 7, no. 1 (2021): 439-457. https://doi.org/10.1007/s40747-020-00212-w
[10] Khanal, Shristi Shakya, P. W. C. Prasad, Abeer Alsadoon, and Angelika Maag. "A systematic review: machine learning
based recommendation systems for e-learning." Education and Information Technologies 25, no. 4 (2020): 2635-
2664. https://doi.org/10.1007/s10639-019-10063-9
[11] Afoudi, Yassine, Mohamed Lazaar, and Mohammed Al Achhab. "Hybrid recommendation system combined
content-based filtering and collaborative prediction using artificial neural network." Simulation Modelling Practice
and Theory 113 (2021): 102375. https://doi.org/10.1016/j.simpat.2021.102375
[12] Javed, Umair, Kamran Shaukat, Ibrahim A. Hameed, Farhat Iqbal, Talha Mahboob Alam, and Suhuai Luo. "A review
of content-based and context-based recommendation systems." International Journal of Emerging Technologies in
Learning (iJET) 16, no. 3 (2021): 274-306. https://doi.org/10.3991/ijet.v16i03.18851
[13] Alsmadi, Mutasem K. "Content-based image retrieval using color, shape and texture descriptors and
features." Arabian Journal for Science and Engineering 45, no. 4 (2020): 3317-3330.
https://doi.org/10.1007/s13369-020-04384-y
[14] Unar, Salahuddin, Xingyuan Wang, and Chuan Zhang. "Visual and textual information fusion using Kernel method
for content based image retrieval." Information Fusion 44 (2018): 176-187.
https://doi.org/10.1016/j.inffus.2018.03.006
[15] Deepak, Gerard, and J. Sheeba Priyadarshini. "Personalized and enhanced hybridized semantic algorithm for web
image retrieval incorporating ontology classification, strategic query expansion, and content-based
analysis." Computers & Electrical Engineering 72 (2018): 14-25.
https://doi.org/10.1016/j.compeleceng.2018.08.020
[16] Ali, Syed M., Gopal K. Nayak, Rakesh K. Lenka, and Rabindra K. Barik. "Movie recommendation system using genome
tags and content-based filtering." In Advances in Data and Information Sciences: Proceedings of ICDIS-2017, Volume
1, pp. 85-94. Springer Singapore, 2018. https://doi.org/10.1007/978-981-10-8360-0_8
[17] Singla, Rujhan, Saamarth Gupta, Anirudh Gupta, and Dinesh Kumar Vishwakarma. "FLEX: a content based movie
recommender." In 2020 International Conference for Emerging Technology (INCET), pp. 1-4. IEEE, 2020.
https://doi.org/10.1109/INCET49848.2020.9154163
[18] Sjarif, Nilam Nur Amir, Nurulhuda Firdaus Mohd Azmi, Suriayati Chuprat, Haslina Md Sarkan, Yazriwati Yahya, and
Suriani Mohd Sam. "SMS spam message detection using term frequency-inverse document frequency and random
forest algorithm." Procedia Computer Science 161 (2019): 509-515. https://doi.org/10.1016/j.procs.2019.11.150
[19] Ramadhan, Faisal, and Aina Musdholifah. "Online Learning Video Recommendation System Based on Course and
Sylabus Using Content-Based Filtering." IJCCS (Indonesian Journal of Computing and Cybernetics Systems) 15, no. 3
(2021): 265-274. https://doi.org/10.22146/ijccs.65623
[20] Wang, Donghui, Yanchun Liang, Dong Xu, Xiaoyue Feng, and Renchu Guan. "A content-based recommender system
for computer science publications." Knowledge-Based Systems 157 (2018): 1-9.
https://doi.org/10.1016/j.knosys.2018.05.001
[21] Meel, Priyanka, and Agniva Goswami. "Inverse document frequency-weighted Word2Vec model to recommend
apparels." In 2019 6th International Conference on Signal Processing and Integrated Networks (SPIN), pp. 1-7. IEEE,
2019. https://doi.org/10.1109/SPIN.2019.8711722
[22] Koren, Yehuda, Steffen Rendle, and Robert Bell. "Advances in collaborative filtering." Recommender Systems
Handbook (2021): 91-142. https://doi.org/10.1007/978-1-0716-2197-4_3
[23] Kluver, Daniel, Michael D. Ekstrand, and Joseph A. Konstan. "Rating-based collaborative filtering: algorithms and
evaluation." Social Information Access: Systems and Technologies (2018): 344-390.
https://doi.org/10.1007/978-3-319-90092-6_10
[24] Feng, Junmei, Xiaoyi Fengs, Ning Zhang, and Jinye Peng. "An improved collaborative filtering method based on
similarity." PLoS ONE 13, no. 9 (2018): e0204003. https://doi.org/10.1371/journal.pone.0204003
[25] Tarus, John K., Zhendong Niu, and Dorothy Kalui. "A hybrid recommender system for e-learning based on context
awareness and sequential pattern mining." Soft Computing 22 (2018): 2449-2461.
https://doi.org/10.1007/s00500-017-2720-6
[26] Logesh, R., and V. Subramaniyaswamy. "Exploring hybrid recommender systems for personalized travel
applications." In Cognitive Informatics and Soft Computing: Proceeding of CISC 2017, pp. 535-544. Springer
Singapore, 2019. https://doi.org/10.1007/978-981-13-0617-4_52
[27] Chen, Rui, Qingyi Hua, Yan-Shuo Chang, Bo Wang, Lei Zhang, and Xiangjie Kong. "A survey of collaborative filtering-
based recommender systems: From traditional methods to hybrid methods based on social networks." IEEE
Access 6 (2018): 64301-64320. https://doi.org/10.1109/ACCESS.2018.2877208
[28] Walek, Bogdan, and Vladimir Fojtik. "A hybrid recommender system for recommending relevant movies using an
expert system." Expert Systems with Applications 158 (2020): 113452.
https://doi.org/10.1016/j.eswa.2020.113452
[29] Logesh, R., V. Subramaniyaswamy, D. Malathi, N. Sivaramakrishnan, and Varadarajan Vijayakumar. "Enhancing
recommendation stability of collaborative filtering recommender system through bio-inspired clustering ensemble
method." Neural Computing and Applications 32 (2020): 2141-2164. https://doi.org/10.1007/s00521-018-3891-5
[30] Zamani, Hamed, Markus Schedl, Paul Lamere, and Ching-Wei Chen. "An analysis of approaches taken in the ACM
RecSys Challenge 2018 for automatic music playlist continuation." ACM Transactions on Intelligent Systems and
Technology (TIST) 10, no. 5 (2019): 1-21. https://doi.org/10.1145/3344257
[31] Ahmed, Firas D., and Mazlina Abdul Majid. "Towards agent-based petri net decision making modelling for cloud
service composition: A literature survey." Journal of Network and Computer Applications 130 (2019): 14-38.
https://doi.org/10.1016/j.jnca.2018.12.001
[32] Nordin, Noor Syahirah, and Mohd Arfian Ismail. "A hybridization of butterfly optimization algorithm and harmony
search for fuzzy modelling in phishing attack detection." Neural Computing and Applications 35, no. 7 (2023): 5501-
5512. https://doi.org/10.1007/s00521-022-07957-0
[33] Kamil, Samar, Siti Norul Huda Sheikh Abdullah, Ahmad Firdaus, and Opeyemi Lateef Usman. "The rise of
ransomware: A review of attacks, detection techniques, and future challenges." In 2022 International Conference
on Business Analytics for Technology and Security (ICBATS), pp. 1-7. IEEE, 2022.
https://doi.org/10.1109/ICBATS54253.2022.9759000
[34] Idris, Nur Farahaina, and Mohd Arfian Ismail. "The study of cross-validated bagging fuzzy-id3 algorithm for breast
cancer classification." Journal of Intelligent & Fuzzy Systems 43, no. 3 (2022): 2567-2577.
https://doi.org/10.3233/JIFS-212842
[35] Isaac, Yeo Zhu Ern, Kohbalan Moorthy, Logenthiran Machap, Mohd Saberi Mohamad, and Jamaludin Sallim. "Gene
Regulatory Network Construction of Ovarian Cancer Based on Passing Attributes between Network for Data
Assimilation." In 2020 8th International Conference on Information Technology and Multimedia (ICIMU), pp. 251-
255. IEEE, 2020.
[36] Jeba, Shermina, Mohammed BinJubier, Mohd Arfian Ismail, Reshmy Krishnan, Sarachandran Nair, and Girija
Narasimhan. "A hybrid protection method to enhance data utility while preserving the privacy of medical patients
data publishing." International Journal of Advanced Computer Science and Applications 13, no. 11 (2022).
https://doi.org/10.14569/IJACSA.2022.0131194
[37] Jeba, Shermina, Mohammed Binjubier, Mohd Arfian Ismail, Reshmy Krishnan, Sarachandran Nair, and Girija
Narasimhan. "Classifying and evaluating privacy-preserving techniques based on protection methods: A comprehensive
study." Journal of Theoretical and Applied Information Technology 15 (2022).
[38] Jaya, M. Izham, Goh Xin Tong, Mohd Faizal Ab Razak, Azlee Zabidi, and Syifak Izhar Hisham. "Geofence Alerts
Application With GPS Tracking For Children Monitoring (CTS)." In 2021 International Conference on Software
Engineering & Computer Systems and 4th International Conference on Computational Science and Information
Management (ICSECS-ICOCSIM), pp. 222-226. IEEE, 2021. https://doi.org/10.1109/ICSECS52883.2021.00047
[39] Zulkifli, Nor Saradatul Akmar, Mohammad Ridwan Satrial, Mohd Zamri Osman, Nor Syahidatul Nadiah Ismail, and
Muhammad Rusydi Muhammad Razif. "IoT-based smart environment monitoring system for air pollutant detection
in Kuantan, Pahang, Malaysia." In IOP Conference Series: Materials Science and Engineering, vol. 769, no. 1, p.
012014. IOP Publishing, 2020. https://doi.org/10.1088/1757-899X/769/1/012014
[40] Samsuddin, Muhammad Naim Mohd, Anis Farihan Mat Raffei, and Nur Shamsiah Abdul Rahman. "IoT based sport
healthcare monitoring system." In 2021 International Conference on Software Engineering & Computer Systems
and 4th International Conference on Computational Science and Information Management (ICSECS-ICOCSIM), pp.
316-319. IEEE, 2021. https://doi.org/10.1109/ICSECS52883.2021.00064
[41] Rahman, N. S. A., A. N. Rosman, and N. A. Sahabudin. "Students’ Continuance of Using E-Learning System: A Review
of Conceptual Frameworks." In IOP Conference Series: Materials Science and Engineering, vol. 769, no. 1, p. 012044.
IOP Publishing, 2020. https://doi.org/10.1088/1757-899X/769/1/012044
[42] Abidin, Ahmad Firdaus Zainal, Mohd Faaizie Darmawan, Mohd Zamri Osman, Shahid Anwar, Shahreen Kasim, Arda
Yunianta, and Tole Sutikno. "Adaboost-multilayer perceptron to predict the student’s performance in software
engineering." Bulletin of Electrical Engineering and Informatics 8, no. 4 (2019): 1556-1562.
https://doi.org/10.11591/eei.v8i4.1432
[43] Salleh, Azrie, Danakorn Nincarean Eh Phon, Ferda Ernawan, Ahmad Yusuf Ismail, and Prajanto Wahyu Adi.
"Teacher's ICT Skills and Readiness of Integrating Augmented Reality in Education." In 2021 5th International
Conference on Informatics and Computational Sciences (ICICoS), pp. 205-209. IEEE, 2021.
https://doi.org/10.1109/ICICoS53627.2021.9651904
[44] Mukhtar, Harun, Muhammad Akmal Remli, Wong KNSWS, and Mohd Saberi Mohamad. "Deep Learning With
Processing Algorithms for Forecasting Tourist Arrivals." TEM Journal (2023). https://doi.org/10.18421/TEM123-57
[45] Taib, Siti Aishah Tsamienah, Noratikah Abu, Azlyna Senawi, and Clark Kendrick Go. "The Implementation of Long-Short
Term Memory for Tourism Industry in Malaysia." Journal of Advanced Research in Applied Sciences and Engineering
Technology 46, no. 2 (2025): 232-239.