Available via license: CC BY 4.0
JURNAL MEDIA INFORMATIKA BUDIDARMA
Volume 6, Nomor 4, Oktober 2022, Page 2341-2345
ISSN 2614-5278 (media cetak), ISSN 2548-8368 (media online)
Available Online at https://ejurnal.stmik-budidarma.ac.id/index.php/mib
DOI: 10.30865/mib.v6i4.4554
Inggrid Resmi Benita, Copyright © 2022, MIB, Page 2341
Submitted: 26/07/2022; Accepted: 31/08/2022; Published: 25/10/2022
News Recommender System Based on User Log History Using Rapid
Automatic Keyword Extraction
Inggrid Resmi Benita, Z K A Baizal*
Informatics, School of Computing, Telkom University, Bandung, Indonesia
Email: 1inggridbenita@student.telkomuniversity.ac.id, 2,*baizal@telkomuniversity.ac.id
Corresponding Author Email: baizal@telkomuniversity.ac.id
Abstract−There are many ways to find information; one of them is reading online news. However, searching for news online
is difficult because users must visit multiple platforms to find information. Sometimes, the recommended news does not
match the user's interests. In many prior works, news recommendations are based on trends, so the recommended news does
not necessarily match the user's interests. To overcome this, we built a web-based news recommender system that makes it
easier for users to find news. We use the Rapid Automatic Keyword Extraction (RAKE) method in the recommendation
process so that news can be recommended based on user preferences taken from user history logs. Keywords are extracted
with RAKE from the title and content of the news, converted into a vector representation using Count Vectorizer, and
compared between news items using the Cosine Similarity function. The test results show that the average performance of
our proposed system is 90.8%, which outperforms earlier systems on the goals of a recommender system, i.e., diversity,
novelty, and relevance.
Keywords: Online News; News Recommender System; Rapid Automatic Keyword Extraction
1. INTRODUCTION
Over time, the internet has played an essential role in accessing information because more and more people
worldwide look for information online, including from online news. As a result, the number of online media
outlets that produce news is increasing, and people have difficulty finding the desired news within a certain
period [1], [2]. People also have to visit several news websites to find what they want [3], [4]. A news
recommender system can address these problems, for example by producing personalized news content. Li, M. et al.
[5] explain that personalized news content can help users find the news most likely to match their interests.
Producing personalized news content requires interest preferences, which can be obtained from user history log data.
There are several studies related to news recommender systems. Wang, Z. et al. [1] developed a news
recommender system based on keyword extraction. The study concluded that keyword extraction can recommend the
latest news topics better than other recommendation techniques. However, the study has a weakness: people's names
and nouns are still treated as candidate keywords. In addition, the recommended news is based on trends only, so
it does not necessarily match users' interests. Wang, Y. et al. [6] developed a news recommender system based on
user behavior. Their system uses a collaborative filtering method that utilizes user behavior attributes, i.e.,
browsing time. However, that paper ignores accuracy because it focuses only on the algorithm's efficiency.
Based on prior works, we created a news recommender system based on user log history using keyword
extraction. The method we use for keyword extraction is RAKE, because RAKE is an unsupervised method that can
extract keywords from larger amounts of data and from more types of individual documents than other keyword
extraction algorithms [7], [8]. Several studies also discuss RAKE. Thushara, M. G. et al. [8] compared the
performance of three keyphrase extraction algorithms: TextRank, RAKE, and PositionRank. Based on the comparison,
PositionRank gives better results than TextRank and RAKE. TextRank does not consider the positions at which
keywords occur in a document, while RAKE produces irrelevant results if stopwords are not taken into account.
Huang, H. et al. [7] developed a keyword extraction algorithm called NER-RAKE, which combines Named Entity
Recognition (NER) and RAKE. This algorithm targets keyword extraction from scientific literature. The NER process
in this method optimizes the selection of RAKE candidate keywords while retaining the speed and practicality of
RAKE. Meanwhile, Hu, J. et al. [9] developed a keyword extraction algorithm based on a distributed skip-gram
model. The algorithm extracts keywords more effectively than frequency-based methods, Term Frequency-Inverse
Document Frequency (TF-IDF), TextRank, and RAKE.
Based on the problems and previous work, this study created a web-based news recommender system
based on user log history using RAKE, so that the recommended news matches the interests of the user. In
addition, to improve on previous research, RAKE in this study excludes people's names and nouns from the keyword
candidates. The paper is organized as follows: the first part is the introduction, the second part describes the
related studies, the third part describes the system architecture, the fourth part explains the results of the
system test evaluation, and the fifth part provides conclusions.
2. RESEARCH METHODOLOGY
This section describes the recommender system that we built. The application flow starts when the user logs in.
If the user is new or has never read any news, the system displays all the news ordered from the most recent.
However, if the user has read one or more news items, the system recommends news based on the user history log
using RAKE.
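The branching described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `select_news`, the dictionary keys `id` and `published`, and the `recommend` callable are hypothetical names introduced only for this example.

```python
def select_news(history_ids, news_items, recommend):
    """Choose what to display for one user.

    history_ids: IDs of news the user has already read (empty for a new user).
    news_items:  list of dicts with hypothetical keys 'id' and 'published',
                 where 'published' is an ISO date string such as '2022-05-05'.
    recommend:   callable standing in for the RAKE-based recommender.
    """
    if not history_ids:
        # New user or empty history: show all news, most recent first.
        return sorted(news_items, key=lambda n: n["published"], reverse=True)
    # Otherwise delegate to the history-based recommender.
    return recommend(history_ids, news_items)
```

Passing the recommender in as a callable keeps the cold-start rule separate from the similarity computation, so either part can be changed independently.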
In more detail, Figure 1 shows how the system recommends news based on the user's history log, starting
when the user reads a news item. First, the PHP server, which is responsible for accessing PostgreSQL, adds the
read news to the PostgreSQL database as log history data. PostgreSQL stores the news log history data and the
user data. Then, the PostgreSQL database sends all news history data back to the PHP server, which passes it to
the User Interface. From the User Interface, the news history data is sent to the Python server. The Python server
is responsible for accessing the dataset, because the dataset in this study was collected offline before the
system was run. After that, the Python server sends the recommended news to the User Interface, so the user can
see the recommendations based on the user history log.
Figure 1. System Architecture Recommender News Based on History
Figure 2 describes the process flow of a recommendation using RAKE. First, the data is pre-processed to
remove unnecessary elements such as stop words, punctuation, and white space, and to handle synonyms in the
dataset. Then, the system generates a word representation by combining the 'Title' and 'Content' columns of the
news into a 'Bag of words' and extracts keywords from the complete sentences in the 'Bag of words' using RAKE.
Next, the keywords are converted to lowercase to eliminate duplicates. Following that, the 'Bag of words' is
converted to a vector representation, a simple frequency count for each word in the 'Bag of words', and the
cosine similarity matrix is used to compare similarities between news. The final step is to match the news
titles in the user history log with the relevant news index in the Similarity Matrix. The recommendation results
are sorted from the highest Cosine Similarity value.
Figure 2. Flow of Recommendation Process
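The step that combines 'Title' and 'Content' into a 'Bag of words' could look roughly like the sketch below. It is an assumption-laden illustration: `build_bag_of_words` and the `extract_keywords` callable are hypothetical names, and the exact order of pre-processing in the original system is not specified beyond the description above.

```python
def build_bag_of_words(title, content, extract_keywords):
    """Combine title and content, extract keywords, lowercase, and de-duplicate.

    extract_keywords is a hypothetical callable (for example a RAKE
    implementation) that takes a text and returns a list of keyword phrases.
    """
    text = f"{title}. {content}"
    seen = []
    for phrase in extract_keywords(text):
        # Lowercasing first means 'Musk' and 'musk' count as the same word.
        for word in phrase.lower().split():
            if word not in seen:
                seen.append(word)
    # A single space-separated string is a convenient input for a count vectorizer.
    return " ".join(seen)
```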
2.1 Candidate Keywords Extraction
The candidate keywords were extracted using RAKE. RAKE is an automatic, domain-independent method for
extracting keywords from single documents [10], [11]. We use RAKE to extract relevant word sets from the combined
title and content of the news as candidate keywords. People's names and nouns are not included among the
candidate keywords.
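To make the extraction step concrete, here is a minimal pure-Python sketch of the general RAKE idea: split the text into candidate phrases at stopwords and punctuation, score each word by degree divided by frequency, and score each phrase by the sum of its word scores. The tiny English stopword list and the function name `rake_keywords` are assumptions for illustration, and the sketch does not perform the name and noun filtering described above, which would need an additional tagging step.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption); a real system would use a
# full list for the language of the news.
STOPWORDS = {"a", "an", "the", "of", "for", "in", "on", "and", "to", "is", "are", "with", "from", "by"}


def rake_keywords(text, stopwords=STOPWORDS):
    """Return (phrase, score) pairs sorted from the highest score."""
    phrases = []
    # Punctuation always ends a candidate phrase.
    for chunk in re.split(r"[^\w\s]+", text.lower()):
        current = []
        for word in chunk.split():
            if word in stopwords:
                # A stopword also ends the current candidate phrase.
                if current:
                    phrases.append(tuple(current))
                    current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))

    freq = defaultdict(int)    # how often each word occurs
    degree = defaultdict(int)  # co-occurrence count, including the word itself
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)

    word_score = {w: degree[w] / freq[w] for w in freq}
    phrase_score = {p: sum(word_score[w] for w in p) for p in phrases}
    return sorted(
        ((" ".join(p), s) for p, s in phrase_score.items()),
        key=lambda item: (-item[1], item[0]),
    )
```

For example, `rake_keywords("Metallica donates money for refugees in Ukraine.")` ranks the three-word phrase "metallica donates money" above the single words "refugees" and "ukraine", because words in longer phrases receive a higher degree.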
2.2 Count Vectorizer
The recommendation model can only compare vectors (matrices) with one another [12]. In this study, the title
and content of the news, after being processed into candidate keywords, are represented as vectors using Count
Vectorizer.
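The count-based representation can be illustrated with a small standard-library sketch. It is a stand-in for a library implementation such as scikit-learn's `CountVectorizer`, not a reproduction of it; the function name `count_vectorize` and the alphabetical vocabulary order are assumptions made for this example.

```python
from collections import Counter


def count_vectorize(documents):
    """Turn space-separated keyword strings into term-count vectors.

    Returns the shared vocabulary (sorted alphabetically) and one count
    vector per document, aligned with that vocabulary.
    """
    vocabulary = sorted({word for doc in documents for word in doc.split()})
    vectors = []
    for doc in documents:
        counts = Counter(doc.split())
        # Counter returns 0 for words that do not occur in this document.
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors
```

Every news item is mapped onto the same vocabulary, so the resulting vectors have equal length and can be compared component by component.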
2.3 Cosine Similarity
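After obtaining a matrix of vectors containing the counts of all the candidate keywords, we apply the Cosine
Similarity function to compare the similarity between news. The Cosine Similarity between two vectors [13] is
calculated according to (1), where vector a represents one object, vector b represents another object, and n is
the number of components in each vector, i.e., the number of distinct candidate keywords in the database.
\[
\cos(a, b) = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}
           = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^{2}} \, \sqrt{\sum_{i=1}^{n} b_i^{2}}}
\tag{1}
\]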
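The Cosine Similarity between two count vectors can be computed directly from its standard definition, as in the sketch below. The function name `cosine_similarity` and the choice to return 0.0 for an all-zero vector are assumptions of this example rather than details given in the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumption: a document with no keywords is similar to nothing.
        return 0.0
    return dot / (norm_a * norm_b)
```

Because count vectors are non-negative, the result lies between 0 (no shared keywords) and 1 (identical keyword proportions).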
3. RESULTS AND DISCUSSION
3.1 The Dataset
The news used as the dataset was taken from the news website CNN Indonesia in the periods 1 April 2022 -
7 April 2022 and 5 May 2022 - 15 May 2022. The news categories taken are lifestyle and entertainment. News data
was retrieved manually before the system runs. Each news record consists of a news title, publication date, news
content, news URL, and poster URL. In addition, a unique news ID is assigned to avoid duplication.
3.2 Cosine Similarity Calculation Results
Table 1. Cosine Similarity Calculation Results
| ID | News Title | Cosine Similarity |
|---|---|---|
| 21 | Waktu yang Tepat untuk Ikhtiar Jalani Program Bayi Tabung (English: The Right Time to Pursue an IVF Program) | 0.07 |
| 132 | Metallica Sumbang Rp7,1 M untuk Pengungsi Ukraina (English: Metallica Donates IDR 7.1 Billion for Ukrainian Refugees) | 0.06 |
| 77 | Cerita Desainer Soal Jas JK di Jepang: Dibuat 1x24 Jam (English: Designer's Story About JK's Suit in Japan: Made in 24 Hours) | 0.06 |
| 81 | Tiba di AS, Jokowi Gandeng Mesra Iriana saat Turun dari Tangga Pesawat (English: Arriving in the US, Jokowi Affectionately Holds Iriana's Hand while Descending the Airplane Stairs) | 0.06 |
| 75 | Balenciaga Jual Sepatu Rusak Seharga Rp9 Juta (English: Balenciaga Sells Damaged Shoes for IDR 9 Million) | 0.05 |
| 45 | Waktu Terbaik untuk Berolahraga saat Puasa (English: The Best Time to Exercise While Fasting) | 0.04 |
| 185 | Akhir Pekan Panjang, Jaringan Cinema XXI Tambah Jam Tayang (English: Long Weekend, Cinema XXI Network Adds Showtimes) | 0.04 |
Table 1 shows the calculation results for the news item stored in the history log titled "Temui Jokowi, Elon
Musk Pakai Kaos Rp349 Ribu" (English: Meeting Jokowi, Elon Musk Wears an IDR 349 Thousand T-shirt). The system
performs RAKE extraction based on the user's history log, focusing on the title and content of the news. The
result is then converted into a vector representation using the Count Vectorizer, after which the system
calculates the Cosine Similarity scores for the news title in the history log.
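Ranking candidates for one history item amounts to sorting one row of the similarity matrix. The sketch below is a hedged illustration: `top_recommendations` is a hypothetical helper, and the example uses three scores taken from Table 1 together with an invented ID (10) for the history item itself.

```python
def top_recommendations(similarity_row, history_index, news_ids, top_k=7):
    """Rank other news items by their similarity to one history item.

    similarity_row: similarity of the history item to every news item.
    history_index:  position of the history item itself, which is skipped.
    news_ids:       news IDs aligned with similarity_row.
    """
    candidates = [
        (news_ids[i], score)
        for i, score in enumerate(similarity_row)
        if i != history_index
    ]
    # Highest similarity first; Python's sort keeps ties in their original order.
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates[:top_k]


# Example with scores from Table 1; ID 10 is a made-up ID for the history item.
row = [1.0, 0.07, 0.04, 0.06]
ids = [10, 21, 45, 132]
ranked = top_recommendations(row, history_index=0, news_ids=ids)
```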
3.3 Result of the News Recommender Website
The news website that we created to recommend news is called Verofinnews. On the home page of the Verofinnews
website, shown in Figure 3, the left side contains the home, history, and profile menu items, and the right side
shows several recommended news items that the user can read, based on the user history log. In addition, users
can filter the news on the home page by category, i.e., entertainment or lifestyle.
Figure 3. Homepage Website
News is recommended based on the Cosine Similarity calculation, ordered from the highest similarity score.
The data used to calculate Cosine Similarity is the title and content of the news. Users can also search for news
by keyword on the home page. After the user enters a keyword, the system displays the news whose titles contain
that keyword, as shown in Figure 4.
Figure 4. Search News by Keywords
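The title search described above can be sketched as a simple substring filter. The function name `search_by_title`, the `title` key, and the case-insensitive matching are assumptions of this example; the paper only states that displayed titles contain the entered keyword.

```python
def search_by_title(news_items, keyword):
    """Return the news items whose title contains the keyword."""
    needle = keyword.strip().lower()
    if not needle:
        # Assumption: an empty query returns nothing rather than everything.
        return []
    return [item for item in news_items if needle in item["title"].lower()]


# Two titles from Table 1 used as sample data.
sample = [
    {"id": 45, "title": "Waktu Terbaik untuk Berolahraga saat Puasa"},
    {"id": 132, "title": "Metallica Sumbang Rp7,1 M untuk Pengungsi Ukraina"},
]
```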
3.4 Evaluation
The system's performance was evaluated using one of the techniques of the inquiry method, i.e., a survey
[14], [15] of 65 respondents, who assessed the tested system by filling in an evaluation form. The respondents
consisted of 33 students, 16 employees, and 16 others. The data obtained from the performance evaluation consists
of respondents' subjective statements regarding Relevance, Novelty, Diversity, and respondent satisfaction [16],
[17] with the Verofinnews website.
Diversity means that the recommended news is varied. In other words, the recommended news on the home page
is not tied to a single story. For example, if the user already has a history log, the system recommends other
news similar to the news in the history log. Relevance means that the recommended news is related to what the
user has read: once the user has read a news item, the recommended news on the home page relates to the news in
the history log. Novelty means that the recommended news offers something new that is useful for people's lives.
In evaluating the performance of the system, respondents chose scores ranging from 1 to 5. After all the
assessment scores were collected, the scores were added up and divided by the maximum score to get the percentage
result of the performance test [16], as shown in Table 2.
Table 2. Percentage of System Performance Test Results
| Rating Parameter | Score |
|---|---|
| Diversity | 92.3% |
| Relevance | 91.4% |
| Novelty | 88.6% |
| Average | 90.8% |
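The percentage calculation can be sketched as follows, assuming that the "maximum score" is the top rating (5) multiplied by the number of responses; the paper does not spell this out, and the function name `performance_percentage` is hypothetical. The last lines check that the average in Table 2 is the mean of the three reported parameter scores.

```python
def performance_percentage(scores, max_rating=5):
    """Sum of ratings divided by the maximum attainable sum, as a percentage."""
    # Assumption: the maximum score is max_rating for every respondent.
    return 100.0 * sum(scores) / (max_rating * len(scores))


# Illustrative ratings only, not the survey data from the study.
example = performance_percentage([5, 4, 5, 4])

# Mean of the three parameter scores reported in Table 2.
table_average = round((92.3 + 91.4 + 88.6) / 3, 1)
```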
Based on Table 2, our proposed system has a higher accuracy value than the previous system in terms of
performance. However, respondents' satisfaction with the User Interface of the Verofinnews website indicates that
the system design does not yet meet the usability criteria of a system. Because the design has not satisfied the
respondents, it is necessary to improve the page layout and add other features.
4. CONCLUSION
This paper proposes a novel news recommender system using RAKE based on user history logs. The test results
show that the news recommendation process based on the history log is carried out correctly. The recommended
news is similar to the news in the history log and matches user interests. In addition, the RAKE method used
successfully excluded the names of people and nouns from the candidate keywords. The news recommender system
achieves a Diversity value of 92.3%, Relevance of 91.4%, and Novelty of 88.6%, for an average rating of 90.8%.
However, the system design has not met the usability criteria. In addition, the news dataset used in this study
is relatively limited in terms of categories. Therefore, future work should measure the user experience aspect of
the system design to meet the usability criteria, and expand the news dataset so that it reaches all people.
REFERENCES
[1] Z. Wang, K. Hahn, Y. Kim, S. Song, and J. M. Seo, “A news-topic recommender system based on keywords extraction,”
Multimedia Tools and Applications, vol. 77, no. 4, 2018, doi: 10.1007/s11042-017-5513-0.
[2] A. A. Fakhri, Z. K. A. Baizal, and E. B. Setiawan, “Restaurant Recommender System Using User-Based Collaborative
Filtering Approach: A Case Study at Bandung Raya Region,” in Journal of Physics: Conference Series, 2019, vol. 1192,
no. 1. doi: 10.1088/1742-6596/1192/1/012023.
[3] W. Hariri, K. I. Ghauth, and C. Eswaran, “A Multimedia Content Recommender System Using Table of Contents and
Content-Based Filtering,” Advanced Science Letters, vol. 24, no. 2, 2018, doi: 10.1166/asl.2018.10699.
[4] Z. K. A. Baizal, D. H. Widyantoro, and N. U. Maulidevi, “Computational model for generating interactions in
conversational recommender system based on product functional requirements,” Data and Knowledge Engineering, vol.
128, 2020, doi: 10.1016/j.datak.2020.101813.
[5] M. Li and L. Wang, “A Survey on Personalized News Recommendation Technology,” IEEE Access, vol. 7, 2019, doi:
10.1109/ACCESS.2019.2944927.
[6] Y. Wang and W. Shang, “Personalized news recommendation based on consumers’ click behavior,” 2016. doi:
10.1109/FSKD.2015.7382016.
[7] H. Huang, X. Wang, and H. Wang, “NER-RAKE: An improved rapid automatic keyword extraction method for scientific
literatures based on named entity recognition,” Proceedings of the Association for Information Science and Technology,
vol. 57, no. 1, 2020, doi: 10.1002/pra2.374.
[8] M. G. Thushara, T. Mownika, and R. Mangamuru, “A comparative study on different keyword extraction algorithms,”
2019. doi: 10.1109/ICCMC.2019.8819630.
[9] J. Hu, S. Li, Y. Yao, L. Yu, G. Yang, and J. Hu, “Patent keyword extraction algorithm based on distributed representation
for patent classification,” Entropy, vol. 20, no. 2, 2018, doi: 10.3390/e20020104.
[10] J. S. Baruni and J. G. R. Sathiaseelan, “Keyphrase Extraction from Document Using RAKE and TextRank
Algorithms,” International Journal of Computer Science and Mobile Computing, vol. 9, no. 9, 2020, doi:
10.47760/ijcsmc.2020.v09i09.009.
[11] S. Anjali, M. Meera Nair, and M. G. Thushara, “A graph based approach for keyword extraction from documents,” 2019.
doi: 10.1109/ICACCP.2019.8882946.
[12] J. Ng, “Content-based Recommender Using Natural Language Processing (NLP),” 2020.
https://www.kdnuggets.com/2019/11/content-based-recommender-using-natural-language-processing-nlp.html (accessed
Jul. 10, 2022).
[13] A. R. Lahitani, A. E. Permanasari, and N. A. Setiawan, “Cosine similarity to determine similarity measure: Study case in
online essay assessment,” 2016. doi: 10.1109/CITSM.2016.7577578.
[14] S. P. Dewi, G. R. Dantes, and G. Indrawan, “EVALUASI USABILITY PADA ASPEK SATISFACTION
MENGGUNAKAN TEKNIK KUESIONER PADA SISTEM LMS PROGRAM KEAHLIAN GANDA,” Jurnal
Pendidikan Teknologi dan Kejuruan, vol. 15, no. 1, 2018, doi: 10.23887/jptk-undiksha.v15i1.13028.
[15] B. M. Maake, S. O. Ojo, and T. Zuva, “A Survey on Data Mining Techniques in Research Paper Recommender Systems,”
2019, pp. 119–143. doi: 10.4018/978-1-5225-8437-7.ch006.
[16] F. Ramadhan and A. Musdholifah, “Online Learning Video Recommendation System Based on Course and Sylabus Using
Content-Based Filtering,” IJCCS (Indonesian Journal of Computing and Cybernetics Systems), vol. 15, no. 3, 2021, doi:
10.22146/ijccs.65623.
[17] M. Kunaver and T. Požrl, “Diversity in recommender systems – A survey,” Knowledge-Based Systems, vol. 123, 2017,
doi: 10.1016/j.knosys.2017.02.009.