Ayumu Kubota’s research while affiliated with KDDI Research and other places


Publications (19)


Machine Learning Based Prediction of Vulnerability Information Subject to a Security Alert
  • Conference Paper

January 2023 · 1 Read · 1 Citation

Ryu Watanabe · Takashi Matsunaka · Ayumu Kubota · Jumpei Urakawa

Resource Authorization Methods for Edge Computing

March 2022 · 21 Reads · 1 Citation

To distribute processing load and achieve prompt responses, the concept of edge computing is drawing attention. In an edge computing environment, server processing is carried out on an edge node located near various devices with a communication module instead of on a central server. These edge nodes can be provided by entities other than service providers, such as network operators. The edge nodes have fewer computing resources than a central server. Therefore, appropriate dynamic resource management is required to avoid resource exhaustion. For this purpose, authorization techniques such as OAuth can be applied. In this paper, we consider applying the OAuth protocol for privilege delegation in edge computing. First, we clarify that the authentication flows differ depending on the relationship among the edge computing players (edge provider, user, service provider). We then describe the unique problems of resource authorization in edge computing.


Detecting Malicious Websites by Query Templates

February 2020 · 246 Reads · 4 Citations

Lecture Notes in Computer Science

Satomi Kaneko · Yukiko Sawaya · [...] · Kazumasa Omote

With the development of the Internet, web content is increasing exponentially. Along with this, web-based attacks such as drive-by download attacks and phishing have grown year on year. To prevent such attacks, URL blacklists are widely used. However, URL blacklists are not enough because they cannot detect newly generated malicious URLs. In this paper, we propose an automatic query template generation method to detect malicious websites. Our method focuses on URL query strings that share similarities across groups of malicious websites. Additionally, we evaluate our proposed method with a large-scale dataset and verify its effectiveness. Consequently, our proposed method can grasp the characteristics of malicious campaigns; it can detect 11,292 unique malicious domains not detected by Google Safe Browsing. Moreover, our method achieved high precision in the seven months of experiments.
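
As a rough illustration of what a query template might look like, the sketch below reduces each URL's query string to its keys plus a coarse class for each value, then groups URLs that share the same reduced form. The abstract does not say how the authors generate templates, so everything here is an assumption: the function names, the value classes, and the `.example` URLs are hypothetical and are not taken from the paper.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit, parse_qsl


def value_class(value: str) -> str:
    """Map a query value to a coarse character class (hypothetical classes)."""
    if value == "":
        return "<empty>"
    if value.isdigit():
        return "<digits>"
    if re.fullmatch(r"[0-9a-fA-F]+", value):
        return "<hex>"
    if value.isalpha():
        return "<alpha>"
    return "<mixed>"


def query_template(url: str) -> str:
    """Reduce a URL's query string to key=<class> pairs joined by '&'."""
    pairs = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    return "&".join(f"{key}={value_class(value)}" for key, value in pairs)


def group_by_template(urls):
    """Group URLs whose query strings reduce to the same template."""
    groups = defaultdict(list)
    for url in urls:
        groups[query_template(url)].append(url)
    return dict(groups)


# Illustrative input only; these are not URLs from the paper's dataset.
sample_urls = [
    "http://a.example/x.php?id=123&tok=ab12ff",
    "http://b.example/y.php?id=9&tok=00ff",
    "http://c.example/search?q=hello",
]
groups = group_by_template(sample_urls)
```

In this toy version, the first two URLs fall into one group because both reduce to the same key and value-class pattern, which is the kind of shared structure a template-based detector could match against unseen URLs.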


Large-Scale Analysis of Domain Blacklists

January 2020 · 276 Reads · 8 Citations

Malicious content has grown along with the explosion of the Internet. Therefore, many organizations construct and maintain blacklists to help web users protect their computers. There are many kinds of blacklists, of which domain blacklists are the most popular. Existing empirical analyses of domain blacklists have several limitations, such as using only outdated blacklists, omitting important blacklists, or focusing only on simple aspects of blacklists. In this paper, we analyze the top 14 blacklists, including popular and updated blacklists like Safe Browsing from Google and urlblacklist.com. We are the first to filter out the old entries in the blacklists using an enormous dataset of user browsing history. Besides the analysis of the intersections and the registered information from Whois (such as top-level domain, domain age and country), we also build two classification models for web content categories (e.g., education, business) and malicious categories (i.e., landing and distribution) using machine learning. Our work found some important results. First, the blacklists Safe Browsing version 3 and 4 are being separately deployed and have independent databases with diverse entries although they belong to the same organization. Second, the blacklist dsi.ut-capitole.fr is almost a subset of the blacklist urlblacklist.com (98% of entries). Third, the largest portion of entries in the blacklists were created in 2000 (6.08%) and come from the United States (24.28%). Fourth, Safe Browsing version 4 can detect younger domains compared with the others. Fifth, Tech & Computing is the dominant web content category in all the blacklists, and the blacklists in each group (i.e., small public blacklists, large public blacklists, private blacklists) have higher correlation in web content as opposed to blacklists in other groups. Finally, the number of landing domains is larger than that of distribution domains, at least 75% in large public blacklists and at least 60% in other blacklists.




Hunting Brand Domain Forgery: A Scalable Classification for Homograph Attack

June 2019 · 465 Reads · 9 Citations

IFIP Advances in Information and Communication Technology

A visual homograph attack is a way for attackers to deceive victims about which domain they are communicating with by exploiting the fact that many characters look alike. The attack is growing into a serious problem and drawing broad attention now that many brand domains have recently been attacked, such as apple.com (Apple Inc.), adobe.com (Adobe Systems Incorporated), and lloydsbank.co.uk (Lloyds Bank). Therefore, how to detect visual homographs has become a hot topic in both industry and the research community. Several existing papers and tools have been proposed to find some homographs of a given domain based on different subsets of certain look-alike characters, or based on an analysis of the registered Internationalized Domain Name (IDN) database. However, we still lack a scalable and systematic approach that can detect sufficient homographs registered by attackers with high accuracy and a low false positive rate. In this paper, we construct a classification model to detect homographs and potential homographs registered by attackers using machine learning on feasible and novel features, namely the visual similarity of each character and selected information from Whois. The implementation results show that our approach achieves up to 95.90% accuracy with a false positive rate of merely 3.27%. Furthermore, we also conduct an empirical analysis of the collected homographs and find some interesting statistics along with concrete misbehaviors and purposes of the attackers.
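
To make the look-alike idea concrete, here is a minimal sketch that folds a few visually confusable characters onto ASCII and flags a domain whose folded form matches a brand domain. It is not the paper's classifier: the paper describes machine learning over per-character visual similarity and Whois features, whereas this sketch swaps in a tiny hand-written lookup table. The table contents and the function names are assumptions made for illustration.

```python
# Hypothetical, deliberately tiny table of look-alike characters.
# A real system would need a far larger mapping or rendered-glyph comparison.
CONFUSABLES = {
    "\u0430": "a",  # Cyrillic small letter a
    "\u0435": "e",  # Cyrillic small letter ie
    "\u043e": "o",  # Cyrillic small letter o
    "\u0440": "p",  # Cyrillic small letter er
    "0": "o",       # digit zero
    "1": "l",       # digit one
}


def skeleton(domain: str) -> str:
    """Fold every known look-alike character onto its ASCII counterpart."""
    return "".join(CONFUSABLES.get(ch, ch) for ch in domain.lower())


def is_homograph_candidate(domain: str, brand: str) -> bool:
    """True if the domain differs from the brand but folds to the same skeleton."""
    return domain.lower() != brand.lower() and skeleton(domain) == skeleton(brand)


# Illustrative check: a Cyrillic "a" in place of the Latin one.
suspicious = is_homograph_candidate("\u0430pple.com", "apple.com")
```

A fixed table like this only catches exact look-alike substitutions; the fixed-rule limitation is part of what motivates a learned classifier such as the one the abstract describes.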


Peek-a-boo, I Can See You, Forger: Influences of Human Demographics, Brand Familiarity and Security Backgrounds on Homograph Recognition

April 2019 · 55 Reads

A homograph attack is a way for attackers to deceive victims about which domain they are communicating with by exploiting the fact that many characters look alike. The attack has become serious and is drawing broad attention now that many brand domains have recently been attacked, such as those of Apple Inc., Adobe Inc., and Lloyds Bank. We first design a survey of human demographics, brand familiarity, and security backgrounds and apply it to 2,067 participants. We build a regression model to study which factors affect participants' ability to recognize homograph domains. We then find that participants exhibit different abilities for different kinds of homographs. For instance, female participants tend to be able to recognize homographs while male participants tend to be able to recognize non-homographs. Furthermore, 16.59% of participants can recognize homographs whose visual similarity with the target brand domains is under 99.9%; however, when the similarity increases to 99.9%, the share of participants who can recognize homographs drops significantly to merely 0.19%; and for homographs with 100% visual similarity, there is no way for the participants to recognize them. We also find that people working or educated in computer science or computer engineering are the ones who tend to exhibit the best ability to recognize all kinds of homographs and non-homographs. Surprisingly to us, brand familiarity does not influence the ability to recognize either homographs or non-homographs. Stated differently, people who frequently use the brand domains but do not have enough knowledge are still vulnerable.


Predicting Impending Exposure to Malicious Content from User Behavior

October 2018 · 239 Reads · 61 Citations

Many computer-security defenses are reactive: they operate only when security incidents take place, or immediately thereafter. Recent efforts have attempted to predict security incidents before they occur, to enable defenders to proactively protect their devices and networks. These efforts have primarily focused on long-term predictions. We propose a system that enables proactive defenses at the level of a single browsing session. By observing user behavior, it can predict whether a user will be exposed to malicious content on the web seconds before the moment of exposure, thus opening a window of opportunity for proactive defenses. We evaluate our system using three months' worth of HTTP traffic generated by 20,645 users of a large cellular provider in 2017 and show that it can be helpful, even when only very low false positive rates are acceptable, and despite the difficulty of making "on-the-fly" predictions. We also engage directly with the users through surveys asking them demographic and security-related questions, to evaluate the utility of self-reported data for predicting exposure to malicious content. We find that self-reported data can help forecast exposure risk over long periods of time. However, even in the long term, self-reported data is not as crucial as behavioral measurements for accurately predicting exposure.



Citations (16)


... The ML model was effective for decision making and also secured the information systems (Abitova & Abalkanov, 2024). Watanabe et al. (2023) aimed to outline the researchers' goal and objectives of the review papers, with the suitable title of the paper being "machine learning in information security". The study's focus was on the different kinds of machine learning, how they connect to information security systems, and how to detect fraud. ...

Reference:

Leveraging Machine Learning for Cybersecurity: Techniques, Challenges, and Future Directions
Machine Learning Based Prediction of Vulnerability Information Subject to a Security Alert
  • Citing Conference Paper
  • January 2023

... They divided the URLs into five parts by structure: URL protocol, subdomain name, domain name, domain suffix, and URL path. The method proposed by Kaneko et al. [10], named "Detecting Malicious Websites by Query Templates", used the machine learning algorithm DBSCAN to cluster malicious URLs and benign URLs. In the segmentation step, they chose a different way to divide URLs, namely splitting on all delimiters in the URLs. ...

Detecting Malicious Websites by Query Templates
  • Citing Chapter
  • February 2020

Lecture Notes in Computer Science

... Since blacklisting approaches have inherent weaknesses, assessing their effectiveness in real-world scenarios is of extreme importance. In recent years, several rigorous studies have been performed to evaluate and compare malware blacklists [18], [27], [46], and domain blacklists when applied to specific application layer services such as email (for spam and phishing) and web traffic [26], [38], [44]. Surprisingly, despite the availability of a large free-of-charge collection and commercial offerings, there are limited studies on the effectiveness of IP blacklists and many important questions remain unanswered. ...

Empirical Analysis of Domain Blacklists
  • Citing Article
  • January 2020

... Problems With Privacy: Denial of service attacks, data breaches, and data loss are all threats that may affect cloud-based infrastructure [43]. To add insult to injury, cloud data is not as well protected from mandatory disclosure as on-premises information. ...

Optimizing Share Size in Efficient and Robust Secret Sharing Scheme for Big Data
  • Citing Article
  • January 2020

... Several homograph detections have been proposed. While most of the papers focus on IDNs, a state-of-the-art paper [2] can thoroughly deal with the homographs not just in IDNs but also in English domains. Instead of determining the homographs by picking the domains with visual similarity scores greater than a fixed threshold, the authors proposed a machine learning-based classification using the visual similarity as features to address the high false-positive rate caused by the fixed similarity threshold. ...

Hunting Brand Domain Forgery: A Scalable Classification for Homograph Attack
  • Citing Chapter
  • June 2019

IFIP Advances in Information and Communication Technology

... This is partly because these studies are usually based on self-reported online behavior. However, what people claim to do online does not always match what they actually do online (Parry et al. 2021; Wilcockson et al. 2018; Sharif et al. 2018; Van 't Hoff-de Goede et al., 2019). Studies based on self-reported behavior often find that groups who on average say they behave safely online less often report having been victims of cybercrime (Bergmann et al. 2018; Chen et al. 2017). ...

Predicting Impending Exposure to Malicious Content from User Behavior
  • Citing Conference Paper
  • October 2018

... At this time web-based applications can also generally be accessed by using a mobile device. These applications range from game applications, education, shopping for children's games, health, health insurance, and so on (Hölzl et al. 2016; Thao et al. 2018; Yee 2017a; Zhang et al. 2016, 2015). ...

Anonymous and analysable web browsing
  • Citing Conference Paper
  • December 2017

... Hence, we chose to remove them from the relative comparison table as more research work where various machine learning algorithms are used to detect drive-by download attack are still required. We don't know why the detection of drive-by download attacks using machine learning algorithms is so scanty, and so this is open for investigation and further research. ...

  • [1], [81], [26], [12], [83], [55]: 88.69
  • Naive Bayes [56], [1], [47], [103], [77], [92]: 87.37
  • Random Forest [1], [12], [83], [55], [77], [23]: 91.83
  • Decision Tree [1], [40], [60], [55], [103], [34]: 91.85
  • KNN [81], [55], [103], [77], [92], [74]: 92.22
  • Logistic Regression [81], [55], [77], [34], [27], [109]: 92.76

Classification of Landing and Distribution Domains Using Whois’ Text Mining
  • Citing Conference Paper
  • Full-text available
  • August 2017

... DL generates learning patterns and also generates relationships beyond neighbors. DL not only provides complex representations of data, but it also makes machines independent from human [35]. It extracts useful information (representation, features) from unsupervised data without human intervention. ...

Optimizing Share Size in Efficient and Robust Secret Sharing Scheme for Big Data
  • Citing Article
  • May 2017

IEEE Transactions on Big Data