Kensuke Fukuda’s research while affiliated with The Graduate University for Advanced Studies, SOKENDAI and other places


Publications (193)


GothX: a generator of customizable, legitimate and malicious IoT network traffic
  • Conference Paper

August 2024 · 11 Reads · 1 Citation

Manuel Poisson

·

·

Kensuke Fukuda

Extended features from Gotham to GothX
Customizable topology and scenario parameters
Attack types and corresponding tools in MQTTset
GothX: a generator of customizable, legitimate and malicious IoT network traffic
  • Preprint
  • File available

July 2024 · 8 Reads

In recent years, machine learning-based anomaly detection (AD) has become an important measure against security threats from Internet of Things (IoT) networks. Machine learning (ML) models for network traffic AD require datasets for training, evaluation and comparison. Because IoT security threats must be represented realistically and kept up to date, new datasets need to be constantly generated to train relevant AD models. Since most traffic generation setups are developed with only their authors' own use in mind, replicating traffic generation becomes an additional challenge to the creation and maintenance of useful datasets. In this work, we propose GothX, a flexible traffic generator to create both legitimate and malicious traffic for IoT datasets. As a fork of Gotham Testbed, GothX is developed with five requirements: 1) easy configuration of network topology, 2) customization of traffic parameters, 3) automatic execution of legitimate and attack scenarios, 4) IoT network heterogeneity (the current iteration supports MQTT, Kafka and SINETStream services), and 5) automatic labeling of generated datasets. GothX is validated by two use cases: a) re-generation and enrichment of traffic from the IoT dataset MQTTset, and b) automatic execution of a new realistic scenario including the exploitation of a CVE specific to the Kafka-MQTT network topology and leading to a DDoS attack. We also contribute two datasets containing mixed traffic, one made from the enriched MQTTset traffic and another from the attack scenario. We evaluated the scalability of GothX (450 IoT sensors on a single machine), the replication of the use cases and the validity of the generated datasets, confirming the ability of GothX to improve the current state of the art in network traffic generation.


Exploring the Discovery Process of Fresh IPv6 Prefixes: An Analysis of Scanning Behavior in Darknet and Honeynet

March 2024 · 6 Reads

Lecture Notes in Computer Science

Internet-wide scanners can efficiently scan the expansive IPv6 network by targeting the active prefixes and responsive addresses on hitlists. However, it is still unclear how scanners discover fresh prefixes, which include newly assigned or deployed prefixes as well as previously unused ones. This paper studies the whole discovery process of fresh prefixes by scanners. Using a darknet and a honeynet, we implement four DNS-based address-exposing methods, analyze the arrival sequence of scans from distinct ASes, and examine the temporal and spatial scan patterns. Over six months, our custom-made darknet and probabilistic responsive honeynet collected 33 M packets (1.8 M sessions) of scans from 116 distinct ASes and 18.8 K unique source IP addresses. We investigate the whole process of fresh prefix discovery, including address-exposing, initial probing, hitlist registration, and large-scale scan campaigns. Furthermore, we analyze the difference in scanning behavior by AS and categorize the scanners into three types (honeynet-exclusive, honeynet-predominant and balanced) based on the respective ratio of scans to the darknet and the honeynet. We also analyze the intentions of scanners, such as network reconnaissance or scanning responsive targets, and the methods they use to obtain potential targets, such as sending DNS queries or using public hitlists. These findings bring insights into how fresh prefixes attract scanners and highlight the vital role of a responsive honeynet in analyzing scanner behavior.


Following the Data Trail: An Analysis of IXP Dependencies

March 2024 · 11 Reads · 1 Citation

Lecture Notes in Computer Science

Internet exchange points (IXPs) play a vital role in the modern Internet. Envisioned as a means to connect physically close networks, they have grown into large hubs connecting networks from all over the world, either directly or via remote peering. It is therefore important to understand the real footprint of an IXP to quantify the extent to which problems (e.g., outages) at an IXP can impact the surrounding Internet topology. An IXP footprint computed only from its list of members, as given by PeeringDB or the IXP's website, usually gives an incomplete view of the IXP, as it misses downstream networks whose traffic may transit via the IXP although they are not directly peering there. In this paper we propose a robust approach that uncovers this dependency using traceroute data from two large measurement platforms. Our approach converts traceroutes to paths that include both autonomous systems (ASes) and IXPs and computes AS Hegemony to infer their inter-dependencies. This technique discovers thousands of dependent networks not directly connected to IXPs and emphasizes the role of IXPs in the Internet topology. We also look at the geolocation of members and dependents and find that only 3% of IXPs with dependents are entirely local: all members and dependents are in the same country as the IXP. Another 52% connect international members, but only have domestic dependents.



FIGURE 1: Online Behavioral Advertising explanation
FIGURE 5: Distribution of unique third-party tracking cookie hosts (tracker domains)
FIGURE 6: Breakdown Number of OBA ads by ad-serving domain
Census measurement configurations.
Top 10 tracker domains sorted by the total number of cookies involved in OBA Run
Do Cookie Banners Respect My Browsing Privacy? Measuring the Effectiveness of Cookie Rejection for Limiting Behavioral Advertising

January 2024 · 2 Reads

IEEE Access

Online behavioral advertising (OBA) is a method within digital advertising that exploits web users' interests to tailor ads. Its use has raised privacy concerns among researchers, regulators, and the media, emphasizing the need for a reliable mechanism to measure its prevalence. However, there is a lack of systematic research on how user consent choices affect OBA presence, and no open-source frameworks exist for large-scale automated OBA measurement. To address this, we design and implement OpenOBA, a new framework for automated OBA discovery on the web. OpenOBA is a general, modular, and scalable framework to support essentially any OBA measurement. With it, we conduct a study to measure the impact of three user consent choices for cookies on OBA, uncovering a complex online privacy landscape. We first confirm the presence of OBA by comparing the increased likelihood of encountering ads from a specific topic, i.e., Style & Fashion, when browsing with an artificially induced behavior versus when browsing without any particular behavior. Then, we find that the Accept All choice significantly raises the number of OBA ads. For the Reject All option, on the other hand, we observe that it reduces the number of unique third-party tracking cookie hosts (tracker domains) by around 70%, yet it still shows ads related to the user's interests. Notably, we also find that OBA ads are only served through Google-related domains across the three banner interaction configurations used, despite the involvement of up to 191 different tracker domains in the Accept All configuration. This underscores the dominant role of major players in the OBA ad market. Finally, to foster reproducibility and further research, we open-sourced our framework and released all data and analysis scripts.





I Never Trust My University for This! Investigating Student PII Leakage at Vietnamese Universities

December 2023 · 22 Reads

IEICE Transactions on Information and Systems

Universities collect and process a massive amount of Personally Identifiable Information (PII) at registration and throughout interactions with individuals. However, student PII can be exposed to the public when documents are uploaded along with university notices without the students' consent or awareness, which could put individuals at risk of a variety of scams, such as identity theft, fraud, or phishing. In this paper, we perform an in-depth analysis of student PII leakage at Vietnamese universities. To the best of our knowledge, we are the first to conduct a comprehensive study on student PII leakage in higher educational institutions. We find that 52.8% of Vietnamese universities leak student PII, including one or more types of personal data, in documents on their websites. It is important to note that the compromised PII includes sensitive types of data, such as student medical records and religion. Also, student PII leakage is not a new phenomenon: it has happened year after year since 2005. Finally, we present a study with 23 Vietnamese university employees who have worked on student PII to get a deeper understanding of this situation and envisage concrete solutions. The results are surprising: the employees are highly aware of the concept of student PII, yet student PII leakage still happens due to their working habits or the lack of a management system and regulation. Therefore, Vietnamese universities should take a more active stand to protect student data in this situation.


Citations (80)


... There exists a plethora of works that create datasets for training generative models with different purposes and that have been cited previously in the above sections. However the aim of [143] is to create datasets of malicious traffic that can be used later in other applications. Thus, the authors propose GothX, a flexible traffic generator to create both datasets of legitimate and malicious IoT traffic. ...

Reference:

A Comprehensive Survey on Generative AI Solutions in IoT Security
GothX: a generator of customizable, legitimate and malicious IoT network traffic
  • Citing Conference Paper
  • August 2024

... The current research article covers the integration of XAI methods in various network environments, from IoT to mobile and standard network infrastructure, in the context of XAI applied to anomaly detection. [159] evaluates different XAI algorithms integrated in Anomaly Detection (AD) for IoT traffic. The overall reliability and applicability of AD in real-world scenarios are improved by bridging the gap between high-performance models and understandable decisions. ...

Evaluation of XAI Algorithms in IoT Traffic Anomaly Detection

... By applying the Synthetic Minority Over-sampling Technique (SMOTE) they were able to balance the data set and create a reliable IDS, which used deep learning in combination with transfer learning for high reliability of detection. Lahesoo et al. [23] A scalable framework, SIURU (Scalable and Interactive Unsupervised intelligent Recognizer for Unveillance): this is a flexible and adaptable way to carry out anomaly detection in IoT network traffic by utilising machine learning models. While the framework was shown to be effective with explainable Artificial Intelligence (XAI) algorithms on many datasets, it performed narrowly worse for identifying new anomalies accurately. ...

SIURU: A Framework for Machine Learning Based Anomaly Detection in IoT Network Traffic

... arty platform's single-sign on (SSO) and OAuth which limit the need for resharing personal information (Sadqi, Belfaik, & Safi, 2020), there are some security and privacy concerns using SSO (Karie, et. al., 2020). The information shared to the platform using SSO is not in the user's control, using additional trackers attached to the user's session (Pham et. al., 2023). ...

SSOLogin: A framework for automated web privacy measurement with SSO logins
  • Citing Conference Paper
  • December 2023

... Castell-Uroz et al. in [11] quantify in more than 95% of the total the number of websites that use fingerprinting or similar transparent tracking methods online. Although the research community has presented many works to automatically find and block this kind of web tracking by multiple means (e.g., [12]- [22]), the difficulty of adoption of those techniques makes this still an open problem. ...

ASTrack: Automatic Detection and Removal of Web Tracking Code with Minimal Functionality Loss
  • Citing Conference Paper
  • May 2023

... When an IPv4 host tries to connect to a server, it first sends a DNS type A query packet. This is changed to type AAAA query by the NAT 46 router [9] . NAT 46 pulls the IPv6 address from the answer packet when it receives the query response. ...

Characterizing DNS query response sizes through active and passive measurements
  • Citing Conference Paper
  • April 2022

... In the domain of log analysis on distributed systems, Neves et al. [92] present Horus, a tool that uses graph databases to store causal graphs obtained from causally consistent aggregation of distributed systems. More specifically on the analysis of network events logs, we can cite Kobayashi et al. [93], which presents a methodology for the comparative analysis of two networks using causal graphs obtained from log data. The study focused on comparing logs from different sources by reconciling the generated causal graphs. ...

Comparative Causal Analysis of Network Log Data in Two Large ISPs
  • Citing Conference Paper
  • April 2022

Satoru Kobayashi

·

·

Kenjiro Cho

·

[...]

·

Kensuke Fukuda

... Given the 24/7 real-time nature of networked services, the content of log files grows on a continuous basis and very quickly exceeds the ability to perform manual analysis. This has led to the development of log abstraction tools that attempt to summarize/parse the original log content into a form that might be appropriate for making specific decisions (e.g., earlier studies [1][2][3][4]. ...

amulog: A general log analysis framework for comparison and combination of diverse template generation methods*
  • Citing Article
  • December 2021

International Journal of Network Management

... With both the victim's username and password, along with the stolen or forged 2FA cookies, the attacker can bypass the 2FA, thus gaining direct access to the victim's account. Once inside, the attacker can take control of the account, potentially establishing persistent access for continued exploitation [30]. ...

Alternative to third-party cookies: investigating persistent PII leakage-based web tracking
  • Citing Conference Paper
  • December 2021

... The study focused on comparing logs from different sources by reconciling the generated causal graphs. In parallel, Jarry et al. [94] apply MixedLinGaM to the problem, generating weighted graphs with more causal relationships than the baseline given in the previous study. Regarding the use of causal analysis in network management, we can highlight Kim et al. [95], which proposes an algorithm for Alarm Correlation based on root cause analysis without the need for domain knowledge. ...

A Quantitative Causal Analysis for Network Log Data
  • Citing Conference Paper
  • July 2021