
Nhien-An Le-Khac, PhD
Professor (Associate) at University College Dublin
Research Lab: https://aseados.ucd.ie
338 Publications
246,241 Reads
5,532 Citations
Introduction
Dr. Nhien-An Le-Khac is an Associate Professor at the School of Computer Science, University College Dublin. His research interests span Cybersecurity and Digital Forensics, AI and AI Security, and Fraud and Criminal Detection. He has published three books with Wiley & Sons and Springer Verlag, and more than 200 scientific papers. He is currently the Director of the MSc in Forensic Computing and Cybercrime Investigation (http://www.csi.ucd.ie/PostgraduateProgrammes/MSc_FCCI).
Publications (338)
Large Language Models (LLMs) present an opportunity for interpreting and investigating financial cybercrime activity. Analysts are tasked with learning and understanding the ever-shifting domain of financial crime, coupled with the latest tools being produced to assist them in their investigative tasks. We present an LLM XAI approach for identifyin...
In recent years, deepfakes (DFs) have been utilized for malicious purposes, such as individual impersonation, misinformation spreading, and artists' style imitation, raising questions about ethical and security concerns. However, existing surveys have focused on accuracy performance of passive DF detection approaches for single modalities, such as...
The possession of smart devices has ingrained itself into daily life. Therefore, smart devices, such as IoT and smartphones, are crucial sources of evidence in instances where criminal activity occurs. Due to the challenges in traditional digital forensic techniques involving smart devices, it has been recently proposed in the literature to leverag...
The advancements in generative AI have enabled the improvement of audio synthesis models, including text-to-speech and voice conversion. This raises concerns about its potential misuse in social manipulation and political interference, as synthetic speech has become indistinguishable from natural human speech. Several speech-generation programs are...
Various types of sensors can be used for Human Activity Recognition (HAR), and each of them has different strengths and weaknesses. Sometimes a single sensor cannot fully observe the user’s motions from its perspective, which causes wrong predictions. While sensor fusion provides more information for HAR, it comes with many inherent drawbacks like...
Ransomware attacks have increased year-on-year, resulting in organizations of all sizes facing threats to their IT infrastructure and business operations. Companies cannot guarantee absolute protection from a ransomware attack, with social engineering being a significant weakness in an organization's cybersecurity structure. We examine the ransomwa...
The emergence of large language models (LLMs) has revolutionized the field of AI, introducing a new era of generative models applied across diverse use cases. Within this evolving AI application ecosystem, numerous stakeholders, including LLM and AI application service providers, use these models to cater to user needs. A significant challenge aris...
Online dating has grown in popularity since the introduction of the World Wide Web and within the last decade the widespread use of smartphones and dating applications, or ‘apps’. Associated with this popularity is the increased scrutiny around these apps and what companies are doing to protect users’ private information. Dating apps are one of the...
Wireless real-time communication between users is a key function in many types of businesses. With the emergence of digital systems to exchange data between users of the same spectrum, usage of the wireless spectrum is changing and increasing. Long-term frequency band occupancy measurements, carried out in accordance with the requirements of the In...
Investigation on smart devices has become an essential subdomain in digital forensics. The inherent diversity and complexity of smart devices pose a challenge to the extraction of evidence without physically tampering with it, which is often a strict requirement in law enforcement and legal proceedings. Recently, this has led to the application of...
Many fall detection systems are being used to provide real-time responses to fall occurrences. Automated fall detection is challenging because it requires very high accuracy to be clinically acceptable. Recent research has tried to improve sensitivity while reducing the high rate of false positives. Nevertheless, there are still limitations in term...
Internet of Things (IoT) refers to the network of interconnected physical devices, vehicles, home appliances, and other items embedded with sensors, software, and connectivity, enabling them to collect and exchange data. IoT forensics aim to investigate cybercrimes, security breaches, and other malicious activities that may have taken place on thes...
Identifying illicit behavior in the Bitcoin network is a well‐explored topic. The methods proposed over time have generated great insights into the deanonymization of the Bitcoin user base through the clustering of inputs and outputs. With advanced techniques being deployed by Bitcoin users, these heuristics are now being challenged in their abilit...
Today, every aspect of society’s activities is regulated by laws from the national and local levels. The increasing number of legal documents interwoven with each other also leads to difficulties in searching and applying in practice. In addition, later legal documents may change or invalidate previous legal documents. The construction of knowledge...
Illicit activity in cryptocurrency has increased dramatically over the years. Bitcoin mechanics allow for users to mask their identity through obfuscation techniques. Much research has been published in the domain of identifying illicit activity in cryptocurrency, and in particular the emergence of Graph Neural Networks (GNNs) has shown great promi...
Clinical data's vast volume and diverse formats, compounded by the intricacies of healthcare and the heightened importance of security and privacy, necessitate sophisticated data management and governance systems. Existing studies often examine components like data distribution, privacy, and security in isolation. This research bridges the gaps by...
Recently, a new approach to networking called Software-Defined Networking (SDN) has emerged based on the idea of separating the centralized control plane from the data plane, which simplifies network management and meets the needs of modern data centers. However, the centralized nature of SDN also introduces new security risks that could hamper wid...
Financial cybercrime prevention is an increasing issue with many organisations and governments. As deep learning models have progressed to identify illicit activity on various financial and social networks, the explainability behind the model decisions has been lacklustre with the investigative analyst at the heart of any deep learning platform. In...
Mobile payment systems enable users to complete financial transactions with their smartphones, including contactless payments at retail stores. Because the financial transactions of individuals are indicators of their lifestyles, they are potential sources of data in criminal investigations. The items purchased and even the locations of transaction...
IoT (Internet of Things) refers to the network of interconnected physical devices, vehicles, home appliances, and other items embedded with sensors, software, and connectivity, enabling them to collect and exchange data. IoT Forensics is collecting and analyzing digital evidence from IoT devices to investigate cybercrimes, security breaches, and ot...
The training programs in digital forensics have contributed many case study models to guide digital forensic analyses. However, they only account for a small number of real cases and they are usually too abstract while actual cybercrime investigations are more diverse and complex. This gap leads to difficulties in giving immediate and straightforwa...
Open Source Intelligence (OSINT) is an important and powerful approach that law enforcement agencies can use to guide an investigation. In many organizations, there are significant barriers to the adoption of effective OSINT techniques, as well as a failure to adapt fast enough to emerging technologies. A cultural change from training duration to a...
This paper introduces a Big Data approach for automatically extracting basement information (i.e. presence of basements and levels) from LiDAR data and digital terrain models. The proposed approach is fast, scalable, and can handle very large amounts of data due to the use of parallel computing. Experimental results showed that the algorithm could...
Many fall detection systems are being used to provide real-time responses to fall occurrences. Automated fall detection is challenging because it requires near perfect accuracy to be clinically acceptable. Recent research has tried to improve the accuracy along with reducing the high rate of false positives. Nevertheless, there are still limitation...
Today, the legal document system is increasingly strict with different levels of influence and affects activities in many different fields. The increasing number of legal documents interwoven with each other also leads to difficulties in searching and applying in practice. The construction of knowledge maps that involve one or a group of legal docu...
Data Warehouse (DW) is a common term used in the Data Mining process for storing copious amounts of data ready for analysis. Organizations are starting to prioritize Data Warehouses, which are essential for mining their historical datasets. In the healthcare industry, the significance of data warehouses escalates due to the critical nature of manag...
Identifying illicit behavior in the Bitcoin network is a well explored topic. The methods proposed over time have generated great insights into the deanonymization of the Bitcoin user base through the clustering of inputs and outputs. With advanced techniques being deployed by Bitcoin users, these heuristics are now being challenged in their abilit...
Within recent years, mobile payment systems have been introduced, allowing smartphone users to complete financial transactions with their smartphones, including contactless payments at retail stores. As a person's financial transactions are indicative of their lifestyle, they are a potential source of data for criminal investigations. The items pu...
Industrial control systems (ICS) often contain sensitive information related to the corresponding equipment being controlled and their configurations. Protecting such information is important to both the manufacturers and users of such ICSs. This work demonstrates an attack vector on industrial control systems where information can be exfiltrated t...
Recent machine learning approaches have been effective in Artificial Intelligence (AI) applications. They produce robust results with a high level of accuracy. However, most of these techniques do not provide human-understandable explanations for supporting their results and decisions. They usually act as black boxes, and it is not easy to understa...
Software Defined Networking (SDN) is an emerging network platform, which facilitates centralised network management. The SDN enables the network operators to manage the overall network consistently and holistically, regardless of the complexity of infrastructure devices. The promising features of the SDN enhance network security and facilitate the impl...
Reputed organisations are always prompting Data Warehouses (DWs), which are essential for storing and mining their historical datasets. When it comes to the healthcare industry, DWs are becoming ever so imperative, as efficient storage for medical data is vital for one’s health while mining it and seeking new insights. While clinical datasets are v...
This chapter provides a high-level overview of databases, including relational databases, database management systems and the Structured Query Language (SQL). A database forensic process is also described, followed by some examples of database investigation. Concepts presented in this chapter are necessary for understanding the later chapters in...
This chapter focuses on the examination of the Signal instant message application. It presents an integrated framework that covers the trigger of investigation to the final report. The framework is an iterative, real-time data flow centred framework. The framework outlines the options to obtaining, monitoring and capturing Signal data flow in real-...
While digital forensics (or referred to as cyber forensics in recent times) play an increasingly important role in our current society (or ‘metaverse’), the role of databases in data/evidence acquisition cannot be understated. Therefore, this edited book focuses on a number of operational challenges in identifying and acquiring data of forensic or...
This chapter focusses on the examination of seized iPhones for the data loss when the extraction is delayed. For law enforcement today, very few crimes are committed without having some nexus to a mobile device, and as such, mobile devices play a critical evidentiary role for investigations. More recently, law enforcement digital forensic labs have...
The use of Internet of Things (IoT) devices in and around houses has grown enormously in recent years. One widely used IoT device is the smart doorbell, also called a video doorbell. Many people buy a video doorbell to increase security and/or for prevention. In the context of digital forensics, more video doorbells,...
The Internet is widely used around the world, and its popularity has grown significantly since the 1990s. In 1994, only around 0.04% of the world’s population (~25 million users) had Internet access. By the end of 2021, over 53% (~5 billion users) of the world’s population had access to the Internet, almost 800,000 new users each day (https://...
This chapter focuses on the examination of the qTox message application. Recently, there have been a lot of child exploitation activities where the suspects use amongst other things an E2EE messenger called qTox (using the tox-protocol) for their communication to other offenders. The tox-protocol is an encrypted open source peer-to-peer network pro...
This chapter focuses on the examination of PyBitmessage Messenger. The Bitmessage protocol, on which the PyBitmessage Messenger is based, was developed with the aspiration of preventing all investigative approaches. Cybercriminals have also become aware of this messaging protocol. This chapter describes how to gather pieces of informat...
Current state-of-the-art point cloud data management (PCDM) systems rely on a variety of parallel architectures and diverse data models. The main objective of these implementations is achieving higher scalability without compromising performance. This paper reviews the scalability and performance of state-of-the-art PCDM systems with respect to bot...
State-of-the-art remote sensing image management systems adopt scalable databases and employ sophisticated indexing techniques to perform window and containment queries. Many rely on space-filling curve (SFC) based index techniques designed for key-value databases and are predominantly employable for images that are iso-oriented. Critically, these...
InSDN dataset
Elsayed, Mahmoud Said, Nhien-An Le-Khac, and Anca D. Jurcut. "InSDN: A novel SDN intrusion dataset." IEEE Access 8 (2020): 165263-165284.
Deploying robust machine learning models has to account for concept drifts arising due to the dynamically changing and non-stationary nature of data. Addressing drifts is particularly imperative in the security domain due to the ever-evolving threat landscape and lack of sufficiently labeled training data at the deployment time leading to performan...
In recent years, data science has evolved significantly. Data analysis and mining processes become routines in all sectors of the economy where datasets are available. Vast data repositories have been collected, curated, stored, and used for extracting knowledge. And this is becoming commonplace. Subsequently, we extract a large amount of knowledge...
Cryptocurrency has been (ab)used to purchase illicit goods and services such as drugs, weapons and child pornography (also referred to as child sexual abuse materials), and thus mobile devices (where cryptocurrency wallet applications are installed) are a potential source of evidence in a criminal investigation. Not surprisingly, there has been inc...
In an Internet of Things (IoT) environment, IoT devices are typically connected through different network media types such as mobile, wireless and wired networks. Due to the pervasive nature of such devices, they are a potential evidence source in both civil litigation and criminal investigations. It is, however, challenging to identify and acquire...
Acquisition of non-volatile or volatile memory is traditionally the first step in the forensic process. This approach has been widely used in mobile device investigations. However, with the advance of encryption techniques applied by default in mobile operating systems, data access is more restrictive. Investigators normally do not have administrat...
Machine Learning and Deep Learning methods are widely adopted across financial domains to support trading activities, mobile banking, payments, and making customer credit decisions. These methods also play a vital role in combating financial crime, fraud, and cyberattacks. Financial crime is increasingly being committed over cyberspace, and cybercr...
Acquisition of non-volatile or volatile memory is a popular approach in the forensic process as a first step of data acquisition. This approach has been widely used in mobile device investigations. However, with the advance of encryption techniques applied by default in mobile operating systems, data access is more restrictive. Investigators normal...
In digital agriculture, agronomists are required to make timely, profitable and more actionable precise decisions based on knowledge and experience. The input can be cultivated and related agricultural data, and one of them is text data, including news articles, business news, policy documents, or farming notes. To process this kind of data, identi...
Internet of Things (IoT) is becoming the new frontier in digital forensics due to the abundance of IoT devices appearing in day-to-day life. The diversity and complexity of IoT ecosystems pose a considerable challenge to digital investigators that demand novel approaches. Electromagnetic side-channel analysis (EM-SCA) has been proposed as a promisin...
Point density is an important property that dictates the usability of a point cloud data set. This paper introduces an efficient, scalable, parallel algorithm for computing the local point density index, a sophisticated point cloud density metric. Computing the local point density index is non-trivial, because this computation involves a neighbour...
Machine Learning methods are playing a vital role in combating ever-evolving threats in the cybersecurity domain. Explanation methods that shed light on the decision process of black-box classifiers are one of the biggest drivers in the successful adoption of these models. Explaining predictions that address ‘Why?/Why Not?’ questions help users/sta...
Background and Objective: Cloud computing has the ability to offload processing tasks to remote computing resources. Presently, the majority of biomedical digital signal processing involves a ground-up approach by writing code in a variety of languages. This may reduce the time a researcher or health professional has to process data, while increa...
This is a comprehensive Electromagnetic side-channel dataset representing a diverse collection of popular IoT devices and smartphones. The presented dataset is used to demonstrate the potential usage of machine learning models to recognise device behaviour.
A total of 8 main smart device types were used for the creation of the dataset, including sm...
The MITRE Corporation is an American non-profit organization that has made substantial efforts in creating and maintaining knowledge bases relevant to cybersecurity, which have been widely adopted by the community. ATT&CK ("Adversarial Tactics, Techniques, and Common Knowledge") is a popular taxonomy by MITRE, which describes threat actor behaviors. Te...
Software-Defined Networking (SDN) is a promising technology for the future Internet. However, the SDN paradigm introduces new attack vectors that do not exist in the conventional distributed networks. This paper develops a hybrid Intrusion Detection System (IDS) by combining the Convolutional Neural Network (CNN) and Long Short-Term Memory Network...
In a previous work, a clustering-based method had been incorporated with the latent feature space of an autoencoder to discover sub-classes of normal data for anomaly detection. However, the work has the limitation in manually setting up the numbers of clusters in the normal training data. Finding a proper number of clusters in datasets is often am...
Electromagnetic (EM) side-channel radiation from Internet of Things (IoT) devices has been shown to be effective at acquiring forensic insights during digital investigations. These EM radiation patterns can be analysed with the help of machine learning algorithms to detect internal behaviours of IoT devices, which can be relevant to an investigation. Ho...
With the rapid increase in mobile phone storage capacity and penetration, digital forensic investigators face a significant challenge in quickly identifying relevant examinable files within a plethora of uninteresting OS and application files extracted by forensic tools. This challenge can have serious adverse effects in time critical cases, and ca...
Software-defined networking (SDN) is a new networking paradigm that separates the controller from the network devices i.e. routers and switches. The centralised architecture of the SDN facilitates the overall network management and addresses the requirement of current data centres. While there are high benefits offered by the SDN architecture, the...
Recent growth in domain specific applications of machine learning can be attributed to availability of realistic public datasets. Real world datasets may always contain sensitive information about the users, which makes it hard to share freely with other stake holders, and researchers due to regulatory and compliance requirements. Synthesising dat...
Social media is a cybersecurity risk for every business. What do people share on the Internet? Almost everything about oneself is shared: friendship, demographics, family, activities and work-related information. This could become a potential risk in every business if the organisation’s policies, training and technology fail to properly address the...
On a daily basis, law enforcement officers struggle with suspects using mobile communication applications for criminal activities. These mobile applications have replaced SMS messaging and have evolved over the last few years from plain-text data transmission and storage to an encrypted version. Regardless of the benefits for all law abiding citizens, this is con...
Automated facial age estimation has drawn increasing attention in recent years. Several applications relevant to digital forensic investigations include the identification of victims, suspects and missing children, and the decrease of investigators' exposure to psychologically impacting material. Nevertheless, due to the lack of accurately labelled...
The humanities, like many other areas of society, are currently undergoing major changes in the wake of digital transformation. However, in order to make collection of digitised material in this area easily accessible, we often still lack adequate search functionality. For instance, digital archives for textiles offer keyword search, which is fairl...
Social Media is a cyber-security risk for every business. What do people share on the Internet? Almost everything about oneself is shared: friendship, demographics, family, activities, and work-related information. This could become a potential risk in every business if the organization's policies, training and technology fail to properly address t...
Instant messaging (IM) has been around for decades now. Over the last few decades IM has become more and more popular with varied protocols, both open source and closed source. One of the more recent open-source protocols is Matrix, whose first stable version was released in 2019; the IM application based on this protocol is "Riot.im". In r...
File type identification (FTI) has become a major discipline for anti-virus developers, firewall designers and for forensic cybercrime investigators. Over the past few years, research has seen the introduction of several classifiers and features. One of these advances is the so-called n-grams analysis, which is an interpretation of statistical coun...
InSDN is a comprehensive Software-Defined Network (SDN) dataset for Intrusion detection system evaluation. The new dataset includes the benign and various attack categories that can occur in different elements of the SDN standard. InSDN considers different attacks, including DoS, DDoS, brute force attack, web applications, exploitation, probe, and b...
Kodi is one of the world’s largest open-source streaming platforms for viewing video content. Easily installed Kodi add-ons facilitate access to online pirated videos and streaming content by enabling the user to search and view copyrighted videos with a basic level of technical knowledge. In some countries, there have been paid child sexual...
The increasing use of smartphones has increased their presence in legal and corporate investigations. Unlike desktop and laptop computers, forensic analysis of smartphones is a challenging task due to their limited interfaces to retrieve information of forensic value. Electromagnetic side-channel analysis (EM-SCA) has been recently proposed as an a...
In an Internet of Things (IoT) environment, IoT devices are typically connected through different network media types such as mobile, WiFi and wired networks. Due to the pervasive nature of such devices, they are a potential evidence source in both civil litigation and criminal investigations. It is, however, challenging to identify and acquire for...
The novel severe acute respiratory syndrome coronavirus 2 and its associated disease, COVID-19, have increased the amount of time that people spend working from home and in social isolation. In 2020, the number of users worldwide who relied on the Internet for work, education, and entertainment increased significantly. This growth is causing a subs...
Today traditional communication methods, such as SMS or phone calls, are used less often and are replaced by the use of chat applications. WhatsApp is one of the most popular chat applications nowadays. WhatsApp offers different ways of communicating, which include sending text messages and making phone calls. The implementation of encryption makes...
Although tools for tracking and monitoring illegal networks have been developed for centuries, current methods still need continuous improvement. This is because tracking and monitoring illegal networks in cyberspace has become increasingly challenging for law enforcement agencies due to sophisticated encrypt...
In many organisations there are up to 15 security controls that help defenders accurately identify and prioritise information security risks. The lack of clarity into the effectiveness and capabilities of these defences, together with poor visibility of overall risk posture, has led to a crisis of prioritisation. Lately, organisations rely on scenario...
Kodi is one of the world's largest open-source streaming platforms for viewing video content. Easily installed Kodi add-ons facilitate access to online pirated videos and streaming content by enabling the user to search and view copyrighted videos with a basic level of technical knowledge. In some countries, there have been paid child sexual...