Surya Nepal

The Commonwealth Scientific and Industrial Research Organisation | CSIRO · Division of Computational Informatics

PhD

About

476
Publications
112,216
Reads
9,043
Citations
Citations since 2016
270 Research Items
7347 Citations
[Chart: citations by year, 2016-2022]

Publications (476)
Chapter
Software vulnerabilities are becoming increasingly severe problems, which can pose great risks of information leakage, denial of service, or even system crashes. However, their detection is still formidable, due to the diverse forms of software development and the diverse programming styles of software developers. In this paper, we propose a vulner...
Chapter
Spear Phishing is one of the most difficult to detect cyber attacks facing businesses and individuals worldwide. In recent years, considerable research has been conducted into the use of Machine Learning (ML) techniques for spear-phishing detection. ML-based solutions are vulnerable to zero-day attacks, as when the algorithms do not have access to...
Article
The diversity and quantity of data warehouses, gathering data from distributed devices such as mobile devices, can enhance the success and robustness of machine learning algorithms. Federated learning enables distributed participants to collaboratively learn a commonly shared model while holding data locally. However, it is also faced with expensiv...
Preprint
Training highly performant deep neural networks (DNNs) typically requires the collection of a massive dataset and the use of powerful computing resources. Therefore, unauthorized redistribution of private pre-trained DNNs may cause severe economic loss for model owners. For protecting the ownership of DNN models, DNN watermarking schemes have been...
Article
Machine learning (ML) techniques are becoming more and more important in cybersecurity, as they can quickly analyse and identify different types of threats from millions of events. In spite of the increasing number of possible applications of machine learning, successful adoption of ML models in cybersecurity still highly relies on the explainabili...
Preprint
Location trajectories collected by smartphones and other devices represent a valuable data source for applications such as location-based services. Likewise, trajectories have the potential to reveal sensitive information about individuals, e.g., religious beliefs or sexual orientations. Accordingly, trajectory datasets require appropriate sanitiza...
Chapter
Advanced adversarial attacks such as membership inference and model memorization can make federated learning (FL) vulnerable and potentially leak sensitive private data. Local differentially private (LDP) approaches are gaining more popularity due to stronger privacy notions and native support for data distribution compared to other differentially...
Article
Full-text available
Collaborative inference has recently emerged as an attractive framework for applying deep learning to Internet of Things (IoT) applications by splitting a DNN model into several subpart models among resource-constrained IoT devices and the cloud. However, the reconstruction attack was proposed recently to recover the original input image from inter...
Article
Graph representation learning aims at mapping a graph into a lower-dimensional feature space. Deep attributed graph representation, utilizing deep learning models on the graph structure and attributes, shows its significance in mining complex relational data. Most existing deep attributed graph representation models assume graph attributes in a sin...
Preprint
Full-text available
The daily deluge of alerts is a sombre reality for Security Operations Centre (SOC) personnel worldwide. They are at the forefront of an organisation's cybersecurity infrastructure, and face the unenviable task of prioritising threats amongst a flood of abstruse alerts triggered by their Security Information and Event Management (SIEM) systems. URL...
Preprint
Email phishing has become more prevalent and grows more sophisticated over time. To combat this rise, many machine learning (ML) algorithms for detecting phishing emails have been developed. However, due to the limited email data sets on which these algorithms train, they are not adept at recognising varied attacks and, thus, suffer from concept dr...
Preprint
Full-text available
Deception is rapidly growing as an important tool for cyber defence, complementing existing perimeter security measures to rapidly detect breaches and data theft. One of the factors limiting the use of deception has been the cost of generating realistic artefacts by hand. Recent advances in Machine Learning have, however, created opportunities for...
Article
Full-text available
The bulk of Internet interactions is highly redundant and also security sensitive. To reduce communication bandwidth and provide a desired level of security, a data stream is first compressed to squeeze out redundant bits and then encrypted using authenticated encryption. This generic solution is very flexible and works well for any pair of (compre...
Article
Multi-turn response selection is a key issue in retrieval-based chatbots and has attracted considerable attention in the NLP (Natural Language Processing) field. So far, researchers have developed many solutions that can select appropriate responses for multi-turn conversations. However, these works still suffer from the semantic mismatch pr...
Article
Computer users are generally faced with difficulties in making correct security decisions. While fewer and fewer people are trying or willing to take formal security training, online sources including news, security blogs, and websites are continuously making security knowledge more accessible. Analysis of cybersecurity texts from t...
Article
Edge computing, as a relatively recent evolution of cloud computing architecture, is the newest way for enterprises to distribute computational power and lower repetitive referrals to central authorities. In the edge computing environment, Generative Models (GMs) have been found to be valuable and useful in machine learning tasks such as data augme...
Article
Driven by the cloud-first initiative taken by various governments and companies, it has become a common practice to outsource spatial data to cloud servers for a wide range of applications such as location-based services and geographic information systems. Searchable encryption is a common practice for outsourcing spatial data which enables search...
Preprint
Full-text available
Large transformer-based language models demonstrate excellent performance in natural language processing. Considering the closeness of natural languages to high-level programming languages such as C/C++, this work studies how good large transformer-based language models are at detecting software vulnerabilities. Our results demonstrate t...
Preprint
Web phishing remains a serious cyber threat responsible for most data breaches. Machine Learning (ML)-based anti-phishing detectors are seen as an effective countermeasure, and are increasingly adopted by web-browsers and software products. However, with an average of 10K phishing links reported per hour to platforms such as PhishTank and VirusTota...
Preprint
Cloud-enabled Machine Learning as a Service (MLaaS) has shown enormous promise to transform how deep learning models are developed and deployed. Nonetheless, there is a potential risk associated with the use of such services since a malicious party can modify them to achieve an adverse result. Therefore, it is imperative for model owners, service p...
Data
Supplementary materials for the TNNLS paper titled "A Comprehensive Survey on Community Detection with Deep Learning".
Preprint
Full-text available
Honeyfile deployment is a useful breach detection method in cyber deception that can also inform defenders about the intent and interests of intruders and malicious insiders. A key property of a honeyfile, enticement, is the extent to which the file can attract an intruder to interact with it. We introduce a novel metric, Topic Semantic Matching (T...
Preprint
Full-text available
Each and every organisation releases information in a variety of forms ranging from annual reports to legal proceedings. Such documents may contain sensitive information and releasing them openly may lead to the leakage of confidential information. Detection of sentences that contain sensitive information in documents can help organisations prevent...
Article
Full-text available
Detecting a community in a network is a matter of discerning the distinct features and connections of a group of members that are different from those in other communities. The ability to do this is of great significance in network analysis. However, beyond the classic spectral clustering and statistical inference methods, there have been significa...
Article
Full-text available
In e-commerce scenarios, fraud events such as telecom fraud, insurance fraud, and fraudulent transactions cause huge losses to merchants or users. Identification of fraudsters helps regulators take measures for targeted control. Given a set of fraudsters and suspicious users observed from victims’ reports, how can we effectively disting...
Preprint
Rowhammer has drawn much attention from both academia and industry in the last few years as rowhammer exploitation poses severe consequences to system security. Since the first comprehensive study of rowhammer in 2014, a number of rowhammer attacks have been demonstrated against ubiquitous dynamic random access memory (DRAM)-based commodity systems...
Article
Genome-wide analysis has demonstrated both health and social benefits. However, large scale sharing of such data may reveal sensitive information about individuals. One of the emerging challenges is identity tracing attack that exploits correlations among genomic data to reveal the identity of DNA samples. In this paper, we first demonstrate that t...
Article
Outsourcing decision tree inference services to the cloud is highly beneficial, yet raises critical privacy concerns on the proprietary decision tree of the model provider and the private input data of the client. In this paper, we design, implement, and evaluate a new system that allows highly efficient outsourcing of decision tree inference. Our...
Article
Detection-based defense approaches are effective against adversarial attacks without compromising the structure of the protected model. However, they could be bypassed by stronger adversarial attacks and are limited in their ability to handle high-fidelity images. In this paper, we explore an effective detection-based defense against adversarial at...
Article
Tracking the evolution of clusters in social media streams is becoming increasingly important for many applications, such as early detection and monitoring of natural disasters or pandemics. In contrast to clustering on a static set of data, streaming data clustering does not have a global view of the complete data. The local (or partial) view in a...
Article
The healthcare Internet of Things (IoT) is rapidly becoming an invaluable tool in the healthcare industry. However, sharing data in healthcare IoT raises many security and privacy concerns, such as how to ensure data integrity, source authentication, and data privacy. Redactable signature schemes (RSSs) could be a feasible solution to ad...
Article
Rowhammer is a hardware vulnerability in DRAM, where repeated access to hammer rows can induce bit flips in neighboring victim rows. Being a hardware vulnerability, rowhammer bypasses all the system memory protection, allowing adversaries to compromise the integrity and confidentiality of data. Rowhammer attacks have been shown to enable pri...
Article
Full-text available
Building hardware security primitives with on-device memory fingerprints is a compelling proposition given the ubiquity of memory in electronic devices, especially for low-end Internet of Things devices for which cryptographic modules are often unavailable. However, the use of fingerprints in security functions is challenged by the small, but unpre...
Chapter
People have personal and/or business needs to share private and confidential documents; however, this often comes at the expense of privacy. Privacy-aware users demand that their data is secure during the entire life cycle, and does not reside in clouds indefinitely. A trending feature in industry is to set download constraints of shared files - a file can be dow...
Article
Full-text available
Federated learning (FL) and split learning (SL) are state-of-the-art distributed machine learning techniques to enable machine learning without accessing raw data on clients or end devices. However, their comparative training performance under real-world resource-restricted Internet of Things (IoT) device settings, e.g., Raspberry Pi, remains barel...
Article
Wireless medical sensor networks (WMSNs) have aroused widespread attention in recent years with the development of Internet of Things (IoT) technology. WMSNs offer many new opportunities for healthcare professionals to monitor patients and patient self-monitoring. To overcome the resource (such as memory and power) limitations of sensors and attain...
Preprint
Full-text available
Cyber deception is emerging as a promising approach to defending networks and systems against attackers and data thieves. However, despite being relatively cheap to deploy, the generation of realistic content at scale is very costly, due to the fact that rich, interactive deceptive technologies are largely hand-crafted. With recent improvements in...
Article
Cloud storage systems have seen a growing number of clients due to the fact that more and more businesses and governments are shifting away from in-house data servers and seeking cost-effective and ease-of-access solutions. However, the security of cloud storage is underestimated in current practice, which resulted in many large-scale data breaches...
Article
Meltdown, disclosed in 2018, is a hardware vulnerability primarily affecting modern Intel processors. It allows a rogue process to read the kernel data in the CPU L1D cache. To defend against the Meltdown attack in legacy processors, the most effective software-only mitigation approach is to unmap kernel memory from user processes, known as kernel page-ta...
Preprint
Outsourcing decision tree inference services to the cloud is highly beneficial, yet raises critical privacy concerns on the proprietary decision tree of the model provider and the private input data of the client. In this paper, we design, implement, and evaluate a new system that allows highly efficient outsourcing of decision tree inference. Our...
Chapter
Driven by the cloud-first initiative taken by various governments and companies, it has become a common practice to outsource spatial data to cloud servers for a wide range of applications such as location-based services and geographic information systems. Searchable encryption is a common practice for outsourcing spatial data which enables search...
Article
Full-text available
Passwords are regarded as the most common authentication mechanism used by Web-based services, despite large-scale attacks and data breaches regularly exploiting password-associated vulnerabilities. We investigate the trends behind password formulation in an exploratory study to postulate that social identity and language play a major role in users...
Chapter
Because the recent ransomware families are becoming progressively more advanced, it is challenging to detect ransomware using static features only. However, their behaviors are still more generic and universal to analyze due to their inherent goals and functions. Therefore, we can capture their behaviors by monitoring their system-level activities...
Preprint
Full-text available
Given the ubiquity of memory in commodity electronic devices, fingerprinting memory is a compelling proposition, especially for low-end Internet of Things (IoT) devices where cryptographic modules are often unavailable. However, the use of fingerprints in security functions is challenged by the inexact reproductions of fingerprints from the same de...
Article
Serendipity of Internet of Things (IoT) services will lead to highly innovative applications, including the crowdsharing of a wide array of services such as wireless energy services and other digital services. The service paradigm lends itself nicely to the modeling of, and delivering on, IoT. Each ‘thing’ is modeled as a service with a set of purpo...
Preprint
Full-text available
URLs are central to a myriad of cyber-security threats, from phishing to the distribution of malware. Their inherent ease of use and familiarity is continuously abused by attackers to evade defences and deceive end-users. Seemingly dissimilar URLs are being used in an organized way to perform phishing attacks and distribute malware. We refer to suc...
Conference Paper
Full-text available
Digital twin technology today is diverse and emerging, and its full potential is not yet widely understood. The concept of a digital twin allows for the analysis, design, optimisation and evolution of systems to take place fully digitally, or in conjunction with a cyber-physical system to improve speed, accuracy and efficiency when compared to tradit...
Article
Over the past decade, the Internet of Things (IoT) has been widely adopted in various domains, including education, commerce, government, and healthcare. Many IoT-based applications have also drawn significant attention in recent years. With the increasing number of connected devices in IoT systems, one of the challenging tasks is to ensure d...
Article
Full-text available
Compression is widely used in Internet applications to save communication time, bandwidth and storage. The asymmetric numeral system (ANS), recently invented by Jarek Duda, offers improved efficiency and close to optimal compression. The ANS algorithm has been deployed by major IT companies such as Facebook, Google and Apple. Compression by itself d...
Article
Machine learning models have demonstrated vulnerability to adversarial attacks, more specifically misclassification of adversarial examples. In this paper, we propose a one-off and attack-agnostic Feature Manipulation (FM)-Defense to detect and purify adversarial examples in an interpretable and efficient manner. The intuition is that the classific...
Chapter
Dynamic searchable symmetric encryption (DSSE) can enable a cloud server to search and update over the encrypted data. Recently, forward and backward privacy in DSSE have received wide attention due to the rising number of emerging attacks exploiting the leakage in data update operations. Forward privacy ensures newly added data is not related to quer...
Preprint
Full-text available
An integrated clinical environment (ICE) enables the connection and coordination of the internet of medical things around the care of patients in hospitals. However, ransomware attacks and their spread on hospital infrastructures, including ICE, are rising. Often the adversaries are targeting multiple hospitals with the same ransomware attacks. The...
Chapter
Ensuring cost-effective end-to-end QoS in an IoT data processing pipeline (DPP) is a non-trivial task. A key factor that affects the overall performance is the amount of computing resources allocated to each service in the pipeline. In this demo paper, we present AuraEN, an Autonomous resource allocation ENgine that can proactively scale the resour...
Preprint
Full-text available
A community reveals the features and connections of its members that are different from those in other communities in a network. Detecting communities is of great significance in network analysis. Despite the classical spectral clustering and statistical inference methods, we notice a significant development of deep learning techniques for communit...
Preprint
Full-text available
Spear Phishing is a harmful cyber-attack facing businesses and individuals worldwide. Considerable research has been conducted recently into the use of Machine Learning (ML) techniques to detect spear-phishing emails. ML-based solutions may suffer from zero-day attacks: unseen attacks unaccounted for in the training data. As new attacks emerge, class...
Preprint
Full-text available
The proliferation of Internet of Things (IoT) devices has made people's lives more convenient, but it has also raised many security concerns. Due to the difficulty of obtaining and emulating IoT firmware, the black-box fuzzing of IoT devices has become a viable option. However, existing black-box fuzzers cannot form effective mutation optimization...
Conference Paper
Full-text available
The proliferation of Internet of Things (IoT) devices has made people’s lives more convenient, but it has also raised many security concerns. Due to the difficulty of obtaining and emulating IoT firmware, in the absence of internal execution information, black-box fuzzing of IoT devices has become a viable option. However, existing black-box fuzzer...
Preprint
Full-text available
Previous robustness approaches for deep learning models such as data augmentation techniques via data transformation or adversarial training cannot capture real-world variations that preserve the semantics of the input, such as a change in lighting conditions. To bridge this gap, we present NaTra, an adversarial training scheme that is designed to...
Preprint
Full-text available
The diversity and quantity of data warehouses, gathering data from distributed devices such as mobile phones, can enhance machine learning algorithms' success and robustness. Federated learning enables distributed participants to collaboratively learn a commonly-shared model while holding data locally. However, it is also faced with expensive...
Preprint
Full-text available
Collaborative inference has recently emerged as an intriguing framework for applying deep learning to Internet of Things (IoT) applications, which works by splitting a DNN model into two subpart models respectively on resource-constrained IoT devices and the cloud. Even though IoT applications' raw input data is not directly exposed to the cloud in...