Benjamin C. M. Fung

McGill University | McGill · School of Information Studies (SIS)

Ph.D., M.Sc., B.Sc., P.Eng.

About

213
Publications
65,524
Reads
11,747
Citations
Introduction
Benjamin Fung is a Canada Research Chair in Data Mining for Cybersecurity, a Full Professor of Information Studies (SIS), and an Associate Member of Computer Science (SOCS) at McGill University. He received a Ph.D. degree in computing science from Simon Fraser University in 2007. Benjamin has over 130 refereed publications that span the research forums of data mining, privacy protection, and cyber forensics.
Additional affiliations
September 2013 - July 2020
McGill University
Position
  • Professor (Associate)
April 2015 - present
McGill University
Position
  • Canada Research Chair in Data Mining for Cybersecurity
June 2011 - August 2013
Concordia University
Position
  • Professor (Associate)
Education
January 2003 - April 2007
Simon Fraser University
Field of study
  • Computing Science
September 2000 - September 2002
Simon Fraser University
Field of study
  • Computing Science
September 1994 - April 1999
Simon Fraser University
Field of study
  • Computing Science

Publications (213)
Article
Full-text available
With the increasing prevalence of location-aware devices, trajectory data has been generated and collected in various application domains. Trajectory data carries rich information that is useful for many data analysis tasks. Yet, improper publishing and use of trajectory data could jeopardize individual privacy. However, it has been shown that exis...
Article
Full-text available
The collection of digital information by governments, corporations, and individuals has created tremendous opportunities for knowledge- and information-based decision making. Driven by mutual benefits, or by regulations that require certain data to be published, there is a demand for the exchange and publication of data among various parties. Data...
Conference Paper
Full-text available
Privacy-preserving data publishing addresses the problem of disclosing sensitive data when mining for useful information. Among the existing privacy models, ε-differential privacy provides one of the strongest privacy guarantees and has no assumptions about an adversary's background knowledge. Most of the existing solutions that ensure ε-differenti...
Article
Full-text available
Authorship analysis (AA) is the study of unveiling the hidden properties of authors from a body of exponentially exploding textual data. It extracts an author's identity and sociolinguistic characteristics based on the reflected writing styles in the text. It is an essential process for various areas, such as cybercrime investigation, psycholinguis...
Preprint
Training large language models is a computationally intensive process that often requires substantial resources to achieve state-of-the-art results. Incremental layer-wise training has been proposed as a potential strategy to optimize the training process by progressively introducing layers, with the expectation that this approach would lead to fas...
Conference Paper
Software reverse engineering is an essential but time-consuming undertaking in identifying malware, software vulnerabilities, and plagiarism, especially when access to the source code is limited. The extraction of abstract characteristics that represent malware and work as classifier inputs in traditional machine learning approaches requires featur...
Article
Full-text available
Addressing the challenge of toxic language in online discussions is crucial for the development of effective toxicity detection models. This pioneering work focuses on addressing imbalanced datasets in toxicity detection by introducing a novel approach to augment toxic language data. We create a balanced dataset by instructing fine-tuning of Large...
Article
Full-text available
Explainable Artificial Intelligence (XAI) aims to alleviate the black-box AI conundrum in the field of Digital Forensics (DF) (and others) by providing layman-interpretable explanations to predictions made by AI models. It also handles the increasing volumes of forensic images that are impossible to investigate via manual methods; or even automated...
Article
In the past decade, the number of malware variants has increased rapidly. Many researchers have proposed to detect malware using intelligent techniques, such as Machine Learning (ML) and Deep Learning (DL), which have high accuracy and precision. These methods, however, suffer from being opaque in the decision-making process. Therefore, we need Art...
Article
Full-text available
Social media platforms present a perplexing duality, acting at once as sites to build community and a sense of belonging, while also giving rise to misinformation, facilitating and intensifying disinformation campaigns and perpetuating existing patterns of discrimination from the physical world. The first step platforms take in mitigating the harmf...
Article
Representation learning has been applied to Electronic Health Records (EHR) for medical concept embedding and the downstream predictive analytics tasks with promising results. Medical ontologies can also be integrated to guide the learning so that the embedding space can better align with existing medical knowledge. Yet, properly carrying out the i...
Article
Object detection techniques have been widely studied, utilized in various works, and have exhibited robust performance on images with sufficient luminance. However, these approaches typically struggle to extract valuable features from low-luminance images, which often exhibit blurriness and dim appearance, leading to detection failures. To overcome...
Chapter
Data-driven energy prediction models have drawn extensive attention in the building domain in recent years. Improving the predictive accuracy of energy prediction models has been the main concern for existing research. However, an accurate model could not ensure perfect performance under all situations and the performance variation may cause fairness p...
Article
Full-text available
With the growing global awareness of the environmental impact of clothing consumption, there has been a notable surge in the publication of journal articles dedicated to “fashion sustainability” in the past decade, specifically from 2010 to 2020. However, despite this wealth of research, many studies remain disconnected and fragmented due to varyin...
Preprint
Full-text available
The practice of code reuse is crucial in software development for a faster and more efficient development lifecycle. In reality, however, code reuse practices lack proper control, resulting in issues such as vulnerability propagation and intellectual property infringements. Assembly clone search, a critical shift-right defence mechanism, has been e...
Article
Software vulnerabilities have been posing tremendous reliability threats to the general public as well as critical infrastructures, and there have been many studies aiming to detect and mitigate software defects at the binary level. Most of the standard practices leverage both static and dynamic analysis, which have several drawbacks like heavy man...
Article
Full-text available
With the increasing adoption of digital health platforms through mobile apps and online services, people have greater flexibility connecting with medical practitioners, pharmacists, and laboratories and accessing resources to manage their own health-related concerns. Many healthcare institutions are connecting with each other to facilitate the exch...
Article
Full-text available
The proliferation of ransomware has become a significant threat to cybersecurity in recent years, causing significant financial, reputational, and operational damage to individuals and organizations. This paper aims to provide a comprehensive overview of the evolution of ransomware, its taxonomy, and its state-of-the-art research contributions. We...
Article
Full-text available
Social media use has transformed communication and made social interaction more accessible. Public microblogs allow people to share and access news through existing and social-media-created social connections and access to public news sources. These benefits also create opportunities for the spread of false information. False information online can...
Article
Full-text available
COVID-19 is an opportunity to study public acceptance of a “new” healthcare intervention, universal masking, which, unlike vaccination, is mostly alien to the Anglosphere public despite being practiced in ages past. Using a collection of over two million tweets, we studied the ways in which proponents and opponents of masking vied for influence as...
Article
In recent years, the massive data collection in buildings has paved the way for the development of accurate data-driven building models (DDBMs) for various applications. However, a model with a high overall accuracy would not ensure a good predictive performance on all conditions. The biased predictive performance for some conditions may cause fair...
Article
In recent years, massive data collected from buildings has made the development and application of data-driven building models a hot research topic. Due to the variation of data volume in different conditions, existing data-driven building models (DDBMs) would present distinct accuracy for different users or periods. This may create further fairness pro...
Article
Recent research indicates that machine learning models are vulnerable to adversarial samples that are slightly perturbed versions of natural samples. Adversarial samples can be crafted in a white-box or black-box scenario. In the black-box scenario, adversaries possess no knowledge of the detailed architecture and parameters of the model they attack,...
Article
Data-driven models have drawn extensive attention in the building domain in recent years, and their predictive accuracy depends on features or data distribution. Accuracy variation among users or periods creates a certain unfairness to some users. This paper addresses a new research problem called fairness-aware prediction of data-driven building a...
Chapter
Malware has been an increasing threat to computer users. Different pieces of malware have different damage potential depending on their objectives and functionalities. In the literature, there are many studies that focus on automatically identifying malware with their families. However, there is a lack of focus on automatically identifying the seve...
Article
Full-text available
In recent years, the declining birthrate and aging population have gradually brought countries into an ageing society. Regarding accidents that occur amongst the elderly, falls are an essential problem that quickly causes indirect physical loss. In this paper, we propose a pose estimation-based fall detection algorithm to detect fall risks. We use...
Preprint
Deep learning models have achieved state-of-the-art performance in many classification tasks. However, most of them cannot provide an interpretation for their classification results. Machine learning models that are interpretable are usually linear or piecewise linear and yield inferior performance. Non-linear models achieve much better classificat...
Article
Full-text available
A reliable occupancy prediction model plays a critical role in improving the performance of energy simulation and occupant-centric building operations. In general, occupancy and occupant activities differ by season, and it is important to account for the dynamic nature of occupancy in simulations and to propose energy-efficient strategies. The pres...
Article
Indiscriminate elimination of harmful fake news risks destroying satirical news, which can be benign or even beneficial, because both types of news share highly similar textual cues. In this work we applied a recent development in neural network architecture, transformers, to the task of separating satirical news from fake news. Transformers have h...
Article
Full-text available
Complementary metal-oxide-semiconductor (CMOS) image sensors can cause noise in images collected or transmitted in unfavorable environments, especially low-illumination scenarios. Numerous approaches have been developed to solve the problem of image noise removal. However, producing natural and high-quality denoised images remains a crucial challen...
Article
Full-text available
The widespread popularity of social networking is leading to the adoption of Twitter as an information dissemination tool. Existing research has shown that information dissemination over Twitter has a much broader reach than traditional media and can be used for effective post-incident measures. People use informal language on Twitter, including ac...
Chapter
Limited empirical research has examined the importance of product cues and information sources in relation to demographic variables and consumer innovativeness, particularly from a cross-national perspective. In order to understand consumer choice from a cross-national perspective, data were collected from Canada, China, India, and Taiwan. Data wer...
Article
Malware detection and classification are becoming more and more challenging, given the complexity of malware design and the recent advancement of communication and computing infrastructure. The existing malware classification approaches enable reverse engineers to better understand their patterns and categorizations, and to cope with their evolutio...
Article
Malware currently presents a number of serious threats to computer users. Signature-based malware detection methods are limited in detecting new malware samples that are significantly different from known ones. Therefore, machine learning-based methods have been proposed, but there are two challenges these methods face. The first is to model the fu...
Preprint
Full-text available
Authorship attribution is the problem of identifying the most plausible author of an anonymous text from a set of candidate authors. Researchers have investigated same-topic and cross-topic scenarios of authorship attribution, which differ according to whether unseen topics are used in the testing phase. However, neither scenario allows us to expla...
Article
Full-text available
The purpose of this study is to investigate the salient effects of product evaluative cues from a cross-national perspective. A web-based survey consisting of eight items measuring environmental commitment and behaviour, 20 items on product cues, and demographic and behavioural questions was employed. A total of 321 and 309 usable surveys were c...
Article
Full-text available
Haze removal techniques employed to increase the visibility level of an image play an important role in many vision-based systems. Several traditional dark channel prior-based methods have been proposed to remove haze formation and thereby enhance the robustness of these systems. However, when the captured images contain disproportionate haze distr...
Article
We are delighted to present this special issue, which focuses on the novel area of Enabling Technologies for Energy Cloud. This guest editorial provides an overview of all articles accepted for publication in this special issue.
Article
Users from all over the world increasingly adopt social media for newsgathering, especially during breaking news. Breaking news is an unexpected event that is currently developing. Early stages of breaking news are usually associated with lots of unverified information, i.e., rumors. Efficiently detecting and acting upon rumors in a timely fashion...
Chapter
Artificial intelligence (AI) is a well-established branch of computer science concerned with making machines smart enough to perform computationally large or complex tasks that normally require human intelligence; furthermore, it comprises a combination of technologies that can obtain insights and patterns from a massive amount of data which is a c...
Chapter
A problem of authorship characterization is to determine the sociolinguistic characteristics of the potential author of a given anonymous text message. Unlike the problems of authorship attribution, where the potential suspects and their training samples are accessible for investigation, no candidate list of suspects is available in authorship char...
Chapter
In the previous chapters, methods to address two authorship problems, i.e., authorship identification and authorship characterization, were proposed. This chapter discusses the third authorship problem, called authorship verification. The proposed approach is applicable to different types of online messages, but in the current study, the focus is o...
Chapter
Society’s increasing reliance on technology, fueled by a growing desire for increased connectivity (given the increased productivity, efficiency, and availability to name a few motivations) has helped give rise to the compounded growth of electronic data. The increasing adoption of various technologies has driven the need to protect said technologi...
Chapter
This chapter presents the central theme and a big picture of the methods and technologies covered in this book (see Fig. 2.2). For the readers to comprehend presented security and forensics issues, and associated solutions, the content is organized as components of a forensics analysis framework. The framework is employed to analyze online messages...
Chapter
This chapter provides a brief description of the methods employed for collecting initial information about a given suspicious online communication message, including header and network information; and how to forensically analyze the dataset to attain the information that would be necessary to trace back to the source of the crime. The header conte...
Chapter
This chapter presents an overview of authorship analysis from multiple standpoints. It includes historical perspective, description of stylometric features, and authorship analysis techniques and their limitations.
Chapter
This chapter presents a novel approach to frequent-pattern based Writeprint creation, and addresses two authorship problems: authorship attribution in the usual way (disregarding stylistic variation), and authorship attribution by focusing on stylistic variations. Stylistic variation is the occasional change in the writing features of an individual...
Chapter
In the previous chapters, the different aspects of the authorship analysis problem were discussed. This chapter will propose a framework for extracting criminal information from the textual content of suspicious online messages. Archives of online messages, including chat logs, e-mails, web forums, and blogs, often contain an enormous amount of for...
Chapter
This chapter discusses authorship attribution through a training sample. The focus on authorship attribution discussed in this chapter differs in two ways from the traditional authorship identification problem discussed in the earlier chapters of this book. Firstly, the traditional authorship attribution studies [63, 65] only work in the presence o...
Chapter
In this chapter, Associative Classification (AC) [139] is employed, based on association rule discovery techniques, for authorship identification. The developed classification model consists of patterns that represent the respective author’s most prominent combinations of writing style features.
Article
Non-negative tensor factorization has been shown to be a practical solution to automatically discover phenotypes from the electronic health records (EHR) with minimal human supervision. Such methods generally require an input tensor describing the inter-modal interactions to be pre-established; however, the correspondence between different modalities (e....
Preprint
Non-negative tensor factorization has been shown to be a practical solution to automatically discover phenotypes from the electronic health records (EHR) with minimal human supervision. Such methods generally require an input tensor describing the inter-modal interactions to be pre-established; however, the correspondence between different modalities (e....
Article
Many models have been proposed to preserve data privacy for different data publishing scenarios. Among these models, ε-differential privacy is receiving increasing attention because it does not make assumptions about adversaries’ prior knowledge and can provide a rigorous privacy guarantee. Although there are numerous proposed approaches using ε-di...