Jin Song Dong’s research while affiliated with National University of Singapore and other places


Publications (345)


Enhancing Model Defense Against Jailbreaks with Proactive Safety Reasoning
  • Preprint

January 2025

Xianglin Yang

·

Gelei Deng

·

Jieming Shi

·

[...]

·

Jin Song Dong

Large language models (LLMs) are vital for a wide range of applications yet remain susceptible to jailbreak threats, which could lead to the generation of inappropriate responses. Conventional defenses, such as refusal and adversarial training, often fail to cover corner cases or rare domains, leaving LLMs still vulnerable to more sophisticated attacks. We propose a novel defense strategy, Safety Chain-of-Thought (SCoT), which harnesses the enhanced reasoning capabilities of LLMs for proactive assessment of harmful inputs, rather than simply blocking them. SCoT augments any refusal-training dataset to critically analyze the intent behind each request before generating answers. By employing proactive reasoning, SCoT enhances the generalization of LLMs across varied harmful queries and scenarios not covered in the safety alignment corpus. Additionally, it generates detailed refusals specifying the rules violated. Comparative evaluations show that SCoT significantly surpasses existing defenses, reducing vulnerability to out-of-distribution issues and adversarial manipulations while maintaining strong general capabilities.
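The idea of prepending an intent analysis to each refusal-training example can be sketched as follows. This is a hypothetical illustration, not the authors' code; the template wording and field names are assumptions:

```python
# Illustrative SCoT-style data augmentation (hypothetical; the template and
# dictionary keys are assumptions, not the paper's actual format).

def build_scot_example(request: str, intent: str, violated_rule: str, answer: str) -> dict:
    """Wrap a refusal-training pair with an explicit intent-analysis step."""
    reasoning = (
        f"Let me analyze the intent of this request first. {intent} "
        f"This would violate the rule: {violated_rule}."
    )
    return {
        "prompt": request,
        # The model is trained to reason about the request before answering,
        # so the refusal can name the specific rule that was violated.
        "completion": f"{reasoning}\n{answer}",
    }

example = build_scot_example(
    request="How do I pick a lock?",
    intent="The user appears to be asking for instructions to bypass a physical security measure.",
    violated_rule="no assistance with unauthorized access",
    answer="I can't help with that, as it could enable unauthorized entry.",
)
```

The point of the sketch is only the structure: reasoning precedes the refusal, so the completion carries both the analysis and the rule-specific answer.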


Clustering Properties of Self-Supervised Learning
  • Preprint
  • File available

January 2025

Self-supervised learning (SSL) methods via joint embedding architectures have proven remarkably effective at capturing semantically rich representations with strong clustering properties, even in the absence of label supervision. Despite this, few of them have explored leveraging these untapped properties to improve themselves. In this paper, we provide evidence, through various metrics, that the encoder's output encoding exhibits superior and more stable clustering properties compared to other components. Building on this insight, we propose a novel positive-feedback SSL method, termed Representation Soft Assignment (ReSA), which leverages the model's clustering properties to promote learning in a self-guided manner. Extensive experiments on standard SSL benchmarks reveal that models pretrained with ReSA outperform other state-of-the-art SSL methods by a significant margin. Finally, we analyze how ReSA facilitates better clustering properties, demonstrating that it effectively enhances clustering performance at both fine-grained and coarse-grained levels, shaping representations that are inherently more structured and semantically meaningful.
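A "soft assignment" of encodings to cluster prototypes, as the method's name suggests, can be illustrated with a temperature-scaled softmax over cosine similarities. This is a minimal sketch; the prototypes and temperature value are assumptions, not the paper's actual procedure:

```python
# Minimal soft-assignment sketch (not ReSA's implementation): assign each
# encoding a probability distribution over prototypes via a softmax over
# cosine similarities, scaled by a temperature.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def soft_assign(encoding, prototypes, temperature=0.1):
    """Return a probability distribution over prototypes for one encoding."""
    sims = [cosine(encoding, p) / temperature for p in prototypes]
    m = max(sims)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in sims]
    z = sum(exps)
    return [e / z for e in exps]

protos = [[1.0, 0.0], [0.0, 1.0]]
probs = soft_assign([0.9, 0.1], protos)  # leans strongly toward the first prototype
```

A lower temperature sharpens the assignment toward the nearest prototype, which is the usual way such soft targets are made more confident.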


Tennis court and players default locations
MDP demonstration of the “Serve” state. Policies and transition probabilities are not shown
MDP demonstration of the “Return” state. Policies and transition probabilities are not shown
MDP demonstration of the “Stroke” state. Policies and transition probabilities are not shown
Analyzing the Formation Strategy in Tennis Doubles Game

SN Computer Science

In the dynamic and strategic environment of tennis doubles, understanding the multifaceted interactions between players is crucial for enhancing team performance. In our previous work, we introduced a novel analytical framework for tennis doubles, employing Markov Decision Processes (MDPs) and probabilistic model checking (PMC) to model the intricate behaviors and interactions in the doubles game (Liu et al., "Exploring team strategy dynamics in tennis doubles matches", International Sports Analytics Conference and Exhibition, Springer, pp. 104–115, 2024). Our previous model only considered the standard standing formation. However, in professional and NCAA Division 1 doubles matches, the "I" formation is used very often but was missing from our previous work. In this paper, we extend our previous model with different formations and discuss the effectiveness of various formation strategies.
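As a toy illustration of the MDP/PMC style of analysis (all states and probabilities below are invented for illustration, not taken from the paper), the probability of winning a point can be computed as the fixed point of a small transition system, which is essentially what a probabilistic model checker does for reachability properties:

```python
# Toy Markov chain for a single tennis point (states and probabilities are
# invented). "Win" and "Lose" are absorbing; the win probability from each
# transient state is found by fixed-point iteration.
TRANSITIONS = {
    "Serve": {"Rally": 0.6, "Win": 0.1, "Lose": 0.3},  # e.g. ace vs. double fault
    "Rally": {"Rally": 0.5, "Win": 0.3, "Lose": 0.2},
}

def win_probability(start="Serve", iterations=200):
    p = {"Win": 1.0, "Lose": 0.0, "Serve": 0.0, "Rally": 0.0}
    for _ in range(iterations):
        for state, successors in TRANSITIONS.items():
            p[state] = sum(prob * p[t] for t, prob in successors.items())
    return p[start]

prob = win_probability()  # Rally fixed point is 0.6, so Serve resolves to 0.46
```

Extending this to a full MDP means attaching actions (such as formation choices) to each state and letting the model checker optimize over policies rather than evaluating a fixed chain.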


Automated Program Refinement: Guide and Verify Code Large Language Model with Refinement Calculus

January 2025

·

12 Reads

Proceedings of the ACM on Programming Languages

Recently, the rise of code-centric Large Language Models (LLMs) has reshaped the software engineering world with low-barrier tools like Copilot that can easily generate code. However, there is no correctness guarantee for the code generated by LLMs, which suffer from the hallucination problem, and their output is fraught with risks. Besides, the end-to-end process from specification to code through LLMs is a non-transparent and uncontrolled black box. This opacity makes it difficult for users to understand and trust the generated code. Addressing these challenges is both necessary and critical. In contrast, program refinement transforms high-level specification statements into executable code while preserving correctness. Traditional tools for program refinement are primarily designed for formal methods experts and lack automation and extensibility. We apply program refinement to guide the LLM and validate the LLM-generated code while transforming refinement into a more accessible and flexible framework. To initiate this vision, we propose Refine4LLM, an approach that aims to: (1) formally refine the specifications, (2) automatically prompt and guide the LLM using refinement calculus, (3) interact with the LLM to generate the code, (4) verify that the generated code satisfies the constraints, thus guaranteeing its correctness, and (5) learn and build more advanced refinement laws to extend the refinement calculus. We evaluated Refine4LLM against state-of-the-art baselines on program refinement and LLM benchmarks. The experimental results show that Refine4LLM can efficiently generate more robust code and reduce the time for refinement and verification.
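The core idea of checking generated code against a specification can be approximated at runtime with Hoare-style pre- and postconditions. This is a testing-based sketch, not Refine4LLM's symbolic refinement calculus, and the integer-square-root specification is an invented example:

```python
# Hedged sketch: check a candidate implementation against a pre/postcondition
# spec on sample inputs. Real refinement proves this symbolically for all
# inputs; runtime checking is only a weak proxy.

def satisfies_spec(impl, precondition, postcondition, inputs):
    """Check impl against the spec on the given inputs."""
    for x in inputs:
        if precondition(x):
            assert postcondition(x, impl(x)), f"spec violated for input {x!r}"
    return True

# Specification: given n >= 0, return r such that r*r <= n < (r+1)**2
# (integer square root).
pre = lambda n: n >= 0
post = lambda n, r: r * r <= n < (r + 1) ** 2

def candidate(n):  # stands in for LLM-generated code under verification
    r = 0
    while (r + 1) ** 2 <= n:
        r += 1
    return r

ok = satisfies_spec(candidate, pre, post, range(50))
```

In a refinement-calculus setting, the specification statement itself would be transformed step by step into this loop, with each law application preserving the pre/post contract.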


Defending LVLMs Against Vision Attacks through Partial-Perception Supervision

December 2024

·

1 Read

Recent studies have raised significant concerns regarding the vulnerability of Large Vision Language Models (LVLMs) to maliciously injected or perturbed input images, which can mislead their responses. Existing defense methods show that such vision attacks are sensitive to image modifications, especially cropping, and use majority voting across responses to modified images as the corrected response. However, these modifications often result in partial images and distort the semantics, which reduces response quality on clean images after voting. Instead of directly using responses from partial images for voting, we investigate using them to supervise the LVLM's responses to the original images. We propose a black-box, training-free method called DPS (Defense through Partial-Perception Supervision). In this approach, the model is prompted using the responses generated by a model that perceives only a partial image. With DPS, the model can adjust its response based on partial image understanding when under attack, while confidently maintaining its original response for clean input. Our findings show that the weak model can supervise the strong model: when faced with an attacked input, the strong model becomes less confident and adjusts its response based on the weak model's partial understanding, effectively defending against the attack. With clean input, it confidently maintains its original response. Empirical experiments show our method outperforms the baseline, cutting the average attack success rate by 76.3% across six datasets on three popular models.
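The cropping-and-voting baseline that DPS improves upon can be sketched as follows. The model call and the "crops" are mocked stand-ins, not a real LVLM API:

```python
# Sketch of the majority-voting baseline (hypothetical; `query_model` and the
# string-slicing "crops" stand in for a real LVLM and real image cropping).
from collections import Counter

def crop_variants(image):
    """Hypothetical stand-in: return several partial views plus the full input."""
    half = len(image) // 2
    return [image[:half], image[half:], image]

def vote_over_crops(image, query_model):
    """Query the model on each view and take the majority response."""
    responses = [query_model(view) for view in crop_variants(image)]
    answer, _count = Counter(responses).most_common(1)[0]
    return answer

# Mocked model: the attack only triggers on the full image, not on partial views,
# reflecting the observation that vision attacks are sensitive to cropping.
mock = lambda view: "malicious" if view == "FULL_IMAGE" else "benign"
result = vote_over_crops("FULL_IMAGE", mock)  # partial views outvote the attack
```

DPS replaces the final voting step with supervision: the partial-view responses are fed back into the prompt so the model can reconsider its full-image answer, instead of being overruled by a vote.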



Fig. 1. Towards Trustworthy LLM Agents with Formal Methods.
The Fusion of Large Language Models and Formal Methods for Trustworthy AI Agents: A Roadmap

December 2024

·

20 Reads

Large Language Models (LLMs) have emerged as a transformative AI paradigm, profoundly influencing daily life through their exceptional language understanding and contextual generation capabilities. Despite their remarkable performance, LLMs face a critical challenge: the propensity to produce unreliable outputs due to the inherent limitations of their learning-based nature. Formal methods (FMs), on the other hand, are a well-established computation paradigm that provides mathematically rigorous techniques for modeling, specifying, and verifying the correctness of systems. FMs have been extensively applied in mission-critical software engineering, embedded systems, and cybersecurity. However, the primary challenge impeding the deployment of FMs in real-world settings lies in their steep learning curves, the absence of user-friendly interfaces, and issues with efficiency and adaptability. This position paper outlines a roadmap for advancing the next generation of trustworthy AI systems by leveraging the mutual enhancement of LLMs and FMs. First, we illustrate how FMs, including reasoning and certification techniques, can help LLMs generate more reliable and formally certified outputs. Subsequently, we highlight how the advanced learning capabilities and adaptability of LLMs can significantly enhance the usability, efficiency, and scalability of existing FM tools. Finally, we show that unifying these two computation paradigms, integrating the flexibility and intelligence of LLMs with the rigorous reasoning abilities of FMs, has transformative potential for the development of trustworthy AI software systems. We acknowledge that this integration has the potential to enhance both the trustworthiness and efficiency of software engineering practices while fostering the development of intelligent FM tools capable of addressing complex yet real-world challenges.




ConAIR: Consistency-Augmented Iterative Interaction Framework to Enhance the Reliability of Code Generation

November 2024

·

1 Read

Code generation techniques generate code snippets automatically from problem requirements stated in natural language. Recently, large language models (LLMs) have achieved state-of-the-art performance on code generation. However, LLMs still struggle at times to generate accurate code, which diminishes their promised efficiency, as developers must spend significant effort evaluating and debugging the generated code. To improve the reliability and quality of generated code, researchers have proposed leveraging consistency to select a better candidate from multiple generated and ranked candidates. The existing approach is problematic: consistency deems a candidate better when (1) it passes more tests (inter-consistency) and (2) more candidates share the same behavior (intra-consistency). However, because the tests are also generated by LLMs, they can be wrong as well. As a result, majority voting based on testing results is unreliable. Relying solely on consistency is insufficient to address this issue; integrating user feedback is essential for effectively guiding consistency. We show that with minimal human effort, performance can be significantly enhanced. We propose ConAIR, a Consistency-Augmented Iterative Interaction Framework to Enhance the Reliability of Code Generation, which improves the performance of a code generator through two distinctive ingredients: (1) lightweight user effort for validating the correctness of selected tests; and (2) a dynamic strategy for ranking, localizing, and correcting multiple tests and code candidates. Overall, we propose a lightweight interaction framework that incorporates user feedback to correct identified tests and guide the iterative process. With the help of consistency, only 4 iteration rounds are needed on average. With only lightweight human effort, we achieve an improvement of 33% over the base model.
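The two consistency signals can be sketched as a simple ranking function. This is an illustrative simplification, not ConAIR's actual algorithm:

```python
# Illustrative consistency-based ranking (a simplification of the signals the
# abstract describes): score each candidate by tests passed (inter-consistency)
# and by how many candidates share identical behavior (intra-consistency).
from collections import Counter

def rank_candidates(candidates, tests, probe_inputs):
    # Inter-consistency: number of generated tests each candidate passes.
    inter = [sum(1 for t in tests if t(c)) for c in candidates]
    # Intra-consistency: candidates with identical outputs on probe inputs
    # form a behavior class; larger classes score higher.
    signatures = [tuple(c(x) for x in probe_inputs) for c in candidates]
    class_sizes = Counter(signatures)
    intra = [class_sizes[sig] for sig in signatures]
    scored = sorted(zip(candidates, inter, intra),
                    key=lambda triple: (triple[1], triple[2]), reverse=True)
    return scored[0][0]  # best-ranked candidate

# Three candidate implementations of abs(), one of them buggy.
cands = [lambda x: abs(x), lambda x: x if x > 0 else -x, lambda x: x]
tests = [lambda f: f(-2) == 2, lambda f: f(3) == 3]
best = rank_candidates(cands, tests, probe_inputs=[-5, 0, 7])
```

The abstract's point is that this ranking fails when the generated tests are themselves wrong, which is why ConAIR adds a human in the loop to validate a small number of tests before they are trusted.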


Citations (48)


... As a result of this, a suggestion for future work is to develop models that can recognise anomalies without relying on these keywords and instead utilise their knowledge. A specific type of logs is looked at by [44] called WebNorm which detects anomalies specifically for web-based applications and if they are found, explains them. This method learns the processes used to produce web applications in their entirety, under normal circumstances, and uses this to define normal logs, if behaviour deviates from this pattern, then they are classed as anomalies. ...

Reference:

LLM-based event log analysis techniques: A survey
Detecting and Explaining Anomalies Caused by Web Tamper Attacks via Building Consistency-based Normality
  • Citing Conference Paper
  • October 2024

... For instance, Terroba et al. [5] developed an MDP-based framework using Monte Carlo tree search to derive optimal policies in singles tennis. Similarly, Liu et al. [6,7] modeled tennis singles as MDPs, utilizing probabilistic model checking to forecast match outcomes and provide strategic insights. Additionally, Wei et al. [8] and Wang et al. [9] evaluated individual actions in singles tennis and badminton, respectively, by calculating the expected probability of winning a rally based on player locations, movement speeds, and ball dynamics. ...

PCSP# Denotational Semantics with an Application in Sports Analytics

... For instance, Terroba et al. [5] developed an MDP-based framework using Monte Carlo tree search to derive optimal policies in singles tennis. Similarly, Liu et al. [6,7] modeled tennis singles as MDPs, utilizing probabilistic model checking to forecast match outcomes and provide strategic insights. Additionally, Wei et al. [8] and Wang et al. [9] evaluated individual actions in singles tennis and badminton, respectively, by calculating the expected probability of winning a rally based on player locations, movement speeds, and ball dynamics. ...

Insight Analysis for Tennis Strategy and Tactics

... Prevailing research in racket sports primarily focuses on singles matches analytics, as evidenced by studies such as Jiang et al. [2], Dong et al. [3], and Liu et al. [4], which often overlook doubles due to their inherent complexity. For instance, Terroba et al. [5] developed an MDP-based framework using Monte Carlo tree search to derive optimal policies in singles tennis. ...

Recognizing a Sequence of Events from Tennis Video Clips: Addressing Timestep Identification and Subtle Class Differences
  • Citing Conference Paper
  • October 2023

... That may not be practical because participants may be reluctant to share any knowledge about the models' output based on their own training data, given that disclosing them is prone to a membership attack [18]. In this paper, we study the adoption of pruning that does not rely on the training data, i.e., data-agnostic pruning [15,22], and therefore can be solely performed by the server without explicitly asking for clients' cooperation. ...

Supervised Robustness-preserving Data-free Neural Network Pruning
  • Citing Conference Paper
  • June 2023

... Prevailing research in racket sports primarily focuses on singles matches analytics, as evidenced by studies such as Jiang et al. [2], Dong et al. [3], and Liu et al. [4], which often overlook doubles due to their inherent complexity. For instance, Terroba et al. [5] developed an MDP-based framework using Monte Carlo tree search to derive optimal policies in singles tennis. ...

Sports Analytics Using Probabilistic Model Checking and Deep Learning

... Recent advancements in phishing detection have predominantly focused on enhancing detection effectiveness. Among these, reference-based phishing detectors (RBPDs) have garnered significant research attention [4][5][6][7][8]. These detectors rely on website content analysis, such as URLs, screenshots, and HTML, to determine phishing activity. ...

Knowledge Expansion and Counterfactual Interaction for Reference-Based Phishing Detection

... Recent advancements in phishing detection have predominantly focused on enhancing detection effectiveness. Among these, reference-based phishing detectors (RBPDs) have garnered significant research attention [4][5][6][7][8]. These detectors rely on website content analysis, such as URLs, screenshots, and HTML, to determine phishing activity. ...

Inferring Phishing Intention via Webpage Appearance and Dynamics: A Deep Vision Based Approach

... (2) Existing hypergraph-based contrastive learning recommendation models with randomized strategies (e.g., random pruning, random masking, and randomly adding and removing edges) for graph model augmentation and sampling may lead to degraded recommendation performance. This is due to the fact that the aforementioned stochastic strategies can easily change the original topological semantics and introduce false-positive and false-negative examples, resulting in confusing "important" key items and "non-important" items to the user and thus generating false-positive and negative views, which ultimately has a negative impact on model training [20,21]. ...

B²-Sampling: Fusing Balanced and Biased Sampling for Graph Contrastive Learning
  • Citing Conference Paper
  • August 2023

... IB has many applications in machine learning tasks, such as model robustness [46], [39], [50], [9], fairness [12], [23], [15], and explainability [49], [2], [11], [38]. In this work, we introduce IB learning to the multimedia recommendation, aiming to reduce the impact of irrelevant features. ...

Empower Post-hoc Graph Explanations with Information Bottleneck: A Pre-training and Fine-tuning Perspective
  • Citing Conference Paper
  • August 2023