Sihong Xie’s research while affiliated with Lehigh University and other places


Publications (41)


Figure 1: (a) Joint distribution shift can include both covariate shift (P_X ≠ Q_X) and concept shift (f_P ≠ f_Q). The coverage gap (Eq. (3)) is the absolute difference between the cumulative probabilities of calibration and test conformal scores at the empirical 1 − α quantile τ. We first address the covariate-shift-induced Wasserstein distance by applying importance weighting (Tibshirani et al., 2019) to calibration samples, and further minimize the concept-shift-induced Wasserstein distance to obtain accurate and efficient prediction sets.
Figure 2: Pushforward measures.
Figure 3: Comparison of vanilla CP, IW-CP, and WR-CP based on normalized Wasserstein distance between calibration and test conformal scores: IW-CP can only address the distance caused by covariate shift, while WR-CP reduces the distance from concept shift. The β values for the WR-CP method are 9, 11, 9, 10, 13, and 13, respectively.
Figure 4: Coverages and set sizes of WR-CP and baselines with α = 0.2: WR-CP makes coverages on test data more concentrated around the 1 − α level compared to vanilla CP, IW-CP, and CQR. While WC-CP ensures coverage guarantees, it leads to inefficient predictions due to large set sizes, whereas IW-CP mitigates this inefficiency. The β values for the WR-CP method are 4.5, 9, 9, 6, 8, and 20, respectively.
Figure 6: Coverages and set sizes of WC-CP and Hybrid WC-WR with coverage guarantee 1 − α = 0.9.


Wasserstein-regularized Conformal Prediction under General Distribution Shift
  • Preprint

January 2025

Rui Xu · Yue Sun · [...] · Sihong Xie

Conformal prediction yields a prediction set with guaranteed 1 − α coverage of the true target under the i.i.d. assumption, which may not hold and can lead to a gap between 1 − α and the actual coverage. Prior studies bound the gap using total variation distance, which cannot identify how the gap changes under distribution shift at a given α. Besides, existing methods are mostly limited to covariate shift, while general joint distribution shifts are more common in practice but less researched. In response, we first propose a Wasserstein distance-based upper bound on the coverage gap and analyze the bound using probability measure pushforwards between the shifted joint data and conformal score distributions, enabling a separation of the effects of covariate and concept shifts on the coverage gap. We exploit the separation to design an algorithm based on importance weighting and regularized representation learning (WR-CP) that reduces the Wasserstein bound with a finite-sample error bound. WR-CP achieves a controllable balance between conformal prediction accuracy and efficiency. Experiments on six datasets show that WR-CP can reduce coverage gaps to 3.1% across different confidence levels and outputs prediction sets 38% smaller on average than the worst-case approach.
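The weighted calibration step this line of work builds on (importance weighting in the style of Tibshirani et al., 2019) can be sketched as follows. This is a minimal illustration of weighted split conformal prediction, not the WR-CP implementation; the function names and the simple quantile rule are assumptions of the sketch.

```python
import numpy as np

def weighted_quantile(scores, weights, q):
    """q-quantile of calibration conformal scores under importance weights
    (the weights correct for covariate shift between calibration and test)."""
    order = np.argsort(scores)
    s, w = scores[order], weights[order]
    cdf = np.cumsum(w) / np.sum(w)
    idx = int(np.searchsorted(cdf, q))
    return s[min(idx, len(s) - 1)]

def conformal_interval(point_pred, cal_residuals, weights, alpha=0.1):
    """Split-conformal interval point_pred ± tau, where tau is the weighted
    (1 - alpha)-quantile of the scores |y - f(x)| on calibration data."""
    tau = weighted_quantile(cal_residuals, weights, 1.0 - alpha)
    return point_pred - tau, point_pred + tau
```

With uniform weights this reduces to vanilla split conformal prediction; non-uniform weights shift the quantile toward regions over-represented at test time.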


On the generalization discrepancy of spatiotemporal dynamics-informed graph convolutional networks

July 2024


Frontiers in Mechanical Engineering

Graph neural networks (GNNs) have gained significant attention in diverse domains, ranging from urban planning to pandemic management. Ensuring both accuracy and robustness in GNNs remains a challenge due to the scarcity of high-quality data with sufficiently rich features. With sufficient training data in which all spatiotemporal patterns are well-represented, existing GNN models can make reasonably accurate predictions. However, existing methods fail when the training data are drawn from different circumstances (e.g., traffic patterns on regular days) than the test data (e.g., traffic patterns after a natural disaster). Such challenges are usually classified under domain generalization. In this work, we show that one way to address this challenge in the context of spatiotemporal prediction is by incorporating domain differential equations into graph convolutional networks (GCNs). We theoretically derive conditions under which GCNs incorporating such domain differential equations are robust to mismatched training and testing data compared to baseline domain-agnostic models. To support our theory, we propose two domain-differential-equation-informed networks: the Reaction-Diffusion Graph Convolutional Network (RDGCN), which incorporates differential equations for traffic speed evolution, and the Susceptible-Infectious-Recovered Graph Convolutional Network (SIRGCN), which incorporates a disease propagation model. Both RDGCN and SIRGCN are based on reliable and interpretable domain differential equations that allow the models to generalize to unseen patterns. We experimentally show that RDGCN and SIRGCN are more robust with mismatched testing data than state-of-the-art deep learning methods.
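As a minimal illustration of the kind of domain dynamics SIRGCN builds on, a discrete-time SIR step over a graph might look like this. The Euler update, the adjacency-based infection pressure, and the rates `beta` and `gamma` are assumptions of this sketch, not the SIRGCN architecture itself.

```python
import numpy as np

def sir_graph_step(S, I, R, A, beta=0.3, gamma=0.1):
    """One Euler step of graph SIR dynamics: node i's infection pressure
    is the summed infectious state of its neighbors (A is the adjacency matrix)."""
    pressure = A @ I              # exposure from infectious neighbors
    new_inf = beta * S * pressure # susceptible mass becoming infectious
    new_rec = gamma * I           # infectious mass recovering
    S = S - new_inf
    I = I + new_inf - new_rec
    R = R + new_rec
    return S, I, R
```

Note the update conserves S + I + R at every node, one of the interpretable invariants that a purely data-driven model would not guarantee.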


Figure 1: Blue (①): Model training. Yellow (②–④): Explanation generation for a target input. Red (⑤–⑥): Adversarial attacks against the explanation by manipulating the input.
Figure 2: A smaller ℓ_p distance between saliency maps does not imply similar top salient features. x is the original input, and x′ and x′′ are two perturbed inputs. The saliency map (explanation), denoted by I(·), is a function of the input. The saturation of the red color indicates feature saliency, and the blue dashed boxes highlight the top salient features/regions. Left: ∥I(x) − I(x′)∥₂ = 2.4 > 1.1 = ∥I(x) − I(x′′)∥₂; however, I(x′) and I(x) have the same top-3 salient features. Right: ∥I(x) − I(x′)∥₂ = 0.10 > 0.07 = ∥I(x) − I(x′′)∥₂, but the top-50 salient features from I(x) and I(x′) have a 92% overlap, versus only 36% between I(x) and I(x′′).
Robust Ranking Explanations

July 2023


Robust explanations of machine learning models are critical to establishing human trust in the models. Due to limited cognitive capability, most humans can only interpret the top few salient features. It is critical to make the top salient features robust to adversarial attacks, especially against the more vulnerable gradient-based explanations. Existing defenses measure robustness using ℓ_p-norms, which have weaker protection power. We define explanation thickness for measuring the ranking stability of salient features, and derive tractable surrogate bounds of the thickness to design the R2ET algorithm, which efficiently maximizes the thickness and anchors the top salient features. Theoretically, we prove a connection between R2ET and adversarial training. Experiments with a wide spectrum of network architectures and data modalities, including brain networks, demonstrate that R2ET attains higher explanation robustness under stealthy attacks while retaining accuracy.
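The point that ℓ_p distances can disagree with what humans actually read off a saliency map (Figure 2 above) can be made concrete with a top-k overlap measure. This sketch, including the `topk_overlap` helper and the toy saliency vectors, is illustrative only and is not the paper's thickness metric.

```python
import numpy as np

def topk_overlap(sal_a, sal_b, k):
    """Fraction of features shared by the top-k most salient entries of two maps."""
    top_a = set(np.argsort(-sal_a)[:k])  # indices sorted by descending saliency
    top_b = set(np.argsort(-sal_b)[:k])
    return len(top_a & top_b) / k

# Toy example: I1 permutes the top-3 of I0 (large l2 distance, identical top-3 set),
# while I2 barely moves in l2 yet swaps a top-3 feature out of the ranking.
I0 = np.array([5., 4., 3., 2., 1.])
I1 = np.array([3., 4., 5., 2., 1.])
I2 = np.array([5., 4., 2.9, 3., 1.])
```

Here ∥I0 − I2∥₂ < ∥I0 − I1∥₂, yet I1 keeps all three top features while I2 does not, which is exactly why a ranking-aware metric is needed.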



Robust Ranking Explanations

December 2022


Gradient-based explanation is the cornerstone of explainable deep networks, but it has been shown to be vulnerable to adversarial attacks. Existing works measure explanation robustness by an ℓ_p-norm, which can be counter-intuitive to humans, who only pay attention to the top few salient features. We propose explanation ranking thickness as a more suitable explanation robustness metric. We then present a new practical adversarial attack goal for manipulating explanation rankings. To mitigate the ranking-based attacks while maintaining computational feasibility, we derive tractable surrogate bounds of the thickness, which otherwise involves expensive sampling and integration. We use a multi-objective approach to analyze the convergence of a gradient-based attack, confirming that explanation robustness can be measured by the thickness metric. Experiments on various network architectures and diverse datasets show that the proposed methods outperform the widely adopted Hessian-based curvature-smoothing approaches in robustness.



Figure 1: Left: graph-based detector (GNN) infers the suspiciousness for reviews in a graph; right: computational graphs for the GNN on spam reviews coming from different groups.
Figure 2: An example of data augmentation for a minority subgroup. We duplicate the minority subgroup "mixed user" V_{0,1}^U and their associated reviews, then randomly prune edges linked to the non-spam reviews.
Subgroup Fairness in Graph-based Spam Detection

April 2022


Fake reviews are prevalent on review websites such as Amazon and Yelp. GNN is the state-of-the-art method that can detect suspicious reviewers by exploiting the topologies of the graph connecting reviewers, reviews, and target products. However, the discrepancy in the detection accuracy over different groups of reviewers causes discriminative treatment of different reviewers of the websites, leading to less engagement and trustworthiness of such websites. The complex dependencies over the review graph introduce difficulties in teasing out subgroups of reviewers that are hidden within larger groups and are treated unfairly. There is no previous study that defines and discovers the subtle subgroups to improve equitable treatment of reviewers. This paper addresses the challenges of defining, discovering, and utilizing subgroup memberships for fair spam detection. We first define a subgroup membership that can lead to discrepant accuracy in the subgroups. Since the subgroup membership is usually not observable while also important to guide the GNN detector to balance the treatment, we design a model that jointly infers the hidden subgroup memberships and exploits the membership for calibrating the target GNN's detection accuracy across subgroups. Comprehensive results on two large Yelp review datasets demonstrate that the proposed model can be trained to treat the subgroups more fairly.


Interpretable and Effective Reinforcement Learning for Attacking against Graph-based Rumor Detection

January 2022


Social networks are polluted by rumors, which can be detected by machine learning models. However, the models are fragile, and understanding the vulnerabilities is critical to rumor detection. Certain vulnerabilities are due to dependencies on the graphs and suspiciousness ranking and are difficult for end-to-end methods to learn from limited noisy data. With a black-box detector, we design features capturing the dependencies to allow a reinforcement learning agent to learn an effective and interpretable attack policy based on the detector output. To speed up learning, we devise: (i) a credit assignment method that decomposes delayed rewards to individual attacking steps proportional to their effects; (ii) a time-dependent control variate to reduce variance due to large graphs and many attacking steps. On two social rumor datasets, we demonstrate: (i) the effectiveness of the attacks compared to rule-based attacks and end-to-end approaches; (ii) the usefulness of the proposed credit assignment strategy and control variate; (iii) interpretability of the policy when generating strong attacks.
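The proportional credit assignment in (i) can be sketched as follows. The `step_effects` input (per-step changes in the detector output) and the even-split fallback are assumptions of this illustration, not the paper's exact rule.

```python
def decompose_reward(final_reward, step_effects):
    """Distribute a delayed episode reward over individual attacking steps
    in proportion to each step's measured effect on the detector output."""
    total = sum(step_effects)
    if total == 0:  # no measurable effect anywhere: fall back to an even split
        return [final_reward / len(step_effects)] * len(step_effects)
    return [final_reward * e / total for e in step_effects]
```

Decomposing the reward this way gives each step an immediate learning signal instead of a single delayed one, which is the usual motivation for credit assignment in long episodes.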



Self-learn to Explain Siamese Networks Robustly

September 2021


Learning to compare two objects is essential in applications such as digital forensics, face recognition, and brain network analysis, especially when labeled data are scarce and imbalanced. As these applications make high-stakes decisions and involve societal values like fairness and transparency, it is critical to explain the learned models. We aim to study post-hoc explanations of Siamese networks (SN), widely used in learning to compare. We characterize the instability of gradient-based explanations due to the additional compared object in SN, in contrast to architectures with a single input instance. We propose an optimization framework that derives global invariance from unlabeled data using self-learning to promote the stability of local explanations tailored for specific query-reference pairs. The optimization problems can be solved using gradient descent-ascent (GDA) for constrained optimization, or SGD for KL-divergence-regularized unconstrained optimization, with convergence proofs, especially when the objective functions are nonconvex due to the Siamese architecture. Quantitative results and case studies on tabular and graph data from neuroscience and chemical engineering show that the framework respects the self-learned invariance while robustly optimizing the faithfulness and simplicity of the explanation. We further demonstrate the convergence of GDA experimentally.
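A bare-bones gradient descent-ascent loop of the kind used for the constrained formulation might look like this. The toy saddle objective f(x, y) = x² − y² + xy and the step size are assumptions for illustration, not the paper's actual objective.

```python
def gda(grad_x, grad_y, x, y, lr=0.1, steps=200):
    """Simultaneous gradient descent on x and ascent on y for min_x max_y f(x, y)."""
    for _ in range(steps):
        gx, gy = grad_x(x, y), grad_y(x, y)
        x, y = x - lr * gx, y + lr * gy  # descend in x, ascend in y
    return x, y

# For f(x, y) = x^2 - y^2 + x*y, the unique saddle point is the origin:
x_star, y_star = gda(lambda x, y: 2 * x + y,   # df/dx
                     lambda x, y: x - 2 * y,   # df/dy
                     1.0, 1.0)
```

On this strongly convex-concave toy problem the iterates contract toward (0, 0); on nonconvex Siamese objectives, convergence requires the more careful analysis the abstract refers to.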


Citations (23)


... Despite the excellent performance of propagation-based methods on rumor detection tasks, malicious users and miscreants may exploit the vulnerability of GNNs (Zügner, Akbarnejad, and Günnemann 2018) to evade or interfere with rumor detection results, raising concerns about security issues in rumor detectors. While recent studies have focused on adversarial attacks, such as AdRumor-RL (Lyu et al. 2023) and MARL, which employ reinforcement learning frameworks to explore the vulnerability of rumor detectors under structural adversarial attacks, a different type of attack, i.e., the backdoor attack (Zhang et al. 2021), has been largely overlooked. In a backdoor attack, the target model is compromised by injecting carefully crafted triggers and modifying labels in a small subset of the training data. ...

Reference:

Backdoor Attack on Propagation-based Rumor Detectors
Interpretable and Effective Reinforcement Learning for Attacking against Graph-based Rumor Detection
  • Citing Conference Paper
  • June 2023

... All WSSL methods can be divided into the probabilistic graphical model approach, the deep learning model approach, and the neuralized graphical model approach. (1) In the probabilistic graphical model approach (in addition to the HMM-based models [20, 21, 26, 36, 39]), Rodrigues et al. [32] in early 2014 used a partially directed graph containing a CRF to solve truth inference from crowdsourced labels; (2) in the deep learning model approach (in addition to the "source-specific perturbation" methods [17, 26, 46]), other methods [17, 33-35] are based on an end-to-end deep neural architecture [33], a customized optimization objective with coordinate-ascent optimization [34, 35], or an iterative solving framework similar to the expectation-maximization algorithm [4]. However, none of these methods has the advantages of the recently proposed neuralized HMM-based graphical models [18, 19] and our Neural-Hidden-CRF in principled modeling of the variables of interest and in harnessing the context information provided by advanced deep learning models. ...

Truth Discovery in Sequence Labels from Crowds

... (Deng et al., 2021a) incorporated gradient regularization into the loss function, achieving synaptic connection pruning and weight quantization based on the Alternating Direction Method of Multipliers (ADMM). Similarly, Yin et al. combined sparse spike encoding with sparse network connections, using sparse regularization to establish models for spike data transmission and network sparsification (Yin et al., 2021). Some studies also explore connection pruning for spiking-based Transformer structures. ...

Energy-Efficient Models for High-Dimensional Spike Train Classification using Sparse Spiking Neural Networks
  • Citing Conference Paper
  • August 2021

... RL has also been applied to competitive scenarios. Nash-Detect [65] models fraud detection as a minimax game between review spammers and spam detectors, and HP-KGAT [66] optimizes path-aware graph attention for fake news detection with subgraph reasoning. RL is appealing in unsupervised settings as the label scarcity is not the bottleneck of modeling. ...

Robust Spammer Detection by Nash Reinforcement Learning
  • Citing Conference Paper
  • August 2020

... However, to avoid being detected by fraud detection systems, fraudsters will try their best to camouflage themselves. Common methods include mimicking the behavioral patterns of benign users and establishing connections with benign users (Hooi et al. 2016; Ge et al. 2018; Liu et al. 2020). The first type of camouflage behavior is called feature camouflage and the second type is called relation camouflage. ...

Securing Behavior-based Opinion Spam Detection
  • Citing Conference Paper
  • December 2018

... Generative AI models, such as Generative Adversarial Networks (GANs) [13], are used to generate images, texts, and even videos, offering significant possibilities in various sectors, namely arts, advertising, design, educational applications, where creativity and rapid content production are essential. Authors in [14,15] demonstrate the crucial role of generative AI models in anomaly detection and fraud prevention, as well as in other fields such as medicine [16] and retail [17]. In the education sector, these models are employed to create personalized educational content, interactive quizzes, and course materials tailored to the individual needs of students. ...

Securing Behavior-based Opinion Spam Detection

... A modified version of the EMMA algorithm [1] called TKE was developed to perform this task [18]. Two other notable extensions of FEM are weighted episode mining [22] and fuzzy episode mining [25], which consider event sequences with varying weights. The former allows for the importance of different event types to be weighted, whereas the latter handles events with quantities using fuzzy sets to deal with imprecise events. ...

Mining Weighted Frequent Closed Episodes over Multiple Sequences

Tehnicki vjesnik - Technical Gazette

... The SVO triples can be classified based on the contents of their subjects (e.g., personal pronouns, product names, brands, or product features) to reflect different types of CNs. Subsequently, several ML-based CN summarization models are proposed for different design purposes, including design innovation, product planning, quality improvement, sustainable design, etc. [101, 116, 119, 25, 34, 39, 5, 85]. Quantitative analysis refers to the computational study of a customer's attitude towards product features, which can be achieved through sentiment analysis or product rating analysis. ...

Product function need recognition via semi-supervised attention network
  • Citing Conference Paper
  • December 2017