Yin Yang’s research while affiliated with Hamad bin Khalifa University and other places


Publications (288)


GRIP: A General Robotic Incremental Potential Contact Simulation Dataset for Unified Deformable-Rigid Coupled Grasping
  • Preprint

March 2025 · 8 Reads

Siyu Ma · Wenxin Du · Chang Yu · [...]

Grasping is fundamental to robotic manipulation, and recent advances in large-scale grasping datasets have provided essential training data and evaluation benchmarks, accelerating the development of learning-based methods for robust object grasping. However, most existing datasets exclude deformable bodies due to the lack of scalable, robust simulation pipelines, limiting the development of generalizable models for compliant grippers and soft manipulands. To address these challenges, we present GRIP, a General Robotic Incremental Potential contact simulation dataset for universal grasping. GRIP leverages an optimized Incremental Potential Contact (IPC)-based simulator for multi-environment data generation, achieving up to 48x speedup while ensuring efficient, intersection- and inversion-free simulations for compliant grippers and deformable objects. Our fully automated pipeline generates and evaluates diverse grasp interactions across 1,200 objects and 100,000 grasp poses, incorporating both soft and rigid grippers. The GRIP dataset enables applications such as neural grasp generation and stress field prediction.


Calibrating Noise for Group Privacy in Subsampled Mechanisms

February 2025 · 1 Read · 1 Citation

Proceedings of the VLDB Endowment

Given a group size m and a sensitive dataset D, group privacy (GP) releases information about D (e.g., weights of a neural network trained on D) with the guarantee that the adversary cannot infer with high confidence whether the underlying data is D or a neighboring dataset D′ that differs from D by m records. GP generalizes the well-established notion of differential privacy (DP) for protecting individuals' privacy; in particular, when m = 1, GP reduces to DP. Compared to DP, GP is capable of protecting the sensitive aggregate information of a group of up to m individuals, e.g., the average annual income among members of a yacht club. Despite its longstanding presence in the research literature and its promising applications, GP is often treated as an afterthought, with most approaches first developing a DP mechanism and then using a generic conversion to adapt it for GP, treating the DP solution as a black box. As we point out in the paper, this methodology is suboptimal when the underlying DP solution involves subsampling, e.g., in the classic DP-SGD method for training deep learning models. In this case, the DP-to-GP conversion is overly pessimistic in its analysis, leading to high error and low utility in the published results under GP. Motivated by this, we propose a novel analysis framework that provides tight privacy accounting for subsampled GP mechanisms. Instead of converting a black-box DP mechanism to GP, our solution carefully analyzes and utilizes the inherent randomness in subsampled mechanisms, leading to a substantially improved bound on the privacy loss with respect to GP. The proposed solution applies to a wide variety of foundational mechanisms with subsampling. Extensive experiments with real datasets demonstrate that compared to the baseline convert-from-blackbox-DP approach, our GP mechanisms achieve noise reductions of over an order of magnitude in several practical settings, including deep neural network training.
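To make the black-box baseline concrete, the sketch below assumes that the "generic conversion" is the textbook group-privacy bound obtained by chaining the DP inequality m times: an (ε, δ)-DP mechanism satisfies (mε, δ·(1 + e^ε + ... + e^((m-1)ε))) for datasets differing in m records. The function name `generic_dp_to_gp` is hypothetical, and this is not the subsampling-aware accounting the paper proposes.

```python
import math


def generic_dp_to_gp(epsilon: float, delta: float, m: int) -> tuple[float, float]:
    """Textbook black-box conversion from (epsilon, delta)-DP to group size m.

    Assumption: this is the standard chaining bound, shown only to illustrate
    the baseline; it is NOT the paper's tighter analysis for subsampling.
    """
    if m < 1:
        raise ValueError("group size m must be at least 1")
    # Epsilon grows linearly with the group size.
    group_epsilon = m * epsilon
    # Delta accumulates as delta * (1 + e^eps + ... + e^((m-1) * eps)).
    group_delta = delta * sum(math.exp(i * epsilon) for i in range(m))
    return group_epsilon, group_delta


if __name__ == "__main__":
    for m in (1, 4, 16):
        print(m, generic_dp_to_gp(0.5, 1e-6, m))
```

Under this bound both parameters degrade quickly with m, which is consistent with the abstract's remark that the conversion is pessimistic when applied to a subsampled mechanism as a black box.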


Figure 3: Our simulator can realistically simulate time-dependent wrinkles (a-s, b-s, c-s, d-s) observed in real clothes (a, b, c, d) under different deformations. In both real and simulated clothes, folding or compressing clothes for a long duration (30 mins) makes the wrinkles sharper. The wrinkles formed on denim by compressing (d) are sharper than those on cotton clothes (b, d), and our simulator can reproduce these observations (b-s, d-s). Bigger figures in SM.
Figure 4: (a) An originally wrinkleless rectangular cloth falls into a cylindrical container under its own weight; (b) The cloth is folded moderately due to collision; (c) To cause extreme deformations, we compress the cloth after it falls on the ground.
Figure 5: The wrinkles on the cloth after lifting it. (a) Immediately lift the cloth after folding moderately; (b) Lift the cloth after keeping the moderate deformation for 500s; (c) Immediately lift the cloth after being compressed by the heavy weight; (d) Lift the cloth after compressing it for 500s.
Figure 6: Trousers simulation. (a) The trousers on an A-pose human body do not have wrinkles; (b) Lifting a leg deforms the trousers moderately; (c) Sitting down causes larger deformations.
Figure 7: (a-d) and (e-h) are with and without the human body. The wrinkles caused by sitting (c, d, g, h) are more obvious than those caused by lifting a leg (a, b, e, f) because sitting causes larger deformations. Moreover, the wrinkles on (b, d, f, h) are sharper and deeper than those on (a, c, e, g). Therefore, keeping deformations for 500s makes the wrinkles more obvious.


Cloth Animation with Time-dependent Persistent Wrinkles
  • Preprint
  • File available

February 2025 · 11 Reads

Persistent wrinkles are often observed on crumpled garments, e.g., the wrinkles around the knees after sitting for a while. Such wrinkles recover easily if the garment is not deformed for long, and otherwise persist. Since they are vital to the visual realism of cloth animation, we aim to simulate realistic-looking persistent wrinkles. To this end, we present a physics-inspired fine-grained wrinkle model. Different from existing methods, we recognize the importance of the interplay between internal friction and plasticity during wrinkle formation. Furthermore, we model their time dependence for persistent wrinkles. Our model is capable of simulating not only realistic wrinkle patterns, but also their time-dependent changes according to how long the deformation is maintained. Through extensive experiments, we show that our model is effective in simulating realistic spatially and temporally varying wrinkles, versatile in simulating different materials, and capable of generating more fine-grained wrinkles than the state of the art.


Dress-1-to-3: Single Image to Simulation-Ready 3D Outfit with Diffusion Prior and Differentiable Physics

February 2025 · 24 Reads

Recent advances in large models have significantly advanced image-to-3D reconstruction. However, the generated models are often fused into a single piece, limiting their applicability in downstream tasks. This paper focuses on 3D garment generation, a key area for applications like virtual try-on with dynamic garment animations, which require garments to be separable and simulation-ready. We introduce Dress-1-to-3, a novel pipeline that reconstructs physics-plausible, simulation-ready separated garments with sewing patterns and humans from an in-the-wild image. Starting with the image, our approach combines a pre-trained image-to-sewing pattern generation model for creating coarse sewing patterns with a pre-trained multi-view diffusion model to produce multi-view images. The sewing pattern is further refined using a differentiable garment simulator based on the generated multi-view images. Versatile experiments demonstrate that our optimization approach substantially enhances the geometric alignment of the reconstructed 3D garments and humans with the input image. Furthermore, by integrating a texture generation module and a human motion generation module, we produce customized physics-plausible and realistic dynamic garment demonstrations. Project page: https://dress-1-to-3.github.io/


Fig. 2. Hessian visualization. We plot the values of local and global Hessian matrices for one training image. The local Hessian (left) is for all the parameters of a kernel $\mathbf{x}_k = [\mathbf{p}_k^\top, \mathbf{q}_k^\top, \mathbf{s}_k^\top, \mathbf{c}_k^\top, \sigma_k]^\top$, while the global Hessian (right) is for all the kernels' parameters. The variation of the matrices across different DOFs suggests (very) weak coupling among parameters.
3DGS^2: Near Second-order Converging 3D Gaussian Splatting

January 2025 · 61 Reads

3D Gaussian Splatting (3DGS) has emerged as a mainstream solution for novel view synthesis and 3D reconstruction. By explicitly encoding a 3D scene using a collection of Gaussian kernels, 3DGS achieves high-quality rendering with superior efficiency. As a learning-based approach, 3DGS training has typically relied on the standard stochastic gradient descent (SGD) method, which offers at most linear convergence. Consequently, training often requires tens of minutes, even with GPU acceleration. This paper introduces a (near) second-order convergent training algorithm for 3DGS, leveraging its unique properties. Our approach is inspired by two key observations. First, the attributes of a Gaussian kernel contribute independently to the image-space loss, which supports isolated and local optimization algorithms. We exploit this by splitting the optimization at the level of individual kernel attributes, analytically constructing small-size Newton systems for each parameter group, and efficiently solving these systems on GPU threads. This achieves Newton-like convergence per training image without relying on the global Hessian. Second, kernels exhibit sparse and structured coupling across input images. This property allows us to effectively utilize spatial information to mitigate overshoot during stochastic training. Our method converges an order of magnitude faster than standard GPU-based 3DGS training, requiring over 10× fewer iterations while maintaining or surpassing the quality of the SGD-based 3DGS reconstructions.
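The idea of solving one small Newton system per parameter group can be illustrated on a toy problem. The sketch below is not the paper's algorithm: it assumes a hypothetical loss that is a sum of independent 2x2 quadratics, one per group, so each group can be updated from its own local Hessian without assembling a global one. All function names are illustrative.

```python
def solve_2x2(a, rhs):
    """Solve the 2x2 linear system a @ x = rhs by Cramer's rule."""
    (a00, a01), (a10, a11) = a
    det = a00 * a11 - a01 * a10
    if abs(det) < 1e-12:
        raise ValueError("singular local Hessian")
    return [(rhs[0] * a11 - a01 * rhs[1]) / det,
            (a00 * rhs[1] - a10 * rhs[0]) / det]


def gradient(a, b, x):
    """Gradient of 0.5 * x^T a x - b^T x for one parameter group."""
    return [a[0][0] * x[0] + a[0][1] * x[1] - b[0],
            a[1][0] * x[0] + a[1][1] * x[1] - b[1]]


def local_newton_step(groups, xs):
    """Apply one Newton update per group: x <- x - H^{-1} g, with H = a."""
    new_xs = []
    for (a, b), x in zip(groups, xs):
        g = gradient(a, b, x)
        step = solve_2x2(a, g)  # small local system, no global Hessian
        new_xs.append([x[0] - step[0], x[1] - step[1]])
    return new_xs


if __name__ == "__main__":
    # Two hypothetical parameter groups, each with its own quadratic loss.
    groups = [([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0]),
              ([[2.0, 0.0], [0.0, 5.0]], [-1.0, 3.0])]
    xs = local_newton_step(groups, [[10.0, -7.0], [0.5, 0.5]])
    print([gradient(a, b, x) for (a, b), x in zip(groups, xs)])
```

Because the toy loss is exactly quadratic and the groups are fully decoupled, a single local step reaches the minimizer of every group; a first-order method would need many iterations on the same problem. Real 3DGS losses are neither quadratic nor perfectly decoupled, which is why the abstract speaks of near second-order convergence.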



Quadratic Programming Consensus Tracking Control of Uncertain Multiagent Systems via Event-Triggered Mechanism

December 2024 · 14 Reads

IEEE Transactions on Systems Man and Cybernetics Systems

This article addresses the consensus tracking control of multiagent systems (MASs) via a quadratic programming (QP) optimization framework, where the control Lyapunov function (CLF) condition serves as a constraint. The optimal controllers, derived through the QP solver, not only ensure the tracking control objective but also minimize the cost functions of agents. To enhance energy efficiency, discontinuous control methods, such as intermittent control strategy and event-triggered mechanism, are employed in the control framework. The CLF-based QP controllers are only updated at specific time instants, in order to reduce the frequency of QP problem-solving. In addition to considering optimization, the proposed methods are extended to uncertain MASs to enhance robustness, where the uncertainty is modeled by Gaussian process regression. In the end, simulation results are provided to demonstrate the feasibility of the theoretical analysis.
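A CLF-constrained QP has a simple closed form in its most basic setting. The sketch below assumes a single agent, the cost 0.5·||u||², and one affine CLF constraint a + b·u ≤ 0 (for example a = L_f V + γV and b = L_g V); these symbols and the function name are assumptions for illustration, and the article's multiagent cost functions, triggering rules, and Gaussian process model are not reproduced.

```python
def min_norm_clf_control(a, b):
    """Solve min 0.5 * ||u||^2 subject to a + b.u <= 0 in closed form.

    Assumption: one affine CLF constraint and no input bounds.
    """
    if a <= 0.0:
        # Constraint already holds with zero input, so u = 0 is optimal.
        return [0.0] * len(b)
    b_sq = sum(bi * bi for bi in b)
    if b_sq == 0.0:
        raise ValueError("CLF constraint infeasible: b is zero while a > 0")
    # Otherwise project the origin onto the constraint boundary a + b.u = 0.
    return [-a * bi / b_sq for bi in b]


if __name__ == "__main__":
    print(min_norm_clf_control(2.0, [1.0, 1.0]))
    print(min_norm_clf_control(-1.0, [3.0]))
```

In an event-triggered scheme of the kind the abstract describes, a controller like this would be recomputed only at triggering instants and held constant in between, which is what reduces the number of QP solves.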


Figure 3. Applying blending in latent space causes temporal inconsistency at cake's edge.
PhysMotion: Physics-Grounded Dynamics From a Single Image

November 2024 · 61 Reads

We introduce PhysMotion, a novel framework that leverages principled physics-based simulations to guide intermediate 3D representations generated from a single image and input conditions (e.g., applied force and torque), producing high-quality, physically plausible video generation. By utilizing continuum mechanics-based simulations as prior knowledge, our approach addresses the limitations of traditional data-driven generative models and results in more consistent, physically plausible motions. Our framework begins by reconstructing a feed-forward 3D Gaussian from a single image through geometry optimization. This representation is then time-stepped using a differentiable Material Point Method (MPM) with continuum mechanics-based elastoplasticity models, which provides a strong foundation for realistic dynamics, albeit at a coarse level of detail. To enhance the geometry and appearance and to ensure spatiotemporal consistency, we refine the initial simulation using a text-to-image (T2I) diffusion model with cross-frame attention, resulting in a physically plausible video that retains intricate details comparable to the input image. We conduct comprehensive qualitative and quantitative evaluations to validate the efficacy of our method. Our project page is available at: https://supertan0204.github.io/physmotion_website/.



Barrier-Augmented Lagrangian for GPU-based Elastodynamic Contact

November 2024 · 21 Reads

ACM Transactions on Graphics

We propose a GPU-based iterative method for accelerated elastodynamic simulation with the log-barrier-based contact model. While Newton's method is a conventional choice for solving the interior-point system, the presence of ill-conditioned log barriers often necessitates a direct solution at each linearized substep and incurs substantial storage and computational overhead. Moreover, constraint sets that vary in each iteration present additional challenges in algorithm convergence. Our method employs a novel barrier-augmented Lagrangian method to improve system conditioning and solver efficiency by adaptively updating an augmentation constraint set. This enables the utilization of a scalable, inexact Newton-PCG solver with sparse GPU storage, eliminating the need for direct factorization. We further enhance PCG convergence speed with a domain-decomposed warm start strategy based on an eigenvalue spectrum approximated through our in-time assembly. Demonstrating significant scalability improvements, our method makes simulations previously impractical on 128 GB of CPU memory feasible with only 8 GB of GPU memory and runs orders of magnitude faster. Additionally, our method adeptly handles stiff problems, surpassing the capabilities of existing GPU-based interior-point methods. Our results, validated across various complex collision scenarios involving intricate geometries and large deformations, highlight the exceptional performance of our approach.
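For readers unfamiliar with log-barrier contact, the sketch below shows the clamped barrier commonly used in IPC-style solvers, b(d) = -(d - d̂)² ln(d / d̂) for 0 < d < d̂ and 0 otherwise, where d is a contact distance and d̂ an activation threshold. That the paper uses exactly this form is an assumption, and the function name is hypothetical.

```python
import math


def ipc_log_barrier(d, d_hat):
    """Clamped log barrier: -(d - d_hat)^2 * ln(d / d_hat) for 0 < d < d_hat.

    Assumption: this is the barrier form common in IPC-style contact models,
    shown for illustration rather than taken from the paper.
    """
    if d <= 0.0:
        raise ValueError("distance must be positive (no intersection allowed)")
    if d >= d_hat:
        # Primitives farther apart than d_hat contribute no contact energy.
        return 0.0
    return -((d - d_hat) ** 2) * math.log(d / d_hat)


if __name__ == "__main__":
    for d in (1.0, 0.25, 0.01, 0.0001):
        print(d, ipc_log_barrier(d, 0.5))
```

The energy vanishes smoothly at d = d̂ and grows without bound as d approaches zero. Its curvature grows even faster, which is one way to see the ill-conditioning that the abstract attributes to log barriers.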


Citations (54)


... Differential Privacy. Differential privacy [15,17] has been widely adopted in collaborative learning frameworks due to its ability to provide strong privacy guarantees without significantly compromising model inference efficiency, unlike encryption-based methods [22]. A recent study [27] introduces a mechanism to protect client embeddings through local differential privacy. ...

Reference:

Prompt Inference Attack on Distributed Large Language Model Inference Frameworks
Calibrating Noise for Group Privacy in Subsampled Mechanisms
  • Citing Article
  • February 2025

Proceedings of the VLDB Endowment

... It is particularly advantageous when the objective function lacks a closed-form expression but can be observed at sampled input values. Its effectiveness and versatility make it particularly well-suited for machine learning and deep learning tasks, including tuning hyperparameters of deep neural networks [43]. Besides, employing Bayesian optimization increases the performance of traditional deep learning models [43][44][45]. ...

Optimized Long Short-Term Memory Network for LiDAR-Based Vehicle Trajectory Prediction Through Bayesian Optimization
  • Citing Article
  • January 2024

IEEE Transactions on Intelligent Transportation Systems

... Traditional methods often rely on visual features or markers, which can be computationally intensive and prone to errors, especially in occluded or crowded scenes. Motion vectors offer a robust alternative by leveraging inherent motion information in video sequences, allowing for more accurate real-time player tracking (Majeed et al., 2024). Instance segmentation, a critical task in computer vision, involves identifying and delineating each object instance within an image or video frame. ...

MV-Soccer: Motion-Vector Augmented Instance Segmentation for Soccer Player Tracking
  • Citing Conference Paper
  • June 2024

... Highly complex DL models can deliver accurate forecasts but often act like "black boxes", making it hard to understand why certain predictions are made. Urban planners and policymakers might be hesitant to rely on models they cannot fully interpret, especially when those predictions guide expensive infrastructure investments or are related to accidents [147]. Balancing accuracy with explainability is not easy. ...

Interpretable Traffic Accident Prediction: Attention Spatial–Temporal Multi-Graph Traffic Stream Learning Approach
  • Citing Article
  • November 2024

IEEE Transactions on Intelligent Transportation Systems

... However, current physics-based simulation methods that use NeRF [17,50] or Gaussian splatting [6,38,89] either focus on synthetic objects, allowing for full-view observations during reconstruction, or simulate elastic deformations and jittering, in which objects remain constrained to the contacted surface. This prevents objects from truly detaching under user-specified impulses. ...

PIE-NeRF: Physics-Based Interactive Elastodynamics with NeRF
  • Citing Conference Paper
  • June 2024

... This approach produces high-quality results and significantly enhances real-time rendering performance. Thanks to these advantages, 3DGS has been widely adopted across various research areas, including avatars [1,27,29,33,47], dynamic scenes [12,19,20,43,51], and 3D generation [5,21,36,45,46]. Recently, Scaffold-GS [22] has advanced the 3DGS framework by introducing anchor points to construct a hierarchical 3D representation. ...

PhysGaussian: Physics-Integrated 3D Gaussians for Generative Dynamics
  • Citing Conference Paper
  • June 2024

... Marwa Qaraqe et al. [36] introduced a deep learning model based on the swin transformer, which classifies crowd behavior into four categories: Natural (N), Large Peaceful Gathering (LPG), Large Violent Gathering (LVG), and Fighting. The proposed model integrates crowd-counting maps and optical flow maps to improve the detection of crowd dynamics and violence levels. ...

Crowd behavior detection: leveraging video swin transformer for crowd size and violence level analysis

Applied Intelligence

... In formal terms, classification by LMs differs from open-ended generation in that it involves structured generation constrained by a predefined set of options. For instance, in stock trading, classifications are typically limited to two discrete actions: long or short (Chuang & Yang, 2022; Koa et al., 2024; Bao et al., 2024). In such cases, LMs are required to provide a single, definitive response rather than ambiguous recommendations, such as suggesting that both options could be reasonable under certain conditions. ...

Data-driven stock forecasting models based on neural networks : A review
  • Citing Article
  • August 2024

Information Fusion

... Second, we intend to conduct optimization work to further improve the bottleneck of the parallel algorithm. Third, we plan to employ simulation solvers such as the recently introduced Vertex Block Descent [33] and incorporate methods to enhance performance through appropriate mixing. Lastly, we intend to make our framework open source, allowing for easy integration into the Unity engine for the development of XR applications using this simulation system. ...

Vertex Block Descent
  • Citing Article
  • July 2024

ACM Transactions on Graphics

... Recent advancements in Simultaneous Localization and Mapping (SLAM) have significantly leveraged deep learning to enhance the robustness and accuracy of 3D reconstruction and camera tracking. The use of neural implicit representations, as demonstrated by NICE-SLAM [1], allows for detailed reconstructions of large-scale indoor scenes through a hierarchical grid-based encoding that efficiently manages local updates and optimizes scene representations within viewing frustums [1][2][3][4][5][6][7][8]. Additionally, the comprehensive evaluation in [9] shows that the proposed method outperforms other state-of-the-art approaches, providing a novel solution for automated GI tract segmentation through the integration of specialized architectures. ...

X-SLAM: Scalable Dense SLAM for Task-aware Optimization using CSFD
  • Citing Article
  • July 2024

ACM Transactions on Graphics