What is this page?
This page lists works of an author who doesn't have a ResearchGate profile or hasn't added the works to their profile yet. It is automatically generated from public (personal) data to further our legitimate goal of comprehensive and accurate scientific recordkeeping. If you are this author and want this page removed, please let us know.
Publications (63)
October 2024 · 10 Reads
We propose a novel neural network approach, LARP (Learned Articulated Rigid body Physics), to model the dynamics of articulated human motion with contact. Our goal is to develop a faster and more convenient methodological alternative to traditional physics simulators for use in computer vision tasks such as human motion reconstruction from video. To that end, we introduce a training procedure and model components that support the construction of a recurrent neural architecture that accurately simulates articulated rigid body dynamics. Our neural architecture supports features typically found in traditional physics simulators, such as modeling of joint motors, variable dimensions of body parts, and contact between body parts and objects, and it is an order of magnitude faster than traditional systems when multiple simulations are run in parallel. To demonstrate the value of LARP, we use it as a drop-in replacement for a state-of-the-art classical non-differentiable simulator in an existing video-based reconstruction framework and show comparable or better 3D human pose reconstruction accuracy.
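As a rough illustration of the kind of recurrent rollout described in this abstract (not the authors' implementation), the following PyTorch sketch uses a GRU cell as the recurrent core; the state and control dimensions, the residual state update, and the class name are all assumptions made for the example.

```python
import torch
import torch.nn as nn

class RecurrentDynamics(nn.Module):
    """Toy recurrent simulator: rolls out articulated-body state one step
    at a time from the current state and per-step motor controls."""
    def __init__(self, state_dim=75, control_dim=69, hidden_dim=512):
        super().__init__()
        self.cell = nn.GRUCell(state_dim + control_dim, hidden_dim)
        self.head = nn.Linear(hidden_dim, state_dim)  # predicts a state delta

    def forward(self, state, controls):
        # state: (B, state_dim) initial joint angles and root pose
        # controls: (B, T, control_dim) motor targets for each timestep
        h = state.new_zeros(state.size(0), self.cell.hidden_size)
        trajectory = []
        for t in range(controls.size(1)):
            h = self.cell(torch.cat([state, controls[:, t]], dim=-1), h)
            state = state + self.head(h)  # residual update of the state
            trajectory.append(state)
        return torch.stack(trajectory, dim=1)  # (B, T, state_dim)
```

Batching many rollouts along the B dimension is where the parallel speed-up over a classical simulator would come from.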
June 2023 · 15 Reads · 14 Citations
December 2022 · 15 Reads
In this paper, we propose a new approach to learned optimization. As is common in the literature, we represent the computation of the optimizer's update step with a neural network. The parameters of the optimizer are then learned on a set of training optimization tasks so that it performs minimization efficiently. Our main innovation is a new neural network architecture for the learned optimizer, inspired by the classic BFGS algorithm. As in BFGS, we estimate a preconditioning matrix as a sum of rank-one updates, but use a transformer-based neural network to predict these updates jointly with the step length and direction. In contrast to several recent learned optimization approaches, our formulation allows for conditioning across different dimensions of the parameter space of the target problem while remaining applicable to optimization tasks of variable dimensionality without retraining. We demonstrate the advantages of our approach on a benchmark composed of objective functions traditionally used for the evaluation of optimization algorithms, as well as on the real-world task of physics-based reconstruction of articulated 3D human motion.
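To make the rank-one preconditioner idea concrete, here is a heavily simplified sketch, not the paper's architecture: each parameter dimension becomes a token (which is what keeps the model applicable to problems of any dimensionality), a small transformer encoder mixes information across dimensions, and the per-token outputs form a rank-one update to the preconditioner plus a pooled step length. All layer sizes, feature choices, and names are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnedBFGSStep(nn.Module):
    def __init__(self, feat_dim=32, n_heads=4):
        super().__init__()
        self.embed = nn.Linear(2, feat_dim)   # per-dimension features: (grad, param)
        layer = nn.TransformerEncoderLayer(feat_dim, n_heads,
                                           dim_feedforward=64, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.uv_head = nn.Linear(feat_dim, 2)     # per-dimension u_i, v_i
        self.alpha_head = nn.Linear(feat_dim, 1)  # pooled token -> step length

    def forward(self, params, grads, H):
        # params, grads: (D,); H: (D, D) running preconditioner estimate
        tokens = self.embed(torch.stack([grads, params], dim=-1))   # (D, feat_dim)
        tokens = self.encoder(tokens.unsqueeze(0)).squeeze(0)       # mix across dims
        u, v = self.uv_head(tokens).unbind(dim=-1)
        H = H + torch.outer(u, v)                                   # rank-one update
        alpha = F.softplus(self.alpha_head(tokens.mean(dim=0)))     # positive step size
        return params - alpha * (H @ grads), H                      # preconditioned step
```

At meta-training time the parameters of the embedding, encoder, and heads would be learned across a set of training tasks; nothing in the step itself depends on the dimensionality D.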
June 2022 · 12 Reads · 47 Citations
June 2022 · 9 Reads · 42 Citations
May 2022 · 64 Reads
We introduce DiffPhy, a differentiable physics-based model for articulated 3D human motion reconstruction from video. Applications of physics-based reasoning in human motion analysis have so far been limited, both by the complexity of constructing adequate physical models of articulated human motion, and by the formidable challenges of performing stable and efficient inference with physics in the loop. We jointly address such modeling and inference challenges by proposing an approach that combines a physically plausible body representation with anatomical joint limits, a differentiable physics simulator, and optimization techniques that ensure good performance and robustness to suboptimal local optima. In contrast to several recent methods, our approach readily supports full-body contact including interactions with objects in the scene. Most importantly, our model connects end-to-end with images, thus supporting direct gradient-based physics optimization by means of image-based loss functions. We validate the model by demonstrating that it can accurately reconstruct physically plausible 3D human motion from monocular video, both on public benchmarks with available 3D ground-truth, and on videos from the internet.
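A minimal sketch of the "gradients through the simulator" idea, with the physics step, camera projection, and 2D keypoint targets left as placeholders (they are assumptions, not DiffPhy's actual interfaces):

```python
import torch

def reconstruct(controls, init_state, simulate_step, project, keypoints_2d,
                n_iters=200, lr=0.05):
    """Optimize per-frame controls so the simulated motion matches detected
    2D keypoints; gradients flow through every differentiable physics step."""
    controls = controls.clone().requires_grad_(True)
    opt = torch.optim.Adam([controls], lr=lr)
    for _ in range(n_iters):
        state, loss = init_state, 0.0
        for t in range(controls.shape[0]):
            state, joints_3d = simulate_step(state, controls[t])  # differentiable step
            loss = loss + ((project(joints_3d) - keypoints_2d[t]) ** 2).mean()
        opt.zero_grad()
        loss.backward()   # backpropagate through the whole rollout
        opt.step()
    return controls.detach()
```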
May 2022 · 6 Reads
We focus on the task of estimating a physically plausible articulated human motion from monocular video. Existing approaches that do not consider physics often produce temporally inconsistent output with motion artifacts, while state-of-the-art physics-based approaches have either been shown to work only in controlled laboratory conditions or consider simplified body-ground contact limited to feet. This paper explores how these shortcomings can be addressed by directly incorporating a fully-featured physics engine into the pose estimation process. Given an uncontrolled, real-world scene as input, our approach estimates the ground-plane location and the dimensions of the physical body model. It then recovers the physical motion by performing trajectory optimization. The advantage of our formulation is that it readily generalizes to a variety of scenes that might have diverse ground properties and supports any form of self-contact and contact between the articulated body and scene geometry. We show that our approach achieves competitive results with respect to existing physics-based methods on the Human3.6M benchmark, while being directly applicable without re-training to more complex dynamic motions from the AIST benchmark and to uncontrolled internet videos.
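Because the physics engine here is treated as a black box, the trajectory optimization can be pictured as a derivative-free search over the control sequence; the loop below is only a schematic stand-in for the paper's optimizer, and rollout_cost (running the engine and scoring the resulting motion against the video evidence) is a placeholder.

```python
import numpy as np

def optimize_trajectory(rollout_cost, n_controls, n_iters=500,
                        pop_size=32, sigma=0.1, seed=0):
    """rollout_cost(controls) runs the physics engine with a flat control
    vector and returns a scalar cost (e.g. keypoint reprojection error)."""
    rng = np.random.default_rng(seed)
    best = np.zeros(n_controls)
    best_cost = rollout_cost(best)
    for _ in range(n_iters):
        # Sample perturbed control sequences around the current best guess.
        candidates = best + sigma * rng.standard_normal((pop_size, n_controls))
        costs = np.array([rollout_cost(c) for c in candidates])
        if costs.min() < best_cost:
            best, best_cost = candidates[costs.argmin()].copy(), costs.min()
    return best, best_cost
```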
October 2020 · 13 Reads · 8 Citations
July 2020 · 30 Reads
We propose a new approach to interactive full-image semantic segmentation which enables quickly collecting training data for new datasets with previously unseen semantic classes (a demo is available at https://youtu.be/yUk8D5gEX-o). We leverage a key observation: propagation from labeled to unlabeled pixels does not necessarily require class-specific knowledge, but can be done purely based on appearance similarity within an image. We build on this observation and propose an approach capable of jointly propagating pixel labels from multiple classes without explicit class-specific appearance models. To enable long-range propagation, our approach first globally measures appearance similarity between labeled and unlabeled pixels across the entire image. It then locally integrates per-pixel measurements, which improves accuracy at boundaries and removes noisy label switches in homogeneous regions. We also design an efficient manual annotation interface that extends traditional polygon drawing tools with a suite of additional convenient features, and add automatic propagation to it. Experiments with human annotators on the COCO Panoptic Challenge dataset show that the combination of our improved manual interface and our novel automatic propagation mechanism reduces annotation time by more than a factor of 2 compared to polygon drawing. We also test our method on the ADE-20k and Fashionista datasets without any dataset-specific adaptation or retraining of our model, demonstrating that it generalizes to new datasets and visual classes.
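The class-agnostic propagation step can be pictured with a toy nearest-neighbour version (this is not the paper's model): every unlabeled pixel takes the label of the most similar labeled pixel in a simple colour-plus-position feature space, i.e. only the "global" measurement without the local integration stage. The feature choice and weighting are assumptions.

```python
import numpy as np
from scipy.spatial import cKDTree

def propagate_labels(image, labels, pos_weight=0.5):
    """image: (H, W, 3) float RGB in [0, 1]; labels: (H, W) int, -1 = unlabeled.
    Returns a dense label map where each unlabeled pixel copies the label of
    its nearest labeled pixel in colour + position feature space."""
    h, w, _ = image.shape
    yy, xx = np.mgrid[0:h, 0:w]
    position = np.stack([yy / h, xx / w], axis=-1).reshape(-1, 2)
    feats = np.concatenate([image.reshape(-1, 3), pos_weight * position], axis=1)
    flat = labels.reshape(-1)
    labeled = flat >= 0
    _, nearest = cKDTree(feats[labeled]).query(feats[~labeled])
    out = flat.copy()
    out[~labeled] = flat[labeled][nearest]
    return out.reshape(h, w)
```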
Citations (53)
... We believe that one of the problems of extending this work to higher-dimensional problems is that a single fractional order might not accurately describe all the dimensions. In recent works, Transformers were used to overcome the problem of dimensionality in learned optimization [4], but the inference time is slow. We aim to take advantage of other advances in the field, such as faster attention mechanisms [2] or SSMs [5], and make this approach more feasible in the future. ...
- Citing Conference Paper
June 2023
... However, these methods often struggle to improve motion plausibility due to oversimplified dynamic equations. Other methods [9,12,51,73] use physical-simulation-based motion imitation as a postprocessing module, learning motion control policy to imitate the reference motion (video motion capture results) in a simulated physical environment. With high-quality reference motions, these methods improve the physical realism of daily motions such as walking, running, and jumping. ...
- Citing Conference Paper
June 2022
... Identifying these issues, researchers have explored physics-based metrics [5,15,27,28,32,39] for 3D HPE. For measuring realistic contact with the ground, metrics such as footskate [27], foot slide [39], and ground penetration [28,39] have been proposed. ...
- Citing Conference Paper
June 2022
... Aggregating Human Inputs. Many works, particularly in the crowdsourcing domain, use multiple human inputs to increase accuracy. Though some works (Branson et al. 2010; Russakovsky, Li, and Li 2015) allow the model to choose when to terminate, the most common approach is to allow the human operator to review the model's output directly and provide new information until the result is satisfactory (Gouravajhala et al. 2018; Choi et al. 2019; Agustsson, Uijlings, and Ferrari 2019; Uijlings, Andriluka, and Ferrari 2020). These approaches are sufficient for dataset collections: performing tasks such as answering questions about given bounding boxes (Russakovsky, Li, and Li 2015) or confirming answers (Uijlings et al. 2018) is faster and more accurate than generating the dataset through drawing a bounding box directly on the image. ...
- Citing Conference Paper
October 2020
... The encoder is trained to reconstruct 3D poses from corrupted 2D poses using various 2D and 3D human pose datasets, including Human3.6M [10], AMASS [50], PoseTrack [51], and InstaVariety [52]. Subsequently, it is fine-tuned with additional layers for downstream tasks. ...
- Citing Conference Paper
June 2018
... By converting medical images into general image formats, the annotation of medical images can also be realized by general image annotation tools. There are many general image annotation tools available, such as VIA [11], Ratsnage [12], fluid annotation [13], LabelMe [14], iVAT [15], Bayesian-CRF [16], etc. Generally, these general image annotation tools cannot directly annotate commonly used medical image formats (such as DICOM, NIFTI, MHD+RAW, ANALYZE, etc.), and the medical images need to be converted into general image formats first, which decreases the efficiency. ...
- Citing Conference Paper
October 2018
... Multi-object tracking via human pose estimation [35][36][37][38] offers advantages over traditional bounding box-based tracking methods in scenarios with occlusion and similar appearances. Bounding box-based tracking methods often struggle to maintain tracking continuity when the target is partially occluded. ...
- Citing Conference Paper
July 2017
... Classical multi-object tracking algorithms include DeepSORT [33], EAMTT [34], STAM [35], SORT [36], among which DeepSORT demonstrates better tracking performance. The DeepSORT model is an improved version of the SORT tracking model. ...
- Citing Conference Paper
July 2017
... In the field of computer vision, landmarks have been attracting significant attention for a long time, resulting in numerous datasets for Landmark Detection [42], [52]-[55]. Most of these datasets [42], [52], [53] are designed for human pose estimation, which provide annotated images or videos specifically focused on landmarks of the human body. ...
- Citing Article
October 2017
... and it is addressed by the Hungarian method [19]. ...
- Citing Article
June 2015