Conference Paper

Kinematic Motion Analysis with Volumetric Motion Capture

Article
Full-text available
Background: Markerless motion capture has the potential to perform movement analysis with reduced data collection and processing time compared to marker-based methods. This technology is now starting to be applied for clinical and rehabilitation applications, and therefore it is crucial that users of these systems understand both their potential and limitations. This literature review aims to provide a comprehensive overview of the current state of markerless motion capture for both single-camera and multi-camera systems. Additionally, this review explores how practical applications of markerless technology are being used in clinical and rehabilitation settings, and examines the future challenges and directions markerless research must explore to facilitate full integration of this technology within clinical biomechanics. Methodology: A scoping review is needed to examine this emerging broad body of literature and determine where gaps in knowledge exist; this is key to developing motion capture methods that are cost effective and practically relevant to clinicians, coaches and researchers around the world. Literature searches were performed to examine studies that report accuracy of markerless motion capture methods, explore current practical applications of markerless motion capture methods in clinical biomechanics and identify gaps in our knowledge that are relevant to future developments in this area. Results: Markerless methods increase motion capture data versatility, enabling datasets to be re-analyzed using updated pose estimation algorithms, and may even provide clinicians with the capability to collect data while patients are wearing normal clothing. While markerless temporospatial measures generally appear to be equivalent to marker-based motion capture, joint center locations and joint angles are not yet sufficiently accurate for clinical applications. Pose estimation algorithms are approaching error rates similar to those of marker-based motion capture; however, without comparison to a gold standard, such as bi-planar videoradiography, the true accuracy of markerless systems remains unknown. Conclusions: Current open-source pose estimation algorithms were never designed for biomechanical applications; therefore, the datasets on which they have been trained are inconsistently and inaccurately labelled. Improvements to the labelling of open-source training data, as well as assessment of markerless accuracy against gold-standard methods, will be vital next steps in the development of this technology.
Article
Full-text available
Human movement researchers are often restricted to laboratory environments and data capture techniques that are time and/or resource intensive. Markerless pose estimation algorithms show great potential to facilitate large scale movement studies ‘in the wild’, i.e., outside of the constraints imposed by marker-based motion capture. However, the accuracy of such algorithms has not yet been fully evaluated. We computed 3D joint centre locations using several pre-trained deep-learning based pose estimation methods (OpenPose, AlphaPose, DeepLabCut) and compared them to marker-based motion capture. Participants performed walking, running and jumping activities while marker-based motion capture data and multi-camera high speed images (200 Hz) were captured. The pose estimation algorithms were applied to 2D image data and 3D joint centre locations were reconstructed. Pose estimation derived joint centres demonstrated systematic differences at the hip and knee (~30–50 mm), most likely due to mislabeling of ground truth data in the training datasets. Where systematic differences were lower, e.g., the ankle, differences of 1–15 mm were observed depending on the activity. Markerless motion capture represents a highly promising emerging technology that could free movement scientists from laboratory environments, but 3D joint centre locations are not yet consistently comparable to marker-based motion capture.
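As a rough illustration of the comparison performed in that study, the following sketch computes per-joint mean Euclidean differences between markerless and marker-based joint-centre trajectories (Python/NumPy; the array shapes and synthetic data here are assumptions, not the study's pipeline):

    import numpy as np

    def joint_centre_error(markerless, marker_based):
        """Per-joint mean Euclidean distance between two (frames, joints, 3)
        joint-centre trajectories expressed in the same coordinate frame."""
        diff = np.linalg.norm(markerless - marker_based, axis=-1)  # (frames, joints)
        return diff.mean(axis=0)                                   # (joints,)

    # Hypothetical example: 200 frames, 17 joints, coordinates in mm,
    # with ~15 mm of noise standing in for markerless error.
    rng = np.random.default_rng(0)
    marker_based = rng.normal(size=(200, 17, 3)) * 100.0
    markerless = marker_based + rng.normal(scale=15.0, size=marker_based.shape)
    print(joint_centre_error(markerless, marker_based))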
Article
Full-text available
Biomechanical feedback technologies are becoming increasingly prevalent in elite athletic training environments but how the kinematic and kinetic data they produce can be best used to improve sports techniques and enhance sports performance is unclear. This paper draws on theoretical and empirical developments in the motor control, skill acquisition, and sports biomechanics literatures to offer practical guidance and strategic direction on this issue. It is argued that the information produced by biomechanical feedback technologies can only describe, with varying degrees of accuracy, what patterns of coordination and control are being adopted by the athlete but, crucially, it cannot prescribe how these patterns of coordination and control should be modified to enhance sports performance. As conventional statistical and theoretical modelling paradigms in applied sports biomechanics provide limited information about patterns of coordination and control, and do not permit the identification of athlete-specific optimum sports techniques, objective criteria on which to base technical modifications that will consistently lead to enhanced performance outcomes cannot reliably be established for individual athletes. Given these limitations, an alternative approach, which is harmonious with the tenets of dynamical systems theory and aligned with the pioneering insights of Bernstein (1967) on skill acquisition, is advocated. This approach involves using kinematic and kinetic data to channel the athlete’s search towards their own unique ‘optimum’ pattern of coordination and control as they actively explore their perceptual-motor workspace during practice. This approach appears to be the most efficacious use of kinematic and kinetic data given current biomechanical knowledge about sports techniques and the apparent inability of existing biomechanical modelling approaches to accurately predict how technique changes will impact on performance outcomes for individual athletes.
Article
Full-text available
Background: Several factors, including the aging population and the recent coronavirus pandemic, have increased the need for cost-effective, easy-to-use and reliable telerehabilitation services. Computer vision-based marker-less human pose estimation is a promising variant of telerehabilitation and is currently an intensive research topic. It has attracted significant interest for detailed motion analysis, as it does not need arrangement of external fiducials while capturing motion data from images. This is promising for rehabilitation applications, as such systems enable analysis and supervision of clients' exercises and reduce clients' need to visit physiotherapists in person. However, development of a marker-less motion analysis system with precise accuracy for joint identification, joint angle measurements and advanced motion analysis is an open challenge. Objectives: The main objective of this paper is to provide a critical overview of recent computer vision-based marker-less human pose estimation systems and their applicability to rehabilitation applications. An overview of some existing marker-less rehabilitation applications is also provided. Methods: This paper presents a critical review of recent computer vision-based marker-less human pose estimation systems with a focus on their joint localization accuracy in comparison to physiotherapy requirements and their ease of use. The accuracy, in terms of the capability to measure the knee angle, is analysed using simulation. Results: Current pose estimation systems use 2D, 3D, multiple and single view-based techniques. The most promising techniques from a physiotherapy point of view are 3D marker-less pose estimation based on a single view, as these can perform advanced motion analysis of the human body while only requiring a single camera and a computing device. Preliminary simulations reveal that some proposed systems already provide sufficient accuracy for 2D joint angle estimations. Conclusions: Even though test results of different applications for some proposed techniques are promising, more rigorous testing is required to validate their accuracy before they can be widely adopted in advanced rehabilitation applications.
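To make the joint-angle question concrete, a minimal sketch (not taken from the review) of a 2D knee angle computed from hip, knee and ankle keypoints, as any of the discussed pose estimators would supply:

    import numpy as np

    def knee_angle_deg(hip, knee, ankle):
        """Angle at the knee, in degrees, between the thigh and shank segments,
        given 2D keypoints as (x, y) pixel coordinates."""
        thigh = np.asarray(hip, float) - np.asarray(knee, float)
        shank = np.asarray(ankle, float) - np.asarray(knee, float)
        cosang = np.dot(thigh, shank) / (np.linalg.norm(thigh) * np.linalg.norm(shank))
        return np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))

    # Hypothetical pixel coordinates for a nearly straight leg (~173 degrees).
    print(knee_angle_deg((320, 200), (330, 330), (325, 460)))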
Article
Full-text available
This paper reviews the recent literature on technologies and methodologies for quantitative human gait analysis in the context of neurodegenerative diseases. The use of technological instruments can be of great support in both clinical diagnosis and severity assessment of these pathologies. In this paper, sensors, features and processing methodologies have been reviewed in order to provide a highly consistent work that explores the issues related to gait analysis. First, the phases of the human gait cycle are briefly explained, along with some non-normal gait patterns (gait abnormalities) typical of some neurodegenerative diseases. The work continues with a survey on the publicly available datasets principally used for comparing results. Then the paper reports the most common processing techniques for both feature selection and extraction and for classification and clustering. Finally, a conclusive discussion on current open problems and future directions is outlined.
Article
Full-text available
Kinematic analysis is often performed in a lab using optical cameras combined with reflective markers. With the advent of artificial intelligence techniques such as deep neural networks, it is now possible to perform such analyses without markers, making outdoor applications feasible. In this paper I summarise 2D markerless approaches for estimating joint angles, highlighting their strengths and limitations. In computer science, so-called “pose estimation” algorithms have existed for many years. These methods involve training a neural network to detect features (e.g. anatomical landmarks) using a process called supervised learning, which requires “training” images to be manually annotated. Manual labelling has several limitations, including labeller subjectivity, the requirement for anatomical knowledge, and issues related to training data quality and quantity. Neural networks typically require thousands of training examples before they can make accurate predictions, so training datasets are usually labelled by multiple people, each of whom has their own biases, which ultimately affects neural network performance. A recent approach, called transfer learning, involves modifying a model trained to perform a certain task so that it retains some learned features and is then re-trained to perform a new task. This can drastically reduce the required number of training images. Although development is ongoing, existing markerless systems may already be accurate enough for some applications, e.g. coaching or rehabilitation. Accuracy may be further improved by leveraging novel approaches and incorporating realistic physiological constraints, ultimately resulting in low-cost markerless systems that could be deployed both in and outside of the lab.
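A minimal transfer-learning sketch in the spirit described above, assuming PyTorch/torchvision (the article does not prescribe a framework): an ImageNet-pretrained backbone is frozen and only a small new head is re-trained to regress 2D landmarks.

    import torch
    import torch.nn as nn
    import torchvision

    # Load an ImageNet-pretrained backbone and keep its learned features.
    backbone = torchvision.models.resnet18(weights=torchvision.models.ResNet18_Weights.DEFAULT)
    for p in backbone.parameters():
        p.requires_grad = False                      # freeze generic features

    # Replace the classifier with a regressor predicting K landmarks (x, y)
    # and re-train only this new head, so far fewer labelled images are needed.
    K = 16
    backbone.fc = nn.Linear(backbone.fc.in_features, 2 * K)

    optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
    criterion = nn.MSELoss()

    # One hypothetical training step on a batch of 8 images and landmark targets.
    images, targets = torch.randn(8, 3, 224, 224), torch.rand(8, 2 * K)
    loss = criterion(backbone(images), targets)
    loss.backward()
    optimizer.step()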
Article
Full-text available
Human activity recognition (HAR) systems attempt to automatically identify and analyze human activities using acquired information from various types of sensors. Although several extensive review papers have already been published in the general HAR topics, the growing technologies in the field as well as the multi-disciplinary nature of HAR prompt the need for constant updates in the field. In this respect, this paper attempts to review and summarize the progress of HAR systems from the computer vision perspective. Indeed, most computer vision applications such as human computer interaction, virtual reality, security, video surveillance and home monitoring are highly correlated to HAR tasks. This establishes new trend and milestone in the development cycle of HAR systems. Therefore, the current survey aims to provide the reader with an up to date analysis of vision-based HAR related literature and recent progress in the field. At the same time, it will highlight the main challenges and future directions.
Article
Full-text available
There is a need within human movement sciences for a markerless motion capture system, which is easy to use and sufficiently accurate to evaluate motor performance. This study aims to develop a 3D markerless motion capture technique, using OpenPose with multiple synchronized video cameras, and examine its accuracy in comparison with optical marker-based motion capture. Participants performed three motor tasks (walking, countermovement jumping, and ball throwing), and these movements were measured using both marker-based optical motion capture and OpenPose-based markerless motion capture. The differences in corresponding joint positions, estimated from the two different methods throughout the analysis, were presented as a mean absolute error (MAE). The results demonstrated that, qualitatively, 3D pose estimation using markerless motion capture could correctly reproduce the movements of participants. Quantitatively, of all the mean absolute errors calculated, approximately 47% were <20 mm, and 80% were <30 mm. However, 10% were >40 mm. The primary reason for mean absolute errors exceeding 40 mm was that OpenPose failed to track the participant's pose in 2D images owing to failures, such as recognition of an object as a human body segment or replacing one segment with another, depending on the image of each frame. In conclusion, this study demonstrates that, if an algorithm that corrects all apparently wrong tracking can be incorporated into the system, OpenPose-based markerless motion capture can be used for human movement science with an accuracy of 30 mm or less.
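The 3D reconstruction step described above can be sketched as linear (DLT) triangulation of each OpenPose keypoint from the synchronized, calibrated views, followed by the kind of mean-absolute-error summary reported; the projection matrices and detections below are assumed inputs, not the study's code:

    import numpy as np

    def triangulate_joint(projections, pixels):
        """Linear (DLT) triangulation of one joint from two or more calibrated views.
        projections: list of 3x4 camera projection matrices.
        pixels: list of matching (u, v) detections, e.g. from OpenPose."""
        rows = []
        for P, (u, v) in zip(projections, pixels):
            P = np.asarray(P, float)
            rows.append(u * P[2] - P[0])
            rows.append(v * P[2] - P[1])
        _, _, vt = np.linalg.svd(np.asarray(rows))
        X = vt[-1]
        return X[:3] / X[3]

    def mean_absolute_error(markerless_xyz, marker_based_xyz):
        """Mean absolute difference between two (frames, 3) joint trajectories."""
        return np.abs(np.asarray(markerless_xyz) - np.asarray(marker_based_xyz)).mean()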
Article
Full-text available
The measurement of human motion relies heavily on the ability of optoelectronic motion capture systems to accurately measure retroreflective marker positions. Marker position accuracy is largely dependent on the optical characteristics of the camera system and algorithms implemented in the tracking software. In 1999, Richards critically reviewed multiple camera systems and each system's ability to generate the same marker coordinate positions under different field tests. Field tests were designed utilizing what came to be known as the Standard Assessment of Motion System Accuracy (SAMSA) device. The SAMSA test results indicated that some systems outperformed others and accuracy varied depending on the test. In the subsequent 20 years, motion capture technology has significantly improved. Therefore, we aimed to replicate the tests from 1999 using current, higher-resolution motion capture systems and current calibration techniques. Reference marker distances were established utilizing a FaroArm 3D digitizer (FARO Technologies). The modern cameras' field-test performance showed nearly a three-fold improvement in agreement with the reference measure compared with their 1999 predecessors. No system's absolute marker distance deviated by more than 0.5 mm from the FaroArm measurement, and no system's maximum error exceeded 1.0 mm. Given the consistency in accuracy reported by all the systems included in the current assessment, it is reasonable to assume that the current systems would eliminate errors associated with system hardware/software as an obstacle to sharing data between laboratories.
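The field-test accuracy figures above amount to comparing measured inter-marker distances against a reference length; a minimal sketch of that computation (synthetic data, not the study's):

    import numpy as np

    def distance_errors(marker_a, marker_b, reference_mm):
        """Absolute error (mm) between the measured inter-marker distance in each
        frame and a reference distance, e.g. one established with a 3D digitizer.
        marker_a, marker_b: (frames, 3) marker trajectories in mm."""
        measured = np.linalg.norm(np.asarray(marker_a) - np.asarray(marker_b), axis=1)
        return np.abs(measured - reference_mm)

    # Hypothetical rigid bar carrying two markers a nominal 500.0 mm apart.
    rng = np.random.default_rng(1)
    a = rng.normal(size=(1000, 3)) * 200.0
    b = a + [500.0, 0.0, 0.0] + rng.normal(scale=0.1, size=(1000, 3))
    err = distance_errors(a, b, 500.0)
    print(err.mean(), err.max())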
Book
Full-text available
This monograph addresses the problem of geometric registration and the classical Iterative Closest Point (ICP) algorithm. Though the algorithm is commonly used in several research fields, the focus here is on mobile robotics, in which point clouds need to be registered. Even with this narrowed focus, the problem is still complex and multiple-faceted. A lot has been published on the topic, and, up until now, it has lacked a general comparison methodology. This is understandable given the diversity of applications, characteristics of the sensors, motion capabilities of the robotic platform and characteristics of the environments, but it hinders the selection process of an appropriate instance of the algorithm. This review tackles this challenge by three means: (1) a wide literature review with a historical perspective, (2) the elaboration of a theoretical descriptive framework of geometric registration grounded in the literature review, and (3) the analysis of practical use cases covering a wide diversity of mobile robotics applications. The end result provides the reader with a set of guidelines for the choice of geometric registration configuration.
Article
Full-text available
Background: The study of human movement within sports biomechanics and rehabilitation settings has made considerable progress over recent decades. However, developing a motion analysis system that collects accurate kinematic data in a timely, unobtrusive and externally valid manner remains an open challenge. Main body: This narrative review considers the evolution of methods for extracting kinematic information from images, observing how technology has progressed from laborious manual approaches to optoelectronic marker-based systems. The motion analysis systems which are currently most widely used in sports biomechanics and rehabilitation do not allow kinematic data to be collected automatically without the attachment of markers, controlled conditions and/or extensive processing times. These limitations can obstruct the routine use of motion capture in normal training or rehabilitation environments, and there is a clear desire for the development of automatic markerless systems. Such technology is emerging, often driven by the needs of the entertainment industry, and utilising many of the latest trends in computer vision and machine learning. However, the accuracy and practicality of these systems has yet to be fully scrutinised, meaning such markerless systems are not currently in widespread use within biomechanics. Conclusions: This review aims to introduce the key state-of-the-art in markerless motion capture research from computer vision that is likely to have a future impact in biomechanics, while considering the challenges with accuracy and robustness that are yet to be addressed.
Article
Full-text available
Objective: Sport research often requires human motion capture of an athlete. It can, however, be labour-intensive and difficult to select the right system, while manufacturers report on specifications which are determined in set-ups that largely differ from sport research in terms of volume, environment and motion. The aim of this review is to assist researchers in the selection of a suitable motion capture system for their experimental set-up for sport applications. An open online platform is initiated, to support (sport)researchers in the selection of a system and to enable them to contribute and update the overview. Design: systematic review; Method: Electronic searches in Scopus, Web of Science and Google Scholar were performed, and the reference lists of the screened articles were scrutinised to determine human motion capture systems used in academically published studies on sport analysis. Results: An overview of 17 human motion capture systems is provided, reporting the general specifications given by the manufacturer (weight and size of the sensors, maximum capture volume, environmental feasibilities), and calibration specifications as determined in peer-reviewed studies. The accuracy of each system is plotted against the measurement range. Conclusion: The overview and chart can assist researchers in the selection of a suitable measurement system. To increase the robustness of the database and to keep up with technological developments, we encourage researchers to perform an accuracy test prior to their experiment and to add to the chart and the system overview (online, open access).
Article
Full-text available
We present Motion2Fusion, a state-of-the-art 360° performance capture system that enables real-time reconstruction of arbitrary non-rigid scenes. We provide three major contributions over prior work: 1) a new non-rigid fusion pipeline allowing for far more faithful reconstruction of high-frequency geometric details, avoiding the over-smoothing and visual artifacts observed previously; 2) a high-speed pipeline coupled with a machine learning technique for 3D correspondence field estimation, reducing tracking errors and artifacts attributed to fast motions; and 3) a backward and forward non-rigid alignment strategy that more robustly deals with topology changes but is still free from scene priors. Our novel performance capture system demonstrates real-time results nearing a 3x speed-up over previous state-of-the-art work on the exact same GPU hardware. Extensive quantitative and qualitative comparisons show more precise geometric and texturing results with fewer artifacts due to fast motions or topology changes than prior art.
Article
Full-text available
We present a novel technique that unobtrusively estimates interaction forces exerted by human participants in multi-contact interaction with rigid environments. Our method uses motion capture only, thus circumventing the need to setup cumbersome force transducers at all potential contacts between the human body and the environment. This problem is particularly challenging, as the knowledge of a given motion only characterizes the resultant force, which can generally be caused by an infinity of force distributions over individual contacts. We collect and release a large-scale dataset on how humans instinctively regulate interaction forces on diverse multi-contact tasks and motions. The force estimation framework we propose leverages physics-based optimization and neural networks to reconstruct force distributions that are physically realistic and compatible with real interaction force patterns. We show the effectiveness of our approach on various locomotion and multi-contact scenarios.
Article
Full-text available
We contribute a new pipeline for live multi-view performance capture, generating temporally coherent high-quality reconstructions in real-time. Our algorithm supports both incremental reconstruction, improving the surface estimation over time, as well as parameterizing the nonrigid scene motion. Our approach is highly robust to both large frame-to-frame motion and topology changes, allowing us to reconstruct extremely challenging scenes. We demonstrate advantages over related real-time techniques that either deform an online generated template or continually fuse depth data nonrigidly into a single reference model. Finally, we show geometric reconstruction results on par with offline methods which require orders of magnitude more processing time and many more RGBD cameras.
Article
Full-text available
The topic of this review is geometric registration in robotics. Registration algorithms associate sets of data into a common coordinate system. They have been used extensively in object reconstruction, inspection, medical application, and localization of mobile robotics. We focus on mobile robotics applications in which point clouds are to be registered. While the underlying principle of those algorithms is simple, many variations have been proposed for many different applications. In this review, we give a historical perspective of the registration problem and show that the plethora of solutions can be organized and differentiated according to a few elements. Accordingly, we present a formalization of geometric registration and cast algorithms proposed in the literature into this framework. Finally, we review a few applications of this framework in mobile robotics that cover different kinds of platforms, environments, and tasks. These examples allow us to study the specific requirements of each use case and the necessary configuration choices leading to the registration implementation. Ultimately, the objective of this review is to provide guidelines for the choice of geometric registration configuration.
Article
Full-text available
An approach for accurately measuring human motion through Markerless Motion Capture (MMC) is presented. The method uses multiple color cameras and combines an accurate and anatomically consistent tracking algorithm with a method for automatically generating subject-specific models. The tracking approach employed a Levenberg-Marquardt minimization scheme over an iterative closest point algorithm with six degrees of freedom for each body segment. Anatomical consistency was maintained by enforcing rotational and translational joint range of motion constraints for each specific joint. A subject-specific model of each subject was obtained through an automatic model generation algorithm (Corazza et al. in IEEE Trans. Biomed. Eng., 2009) which combines a space of human shapes (Anguelov et al. in Proceedings SIGGRAPH, 2005) with biomechanically consistent kinematic models and a pose-shape matching algorithm. There were 15 anatomical body segments and 14 joints, each with six degrees of freedom (13 and 12, respectively, for the HumanEva II dataset). The overall method is an improvement over (Mündermann et al. in Proceedings of CVPR, 2007) in terms of both accuracy and robustness. Since the method was originally developed for ≥8 cameras, the method's performance was tested both (i) on the HumanEva II dataset (Sigal and Black, Technical Report CS-06-08, 2006) in a 4-camera configuration, and (ii) on a series of motions including walking trials, a very challenging gymnastic motion and a dataset with motions similar to HumanEva II but with a variable number of cameras.
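A much-simplified sketch of the Levenberg-Marquardt-over-closest-point idea for a single rigid segment, using SciPy (assumed here; the paper's anatomical joint-constraint handling and full multi-segment model are omitted):

    import numpy as np
    from scipy.optimize import least_squares
    from scipy.spatial import cKDTree
    from scipy.spatial.transform import Rotation

    def fit_segment(model_pts, cloud_pts, x0=np.zeros(6)):
        """Fit a 6-DOF pose (rotation vector + translation) of a body-segment
        point model to a measured point cloud by minimising distances to each
        model point's current closest cloud point (Levenberg-Marquardt)."""
        tree = cKDTree(cloud_pts)

        def residuals(x):
            R = Rotation.from_rotvec(x[:3]).as_matrix()
            moved = model_pts @ R.T + x[3:]
            _, idx = tree.query(moved)              # closest-point correspondences
            return (moved - cloud_pts[idx]).ravel()

        return least_squares(residuals, x0, method="lm").x

    # Hypothetical segment-like point set and a displaced, noisy copy of it.
    rng = np.random.default_rng(2)
    model = rng.normal(size=(300, 3)) * [0.03, 0.03, 0.2]
    true_R = Rotation.from_rotvec([0.1, 0.05, 0.2]).as_matrix()
    cloud = model @ true_R.T + [0.01, 0.02, 0.0] + rng.normal(scale=0.002, size=model.shape)
    print(fit_segment(model, cloud))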
Article
Full-text available
Human motion capture is frequently used to study musculoskeletal biomechanics and clinical problems, as well as to provide realistic animation for the entertainment industry. The most popular technique for human motion capture uses markers placed on the skin, despite some important drawbacks including the impediment to the motion by the presence of skin markers and relative movement between the skin where the markers are placed and the underlying bone. The latter makes it difficult to estimate the motion of the underlying bone, which is the variable of interest for biomechanical and clinical applications. A model-based markerless motion capture system is presented in this study, which does not require the placement of any markers on the subject's body. The described method is based on visual hull reconstruction and an a priori model of the subject. A custom version of adapted fast simulated annealing has been developed to match the model to the visual hull. The tracking capability and a quantitative validation of the method were evaluated in a virtual environment for a complete gait cycle. The obtained mean errors, for an entire gait cycle, for knee and hip flexion are respectively 1.5° (±3.9°) and 2.0° (±3.0°), while for knee and hip adduction they are respectively 2.0° (±2.3°) and 1.1° (±1.7°). Results for the ankle and shoulder joints are also presented. Experimental results captured in a gait laboratory with a real subject are also shown to demonstrate the effectiveness and potential of the presented method in a clinical environment.
Article
Full-text available
The authors describe a general-purpose, representation-independent method for the accurate and computationally efficient registration of 3-D shapes including free-form curves and surfaces. The method handles the full six degrees of freedom and is based on the iterative closest point (ICP) algorithm, which requires only a procedure to find the closest point on a geometric entity to a given point. The ICP algorithm always converges monotonically to the nearest local minimum of a mean-square distance metric, and the rate of convergence is rapid during the first few iterations. Therefore, given an adequate set of initial rotations and translations for a particular class of objects with a certain level of 'shape complexity', one can globally minimize the mean-square distance metric over all six degrees of freedom by testing each initial registration. One important application of this method is to register sensed data from unfixtured rigid objects with an ideal geometric model, prior to shape inspection. Experimental results show the capabilities of the registration algorithm on point sets, curves, and surfaces.
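For reference, a minimal point-to-point ICP loop in the sense of the algorithm described above (nearest-neighbour correspondences followed by a closed-form SVD/Kabsch rigid update), written as a generic NumPy/SciPy sketch rather than the authors' implementation:

    import numpy as np
    from scipy.spatial import cKDTree

    def icp(source, target, iters=30):
        """Minimal point-to-point ICP: repeatedly match each source point to its
        nearest target point and solve for the best rigid transform in closed
        form. Returns the accumulated rotation, translation and moved source."""
        src = np.asarray(source, float).copy()
        tgt_all = np.asarray(target, float)
        tree = cKDTree(tgt_all)
        R_total, t_total = np.eye(3), np.zeros(3)
        for _ in range(iters):
            _, idx = tree.query(src)
            tgt = tgt_all[idx]
            src_c, tgt_c = src - src.mean(0), tgt - tgt.mean(0)
            U, _, Vt = np.linalg.svd(src_c.T @ tgt_c)
            R = Vt.T @ U.T
            if np.linalg.det(R) < 0:                 # guard against reflections
                Vt[-1] *= -1
                R = Vt.T @ U.T
            t = tgt.mean(0) - R @ src.mean(0)
            src = src @ R.T + t
            R_total, t_total = R @ R_total, R @ t_total + t
        return R_total, t_total, src

    # Hypothetical usage: re-align a rotated and shifted copy of a point set.
    rng = np.random.default_rng(3)
    pts = rng.normal(size=(500, 3))
    c, s = np.cos(0.2), np.sin(0.2)
    Rz = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    R, t, aligned = icp(pts @ Rz.T + [0.1, 0.0, 0.0], pts)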
Article
Full-text available
Dynamic simulations of movement allow one to study neuromuscular coordination, analyze athletic performance, and estimate internal loading of the musculoskeletal system. Simulations can also be used to identify the sources of pathological movement and establish a scientific basis for treatment planning. We have developed a freely available, open-source software system (OpenSim) that lets users develop models of musculoskeletal structures and create dynamic simulations of a wide variety of movements. We are using this system to simulate the dynamics of individuals with pathological gait and to explore the biomechanical effects of treatments. OpenSim provides a platform on which the biomechanics community can build a library of simulations that can be exchanged, tested, analyzed, and improved through a multi-institutional collaboration. Developing software that enables a concerted effort from many investigators poses technical and sociological challenges. Meeting those challenges will accelerate the discovery of principles that govern movement control and improve treatments for individuals with movement pathologies.
Article
Human pose estimation is a very active research field, stimulated by its important applications in robotics, entertainment or health and sports sciences, among others. Advances in convolutional networks triggered noticeable improvements in 2D pose estimation, leading modern 3D markerless motion capture techniques to an average error per joint of 20 mm. However, with the proliferation of methods, it is becoming increasingly difficult to make an informed choice. Here, we review the leading human pose estimation methods of the past five years, focusing on metrics, benchmarks and method structures. We propose a taxonomy based on accuracy, speed and robustness that we use to classify the methods and derive directions for future research.
Article
Human activity recognition (HAR) technology that analyzes data acquired from various types of sensing devices, including vision sensors and embedded sensors, has motivated the development of various context-aware applications in emerging domains, e.g., the Internet of Things (IoT) and healthcare. Even though a considerable number of HAR surveys and review articles have been conducted previously, the major/overall HAR subject has been ignored, and these studies only focus on particular HAR topics. Therefore, a comprehensive review paper that covers major subjects in HAR is imperative. This survey analyzes the latest state-of-the-art research in HAR in recent years, introduces a classification of HAR methodologies, and shows advantages and weaknesses for methods in each category. Specifically, HAR methods are classified into two main groups, which are sensor-based HAR and vision-based HAR, based on the generated data type. After that, each group is divided into subgroups that perform different procedures, including the data collection, pre-processing methods, feature engineering, and the training process. Moreover, an extensive review regarding the utilization of deep learning in HAR is also conducted. Finally, this paper discusses various challenges in the current HAR topic and offers suggestions for future research.
Article
Realtime multi-person 2D pose estimation is a key component in enabling machines to have an understanding of people in images and videos. In this work, we present a realtime approach to detect the 2D pose of multiple people in an image. The proposed method uses a nonparametric representation, which we refer to as Part Affinity Fields (PAFs), to learn to associate body parts with individuals in the image. This bottom-up system achieves high accuracy and realtime performance, regardless of the number of people in the image. In previous work, PAFs and body part location estimation were refined simultaneously across training stages. We demonstrate that using a PAF-only refinement is able to achieve a substantial increase in both runtime performance and accuracy. We also present the first combined body and foot keypoint detector, based on an annotated foot dataset that we have publicly released. We show that the combined detector not only reduces the inference time compared to running them sequentially, but also maintains the accuracy of each component individually. This work has culminated in the release of OpenPose, the first open-source realtime system for multi-person 2D pose detection, including body, foot, hand, and facial keypoints.
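The part-affinity-field association step can be illustrated with a small, generic scoring function (not OpenPose's actual code): sample the predicted field along the segment joining two candidate keypoints and average its alignment with the limb direction.

    import numpy as np

    def paf_score(paf_x, paf_y, joint_a, joint_b, n_samples=10):
        """Score a candidate limb between two detected keypoints by integrating
        the part affinity field along the segment joining them. paf_x, paf_y are
        the field's two channels (H x W); joints are (x, y) pixel coordinates
        assumed to lie inside the image."""
        a, b = np.asarray(joint_a, float), np.asarray(joint_b, float)
        direction = b - a
        norm = np.linalg.norm(direction)
        if norm == 0:
            return 0.0
        direction /= norm
        score = 0.0
        for t in np.linspace(0.0, 1.0, n_samples):
            x, y = (a + t * (b - a)).round().astype(int)
            score += paf_x[y, x] * direction[0] + paf_y[y, x] * direction[1]
        return score / n_samples

    # Toy field pointing along +x everywhere: a horizontal limb scores ~1.0.
    H, W = 100, 100
    print(paf_score(np.ones((H, W)), np.zeros((H, W)), (10, 50), (80, 50)))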
Article
The advent of affordable consumer grade RGB‐D cameras has brought about a profound advancement of visual scene reconstruction methods. Both computer graphics and computer vision researchers spend significant effort to develop entirely new algorithms to capture comprehensive shape models of static and dynamic scenes with RGB‐D cameras. This led to significant advances of the state of the art along several dimensions. Some methods achieve very high reconstruction detail, despite limited sensor resolution. Others even achieve real‐time performance, yet possibly at lower quality. New concepts were developed to capture scenes at larger spatial and temporal extent. Other recent algorithms flank shape reconstruction with concurrent material and lighting estimation, even in general scenes and unconstrained conditions. In this state‐of‐the‐art report, we analyze these recent developments in RGB‐D scene reconstruction in detail and review essential related work. We explain, compare, and critically analyze the common underlying algorithmic concepts that enabled these recent advancements. Furthermore, we show how algorithms are designed to best exploit the benefits of RGB‐D data while suppressing their often non‐trivial data distortions. In addition, this report identifies and discusses important open research questions and suggests relevant directions for future work.
Article
We present the first marker-less approach for temporally coherent 3D performance capture of a human with general clothing from monocular video. Our approach reconstructs articulated human skeleton motion as well as medium-scale non-rigid surface deformations in general scenes. Human performance capture is a challenging problem due to the large range of articulation, potentially fast motion, and considerable non-rigid deformations, even from multi-view data. Reconstruction from monocular video alone is drastically more challenging, since strong occlusions and the inherent depth ambiguity lead to a highly ill-posed reconstruction problem. We tackle these challenges by a novel approach that employs sparse 2D and 3D human pose detections from a convolutional neural network using a batch-based pose estimation strategy. Joint recovery of per-batch motion allows to resolve the ambiguities of the monocular reconstruction problem based on a low dimensional trajectory subspace. In addition, we propose refinement of the surface geometry based on fully automatically extracted silhouettes to enable medium-scale non-rigid alignment. We demonstrate state-of-the-art performance capture results that enable exciting applications such as video editing and free viewpoint video, previously infeasible from monocular video. Our qualitative and quantitative evaluation demonstrates that our approach significantly outperforms previous monocular methods in terms of accuracy, robustness and scene complexity that can be handled.
Article
With the rapid development of computing technology, three-dimensional (3D) human body models and their dynamic motions are widely used in the digital entertainment industry. Human performance mainly involves human body shapes and motions. Key research problems in human performance animation include how to capture and analyze static geometric appearance and dynamic movement of human bodies, and how to simulate human body motions with physical effects. In this survey, according to the main research directions of human body performance capture and animation, we summarize recent advances in key research topics, namely human body surface reconstruction, motion capture and synthesis, as well as physics-based motion simulation, and further discuss future research problems and directions. We hope this will be helpful for readers to have a comprehensive understanding of human performance capture and animation.
Article
We present a method to create personalized anatomical models ready for physics-based animation, using only a set of 3D surface scans. We start by building a template anatomical model of an average male which supports deformations due to both 1) subject-specific variations: shapes and sizes of bones, muscles, and adipose tissues and 2) skeletal poses. Next, we capture a set of 3D scans of an actor in various poses. Our key contribution is formulating and solving a large-scale optimization problem where we compute both subject-specific and pose-dependent parameters such that our resulting anatomical model explains the captured 3D scans as closely as possible. Compared to data-driven body modeling techniques that focus only on the surface, our approach has the advantage of creating physics-based models, which provide realistic 3D geometry of the bones and muscles, and naturally supports effects such as inertia, gravity, and collisions according to Newtonian dynamics.
Conference Paper
This paper presents a marker-less method for full body human performance capture by analyzing shading information from a sequence of multi-view images, which are recorded under uncontrolled and changing lighting conditions. Both the articulated motion of the limbs and then the fine-scale surface detail are estimated in a temporally coherent manner. In a temporal framework, differential 3D human pose-changes from the previous time-step are expressed in terms of constraints on the visible image displacements derived from shading cues, estimated albedo and estimated scene illumination. The incident illumination at each frame is estimated jointly with pose, by assuming the Lambertian model of reflectance. The proposed method is independent of image silhouettes and training data, and is thus applicable in cases where background segmentation cannot be performed or a set of training poses is unavailable. We show results on challenging cases for pose-tracking such as changing backgrounds, occlusions and changing lighting conditions.
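The Lambertian reflectance assumption mentioned above relates observed intensity to surface albedo and incident light; in its simplest directional-light form (a standard formulation, not a quotation from the paper):

    I(\mathbf{x}) = \rho(\mathbf{x}) \, \max\bigl(0, \mathbf{n}(\mathbf{x}) \cdot \mathbf{l}\bigr),

where \rho is the estimated albedo, \mathbf{n} the surface normal and \mathbf{l} the incident illumination direction, which is what allows pose changes to be tied to observed shading differences.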
Article
We present a system to reconstruct subject-specific anatomy models while relying only on exterior measurements represented by point clouds. Our model combines geometry, kinematics, and skin deformations (skinning). This joint model can be adapted to different individuals without breaking its functionality, i.e., the bones and the skin remain well-articulated after the adaptation. We propose an optimization algorithm which learns the subject-specific (anthropometric) parameters from input point clouds captured using commodity depth cameras. The resulting personalized models can be used to reconstruct motion of human subjects. We validate our approach for upper and lower limbs, using both synthetic data and recordings of three different human subjects. Our reconstructed bone motion is comparable to results obtained by optical motion capture (Vicon) combined with anatomically-based inverse kinematics (OpenSIM). We demonstrate that our adapted models better preserve the joint structure than previous methods such as OpenSIM or Anatomy Transfer.
Article
We present a fast, automatic method for accurately capturing full-body motion data using a single depth camera. At the core of our system lies a realtime registration process that accurately reconstructs 3D human poses from single monocular depth images, even in the case of significant occlusions. The idea is to formulate the registration problem in a Maximum A Posteriori (MAP) framework and iteratively register a 3D articulated human body model with monocular depth cues via linear system solvers. We integrate depth data, silhouette information, full-body geometry, temporal pose priors, and occlusion reasoning into a unified MAP estimation framework. Our 3D tracking process, however, requires manual initialization and recovery from failures. We address this challenge by combining 3D tracking with 3D pose detection. This combination not only automates the whole process but also significantly improves the robustness and accuracy of the system. Our whole algorithm is highly parallel and is therefore easily implemented on a GPU. We demonstrate the power of our approach by capturing a wide range of human movements in real time and achieve state-of-the-art accuracy in our comparison against alternative systems such as Kinect [2012].
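Schematically, the MAP registration above amounts to choosing the pose that maximises the posterior given the depth observations (standard Bayes, not the paper's exact objective):

    \theta^{*} = \arg\max_{\theta} \, p(\theta \mid D) = \arg\max_{\theta} \, p(D \mid \theta)\, p(\theta),

where the likelihood term encodes agreement with depth, silhouette and occlusion cues and the prior encodes body geometry and temporal pose priors.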
Conference Paper
This paper presents a novel method to simultaneously estimate the clothed and naked 3D shapes of a person. The method needs only a single photograph of a person wearing clothing. Firstly, we learn a deformable model of human clothed body shapes from a database. Then, given an input image, the deformable model is initialized with a few user-specified 2D joints and contours of the person. The correspondence between the 3D shape and 2D contours is then established automatically. Finally, we optimize the parameters of the deformable model in an iterative way, and then obtain the clothed and naked 3D shapes of the person simultaneously. The experimental results on real images demonstrate the effectiveness of our method.
Article
This paper describes a general purpose, representation independent method for the accurate and computationally efficient registration of 3-D shapes including free-form curves and surfaces. The method handles the full six degrees of freedom and is based on the iterative closest point (ICP) algorithm, which requires only a procedure to find the closest point on a geometric entity to a given point. The ICP algorithm always converges monotonically to the nearest local minimum of a mean-square distance metric, and experience shows that the rate of convergence is rapid during the first few iterations. Therefore, given an adequate set of initial rotations and translations for a particular class of objects with a certain level of 'shape complexity', one can globally minimize the mean-square distance metric over all six degrees of freedom by testing each initial registration. For example, a given 'model' shape and a sensed 'data' shape that represents a major portion of the model shape can be registered in minutes by testing one initial translation and a relatively small set of rotations to allow for the given level of model complexity. One important application of this method is to register sensed data from unfixtured rigid objects with an ideal geometric model prior to shape inspection. The described method is also useful for deciding fundamental issues such as the congruence (shape equivalence) of different geometric representations as well as for estimating the motion between point sets where the correspondences are not known. Experimental results show the capabilities of the registration algorithm on point sets, curves, and surfaces.
Iterative Closest Point Alignment Add-on for Blender
  • C Gearhart