ABSTRACT: Capturing the skeleton motion and detailed time-varying surface geometry of multiple, closely interacting persons is a very challenging task, even in a multi-camera setup, due to frequent occlusions and ambiguities in feature-to-person assignments. To address this task, we propose a framework that exploits multi-view image segmentation. A probabilistic shape and appearance model is employed to segment the input images and to assign each pixel uniquely to one person. Given the articulated template model of each person and the labeled pixels, a combined optimization scheme, which splits the skeleton pose optimization problem into a local one and a lower-dimensional global one, is applied to each individual in turn, followed by surface estimation to capture detailed non-rigid deformations. We show on various sequences that our approach can capture the 3D motion of humans accurately even if they move rapidly, wear loose apparel, and engage in challenging multi-person motions, including dancing, wrestling, and hugging.
IEEE Transactions on Software Engineering 02/2013; · 1.98 Impact Factor
IEEE International Conference on Computer Vision, ICCV 2011, Barcelona, Spain, November 6-13, 2011; 01/2011
ACM Transactions on Graphics 01/2011; 30:32. · 3.49 Impact Factor