Virtual Fashion Show Using Real-time
Markerless Motion Capture
Ryuzo Okada¹, Björn Stenger¹, Tsukasa Ike¹, and Nobuhiro Kondoh²
¹ Corporate Research & Development Center, Toshiba Corporation
² Semiconductor Company, Toshiba Corporation
ryuzo.okada@toshiba.co.jp
Abstract. This paper presents a motion capture system using two cameras that is capable of estimating a constrained set of human postures in real time. We first obtain a 3D shape model of the person to be tracked and create a posture dictionary consisting of many posture examples. The posture is estimated by hierarchically matching silhouettes, generated by projecting the 3D shape model deformed to the dictionary poses onto the image plane, against the observed silhouette in the current image. Based on this method, we have developed a virtual fashion show system that renders a computer-graphics model moving synchronously with a real fashion model, but wearing different clothes.
1 Introduction
In a virtual fashion show application the goal is to animate a computer-graphics (CG) model in real time according to the motion of a real person, while the CG model wears a costume different from the actual clothes of the real model. Essentially, this task requires an efficient human motion capture technique with real-time estimation capability.
Currently available commercial motion capture systems require markers or sensors attached to a person. In our system we avoid visible markers and sensors, both because fashion models are watched by an audience and because markerless operation is important for a variety of motion capture applications in home or office use. One well-known approach to vision-based motion capture uses space-carving methods: the shape of the target person is obtained as the intersection of the 3D regions generated by inverse projection of silhouettes. This technique [1, 2] requires relatively clean silhouette data obtained from many
cameras surrounding the person to be captured. Many approaches that make use of a 3D shape model of the human body have also been proposed, such as matching features extracted from the captured image with those from the projected 3D shape model [3, 4], learning a direct mapping from image features to 3D body pose parameters [5], and defining a force that moves the 3D model toward the extracted image features [6]. These methods work with a small number of cameras, but many problems, such as stability over long sequences, accuracy, and computational cost,
remain to be solved. Choosing a suitable image feature, such as silhouettes [5–7], depth [4], or edges [3], depending on the individual target application is one of the important issues.

Fig. 1. Overview of the motion capture method: two cameras observe the target in the real scene; silhouette extraction and 3D position estimation feed a comparison against candidate silhouettes generated from a posture dictionary, which is built off-line from 3D body shape data and motion data; the best match gives the estimated posture.

Another problem is how to search for the optimal posture in the
high-dimensional parameter space. Real-time motion capture has been achieved using incremental tracking; however, in this case the problem of initial posture estimation needs to be solved [8], and estimation errors can accumulate over
long image sequences [9]. The highly nonlinear relationship between similarity
and posture parameters further complicates the problem. In order to address
this, versions of particle filtering have been suggested [10, 11], which have been
shown to yield good results, given manual initialization and off-line processing.
Part-based methods [12] or the use of inverse kinematics [13] may be able to solve
the initialization problem and reduce the computational cost of the estimation.
However, these methods require the localization of individual body parts, which
is difficult in cases where self-occlusion occurs and there are few cameras.
The virtual fashion show application requires real-time processing for syn-
chronizing the motion between the real fashion model and the CG fashion model.
Several conditions specific to this application simplify the problem of achieving real-time posture estimation. First, the type of motion is restricted and known beforehand, because the motion of the real fashion model is limited to walking and several types of posing. Second, in our setting the fashion model can be required to wear clothes that fit the body tightly, which makes silhouette matching possible; the simple cost function of silhouette matching also contributes to real-time processing. Third, we are able to obtain an individual 3D body shape model using
a 3D body scanner, as well as posture sequences obtained by a marker-based
motion capture system. These data are used to generate a posture dictionary
off-line (see section 2 and Fig. 1). Our posture estimation method consists of
global position estimation (see section 3) and local pose estimation (see section
4) based on silhouette matching between the observed foreground silhouette and
the candidate silhouettes generated from the posture dictionary. We show tracking results and a performance evaluation of posture estimation in section 5 and
describe a virtual fashion show in section 6.
2 Posture dictionary
The 3D body shape model is obtained using a laser 3D shape scanner. The number of polygons is reduced from two million to 2000 by manually deleting vertices with small curvature in order to achieve a low computational time for silhouette projection: for 640 × 480 images the time is 1.2–2.0 ms per silhouette projection on a standard PC. The kinematics of the human body are commonly represented by a tree structure whose top node is the body center. A local coordinate system is defined for each body part relative to that of its parent node in the tree structure.
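To make the representation concrete, here is a minimal sketch of such a kinematic tree (our own illustrative Python; the paper does not specify an implementation):

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class Joint:
    """One node of the kinematic tree (a hypothetical structure).

    `translation` is constant for every joint except the body center,
    because body-part lengths do not change; `rotation` is the
    per-posture rotation relative to the parent's coordinate system.
    """
    name: str
    translation: np.ndarray                 # 3-vector in the parent frame
    rotation: np.ndarray                    # 3x3 rotation matrix
    children: list = field(default_factory=list)

    def world_transform(self, parent_transform=None):
        """Compose this joint's local transform with its parent's 4x4 transform."""
        if parent_transform is None:
            parent_transform = np.eye(4)    # the body center has no parent
        local = np.eye(4)
        local[:3, :3] = self.rotation
        local[:3, 3] = self.translation
        return parent_transform @ local
```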
A commercial marker-based motion capture system is used to collect a variety
of postures, including walking, posing and turning. A posture captured by the
marker-based motion capture system is represented in terms of rotation angles
and translation vectors for each joint, which are the parameters to transform
a local coordinate system to that of its parent node. Note that the translation parameters are constant except for the body center, because the lengths of the body parts do not change; the parameters of the body center represent the transformation between the local coordinate system of the body center and the world coordinate system. We call the set of rotation parameters for a posture $p$ a posture vector, denoted by $r_p = (r_{p1}, \dots, r_{p(3N_b)})$, where $N_b = 21$ is the number of joints.
Because the motion is periodic, some poses are very similar, and similar postures are represented by a single prototype, found by greedy clustering based on the difference $d_1(a, b)$ between postures $a$ and $b$:

$$d_1(a, b) = \max_{i=1,\dots,3N_b} \left| (r_{ai} - r_{bi}) \bmod \pi \right|, \tag{1}$$

which is the largest angle difference over all rotation parameters. As a result of the clustering, the distance $d_1$ between any two prototypes is larger than a threshold, which is 7 degrees in our experiments.
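As an illustration, here is a minimal sketch of the distance $d_1$ and the greedy clustering in Python (our own code, assuming each posture vector is a NumPy array of $3N_b$ joint angles in radians):

```python
import numpy as np

def d1(a, b):
    """Largest per-parameter angle difference (eq. 1); Python's
    nonnegative modulo stands in for |(r_ai - r_bi) mod pi|."""
    return np.max((a - b) % np.pi)

def greedy_cluster(postures, threshold=np.deg2rad(7.0)):
    """Greedy prototype selection: a posture becomes a new prototype
    only if its d1 distance to every existing prototype exceeds the
    threshold, so any two prototypes end up more than 7 degrees apart."""
    prototypes = []
    for r in postures:
        if all(d1(r, p) > threshold for p in prototypes):
            prototypes.append(r)
    return prototypes
```

On the data reported in section 5, this kind of clustering reduces roughly 3600 captured postures to a dictionary of about 500 prototypes per subject.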
3 Global position estimation
For estimating the global body position in the 3D scene, we track the target
person in two camera views independently based on our previously proposed
tracking algorithm [14]. The algorithm enables us to track an object stably in an image sequence captured at a high frame rate, since the inter-frame motion in the image is very small; in our experiments a frame rate of 100 fps is used. The algorithm consists of corner point tracking and outlier elimination using an affine motion model, and it estimates the target position in the image as the mean location of the tracked corner points (see Fig. 2(a)).
Next, we compute the global position of the body center in the world coordi-
nate system by triangulation of the two calibrated cameras using the estimated
target positions in the images.

Fig. 2. Estimation of the global position of a target person: (a) results of feature point tracking, (b) global position estimation, (c) top view of estimated global positions. The tracked feature points are indicated by the white '+' marks on the original images in (a). The white rectangle is the tracking window, i.e. the minimum rectangle containing all feature points. The large white '+' marks are the mean positions of the feature points, which are the estimated target positions in the image ($P_{g1}$ and $P_{g2}$), and the white line segment attached to each is the estimated motion vector.

The postures that we estimate in the virtual fashion show are all upright, so that the body center moves almost parallel to the floor and its height is approximately fixed to a constant $h_b$, the height of the
body center in standing pose. As shown in Fig. 2(b), we project the line passing through both the camera center $O_c$ and the target position $P_{gc}$ in the image plane onto the plane $H$, which is parallel to the floor at distance $h_b$, and denote the projected line by $l_c$. Assuming that the XZ-plane of the world coordinate system corresponds to the floor, $l_c$ is expressed as follows:

$$l_c = \left\{ P_H \bigl( t (P_{gc} - O_c) + O_c \bigr) \;\middle|\; t \in \mathbb{R} \right\}, \quad c \in \{1, 2\}, \qquad
P_H = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & h_b \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \tag{2}$$

where $P_H$ denotes the projection matrix onto the plane $H$ in homogeneous coordinates (it keeps $X$ and $Z$ and forces the height to $h_b$). The global position $G$ of the target is the point of intersection of the projected lines $l_1$ and $l_2$.
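A sketch of this computation (our own hypothetical Python; the camera centers $O_c$ and image-plane target points $P_{gc}$ are assumed to be given as 3D points in world coordinates). Since $P_H$ simply forces the height to $h_b$, each projected line $l_c$ can be handled as a 2D line in the XZ-plane:

```python
import numpy as np

def xz_line(O, Pg):
    """Project the ray t*(Pg - O) + O onto the plane Y = hb: because P_H
    only replaces the Y coordinate, the result is a 2D line in the
    XZ-plane, returned as a point and a direction."""
    d = Pg - O
    return O[[0, 2]], d[[0, 2]]

def global_position(O1, Pg1, O2, Pg2, hb):
    """Global position G (section 3): intersection of the projected
    lines l1 and l2, with the height fixed to the body-center height hb."""
    p1, u1 = xz_line(O1, Pg1)
    p2, u2 = xz_line(O2, Pg2)
    # solve p1 + t1*u1 = p2 + t2*u2 for (t1, t2)
    t1, _ = np.linalg.solve(np.column_stack([u1, -u2]), p2 - p1)
    x, z = p1 + t1 * u1
    return np.array([x, hb, z])
```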
Fig. 2(c) shows results of global position estimation. The target person walks along the Z-axis at $X = 1500$ mm and poses at $Z = 500$ and $Z = 4000$ mm, shifting the body weight in the X-direction. In this experiment two pairs of cameras are used to cover the entire area, but only one pair is used for global position estimation at each time instant. The area covered by each pair of cameras is determined beforehand, and a pair is selected when the estimated global position lies in its predetermined area.
4 Posture estimation
We perform the posture estimation procedure at every fourth frame, i.e. at 25 fps, because the computational cost of the posture estimation is much higher than that of the global position estimation.

Fig. 3. Silhouette extraction and similarity computation: (a) extracted silhouette, (b) similarity computation, showing the candidate silhouette $S_p(c, m)$, the observed silhouette $S_o(c)$, and the inclusion region $R(c)$.

First, candidate postures in the neighborhood of the posture estimated in the previous frame are selected in order to restrict the search: we select postures whose joint angles are similar to those of the previous posture $p$, i.e. the distance $d_1(p, m)$ between $p$ and a selected posture $m$ is smaller than a threshold (60 degrees in our experiments). Since the similarity is defined in terms of silhouette difference in the image (see section 4.1 for details), we impose a further restriction on the number of postures based on an appearance-based distance. For fast computation, we define this appearance-based posture difference $d_2(a, b)$ using the positions of the joints projected onto the image plane:

$$d_2(a, b) = \max_{i=1,\dots,N_b} | p_{ai} - p_{bi} |, \tag{3}$$

where $p_{ai}$ and $p_{bi}$ denote the positions of the joints in the image, obtained by orthogonal projection of the 3D joint coordinates. We sort the postures selected by $d_1(p, m)$ according to the appearance-based distance $d_2(p, m)$ and select the first $n$ postures as the set $M$ of candidate postures. We use $n = 60$ in our experiments.
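As an illustration, a minimal sketch of this two-stage candidate selection (our own code, reusing the d1 function from the clustering sketch above; `project_joints`, which returns an $N_b \times 2$ array of image-plane joint positions for a posture vector, is a hypothetical helper):

```python
import numpy as np

def d2(pa, pb):
    """Appearance-based distance (eq. 3): the largest image-plane
    displacement over all N_b projected joint positions."""
    return np.max(np.linalg.norm(pa - pb, axis=1))

def select_candidates(prev, dictionary, project_joints,
                      angle_thresh=np.deg2rad(60.0), n=60):
    """Select the candidate set M (section 4): keep dictionary postures
    within angle_thresh of the previous posture by d1, then keep the n
    postures closest to it by d2."""
    near = [m for m in dictionary if d1(prev, m) < angle_thresh]
    prev_joints = project_joints(prev)
    near.sort(key=lambda m: d2(project_joints(m), prev_joints))
    return near[:n]
```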
The silhouettes of each candidate posture $m$ in the set $M$ are generated by (1) translating the 3D body shape model to the estimated global position $G$, so that the size of the candidate silhouette matches that of the observed silhouette, (2) deforming the 3D body shape model to assume the pose $m$, and (3) projecting the polygons of the deformed 3D body shape model into each camera view.
The observed silhouette is extracted using background subtraction (see Fig. 3(a)). This often results in noisy silhouettes, but it has proved sufficiently stable in our application under reasonably stable lighting conditions.
4.1 Similarity of silhouettes
As shown in Fig. 3(b), $S_p(c, m)$ and $S_o(c)$ denote the candidate silhouette obtained from the candidate posture $m$ and the observed silhouette for a camera $c$, and $R(c)$ represents the smallest rectangle that contains all candidate silhouettes. The similarity of the silhouettes $S_p(c, m)$ and $S_o(c)$ should be high when the area of the observed silhouette is large inside the candidate silhouette and small outside it. Thus, as the similarity we use the difference between the occupancy rate of the observed silhouette inside the candidate silhouette, $\rho_i(c, m)$, and the rate outside the candidate silhouette, $\rho_o(c, m)$, each normalized by the corresponding area:

$$\rho_i(c, m) = \frac{|S_p(c, m) \cap S_o(c)|}{|S_p(c, m)|}, \qquad
\rho_o(c, m) = \frac{|\overline{S_p(c, m)} \cap R(c) \cap S_o(c)|}{|\overline{S_p(c, m)} \cap R(c)|}, \tag{4}$$

where $|\cdot|$ denotes the area of a region and $\overline{S_p(c, m)}$ the complement of the candidate silhouette.
The similarity measure is affected by the estimation error of the global position. It is therefore necessary to optimize both the posture and a local shift of the global position. We shift the candidate silhouette in each camera view by a shift $d$ and maximize the similarity independently for each camera in order to optimize the global position locally. Thus, we redefine the similarity for a posture $m$ as

$$s(m) = \sum_{c} \max_{d \in D} \bigl( \rho_i(c, m, d) - \rho_o(c, m, d) \bigr), \tag{5}$$

where $\rho_i(c, m, d)$ and $\rho_o(c, m, d)$ denote the occupancy rates computed with the candidate silhouette shifted by $d$, and $D$ is the range of shifts.
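To make eqs. (4) and (5) concrete, a sketch using boolean silhouette masks (our own NumPy code, not from the paper; implementing the shift with np.roll is an assumption about boundary handling):

```python
import numpy as np

def similarity(cand_masks, obs_masks, rect_masks, shifts):
    """Similarity s(m) of one candidate posture (eqs. 4 and 5): for each
    camera, take the best shifted value of rho_i - rho_o, then sum over
    cameras. All masks are boolean arrays of the image size."""
    s = 0.0
    for Sp, So, R in zip(cand_masks, obs_masks, rect_masks):
        best = -np.inf
        for dx, dy in shifts:
            Spd = np.roll(Sp, shift=(dy, dx), axis=(0, 1))
            rho_i = (Spd & So).sum() / max(Spd.sum(), 1)      # inside the candidate
            outside = ~Spd & R                                # inside R(c), outside the candidate
            rho_o = (outside & So).sum() / max(outside.sum(), 1)
            best = max(best, rho_i - rho_o)
        s += best
    return s
```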
4.2 Hierarchical posture search
In order to reduce the computational cost of searching for the posture with the greatest similarity, we adopt a coarse-to-fine strategy using a two-level tree, which is generated on-line for each frame. The first level of the search tree consists of postures selected from $M$ at every $t$-th posture, and each of the remaining candidate postures is attached to the closest first-level posture as a second-level posture. We search for the optimal posture using the search tree as follows: (1) compute the similarity based on eq. (5) for the postures on the first level of the tree, (2) select the $k$ postures with the greatest similarity, (3) compute the similarity for the postures on the second level in the subtrees of the $k$ selected postures, and (4) select the posture that has the greatest similarity. We use $t = 3$ and $k = 3$ in our experiments.
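A sketch of this search (our own illustrative code: `candidates` is the ordered set $M$, `similarity_fn` evaluates eq. (5) for one posture, and `d1` is the angle distance of section 2, passed in explicitly):

```python
def hierarchical_search(candidates, similarity_fn, d1, t=3, k=3):
    """Coarse-to-fine posture search (section 4.2; t = 3 and k = 3 in
    the paper). Every t-th candidate forms the first level of the tree;
    each remaining candidate is attached to its closest first-level
    posture; only the subtrees of the k most similar first-level
    postures are evaluated in full."""
    level1 = candidates[::t]
    subtrees = [[] for _ in level1]
    for i, m in enumerate(candidates):
        if i % t == 0:
            continue                      # already a first-level posture
        closest = min(range(len(level1)), key=lambda j: d1(level1[j], m))
        subtrees[closest].append(m)

    # step (1): similarity of the first-level postures
    scores = [similarity_fn(p) for p in level1]
    # step (2): the k best first-level postures
    best_roots = sorted(range(len(level1)), key=scores.__getitem__,
                        reverse=True)[:k]
    # steps (3) and (4): search their subtrees and keep the overall best
    best_posture = level1[best_roots[0]]
    best_score = scores[best_roots[0]]
    for j in best_roots:
        for m in subtrees[j]:
            s = similarity_fn(m)
            if s > best_score:
                best_posture, best_score = m, s
    return best_posture
```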
4.3 Initialization
If a sufficiently large silhouette is extracted in the current image by background subtraction, we set the observed silhouette as the initial target region and start tracking with our object tracking algorithm [14]. When tracking results are available from two or more cameras for the first time, we compute the initial global position and start the posture estimation with a suitable initial posture. Although the initial posture does not completely match the posture of the target person, the estimated posture gradually converges to it over the subsequent frames.
Fig. 4. Results of posture estimation using four cameras: (a) subject N viewed from camera 3 (frames 130, 150, and 220), (b) subject S viewed from camera 2 (frames 30, 80, and 130). Estimated postures are indicated by white contour lines on the original image. The 3D figures shown in gray beside the contours represent postures with smoothed motion for CG animation.
5 Experiments
Fig. 4 shows the results of posture estimation using four cameras for two subjects who walk and pose differently. In each case the camera arrangement is the same as that shown in Fig. 2(c), and the cameras capture gray-scale images with a resolution of 640 × 480 pixels at a frame rate of 100 fps. Note that the frame rate for posture estimation is 25 fps, as described in section 4. The posture sequence
for each subject obtained by the marker-based motion capture system contains
about 3600 postures captured at a rate of 120 fps. After clustering the posture
vectors the dictionary for each subject consists of about 500 postures, which are
experimentally sufficient for our restricted set of motions in the virtual fashion
show. Fig. 4 shows that postures are correctly estimated in most frames. In some
frames, such as frame 150 in Fig. 4(a), however, the contour lines showing the
estimated postures are incorrect.
We have conducted experiments on 27 image sequences for evaluating the
performance of the posture estimation. The image sequences include four types
of motion shown in Fig. 5(b) performed by three subjects. We use an individual
3D body shape model and motion data for each subject obtained by a laser 3D
shape scanner and a commercial marker-based motion capture system, respectively. Table 1 shows the number of misestimations and the number of frames
in which misestimation occurs. The number of misestimations, e.g. frame 150
in Fig. 4(a), is counted by comparing the estimate to the ground truth, where
postures with small alignment errors are not counted as misestimations.

Table 1. Performance evaluation. The first column gives the type of scenario; for example, S-M1 stands for motion sequence M1 performed by subject S. The second column is the number of sequences used in the experiments and the third column is the total number of frames. Columns four to six show the number of misestimations, the number of frames in which misestimation occurs, and the error rate.

Scenario   Sequences   Frames   Failures   Failure frames   Error in %
S-M1           4        1318        4            12             0.9
S-M2           3        1120        4            87             7.8
S-M3           3        1017        1            17             1.7
S-M4           4        1414        5            37             2.6
H-M2           4        1593        7           127             8.0
H-M3           4        1475        6            64             4.3
N-M1           3         924        6            62             6.7
N-M4           2         694        1             5             0.7

Misestimation occurs more often for subject H than for the other subjects. This is because her postures in the image sequences are not contained in the posture dictionary. In total, misestimation occurs 34 times over the 27 sequences,
on average about 1.3 times per sequence (0.089 times per second), corresponding to 4.3% of the total number of frames. Although we have restricted the search space for posture estimation by selecting candidate postures similar to the previously estimated posture, misestimations still occur for short periods. Such temporal jitter can be reduced by temporal filtering; in our system, smooth motion is generated based on the posture sequence recorded by a marker-based motion capture system (see section 6.1). Another reason for the misestimations is that the extracted silhouettes can be very noisy due to shadows on the floor.
6 Virtual fashion show
We have developed a virtual fashion show system using our motion estimation
method described in sections 2–4. Fig. 5 shows an overview of the system. A
fashion model walks and poses on the stage according to four types of scenarios
shown in Fig. 5(b), and our motion capture system estimates her posture. While
the fashion model walks along the stage, she poses twice at different positions
according to the scenario. Two large projector screens display a full-CG model
wearing a costume different from the actual clothes, based on clothes simulation
and CG techniques.
6.1 Smooth motion generation
As described in section 5, misestimation of the posture occurs at a certain rate.
Even when the posture is correctly estimated, the estimated motion, which is
the time series of the estimated postures, is not smooth because the estimated
postures can be slightly misaligned.

Fig. 5. Overview of the virtual fashion show: (a) the virtual fashion show setup (stage, high-speed cameras, and projector screens), (b) examples of scenarios. Three pairs of cameras are placed along both sides of the stage. Two projectors show the CG model wearing clothes different from those of the actual model, as shown in the left images of (b).

These problems are critical for generating natural motion of the CG model in the virtual fashion show. We thus combine
the estimated motion with the motion data recorded with the marker-based
motion capture system.
The recorded motion sequence contains all the postures in the same order as the motion of the real fashion model, except for the timing of walking and posing. We generate smooth motion by changing the playback speed of the recorded posture sequence according to the estimated posture. We start playing the recorded posture sequence when the current estimated posture $e$ is similar to the posture $i$ in the first frame of the recorded sequence in terms of the posture difference $d_1(e, i)$.
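A sketch of this synchronization logic (our own interpretation: the paper states only that the playback speed follows the estimated posture, so the forward-window search, the names, and the threshold and window values below are assumptions):

```python
import numpy as np

def should_start(estimate, recorded, d1, start_thresh=np.deg2rad(10.0)):
    """Start playback once the current estimate e is similar to the
    posture in the first recorded frame; the threshold value is
    hypothetical, the paper gives none."""
    return d1(estimate, recorded[0]) < start_thresh

def advance_playhead(playhead, estimate, recorded, d1, window=30):
    """One update step: within a forward window of recorded frames, move
    the playhead to the posture closest (by d1) to the current estimate,
    so that playback speeds up or slows down to stay synchronized with
    the real fashion model. The window size is an assumption."""
    hi = min(playhead + window, len(recorded) - 1)
    return min(range(playhead, hi + 1),
               key=lambda i: d1(estimate, recorded[i]))
```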
The 3D figures shown in gray in Fig. 4 represent postures generated by the smooth motion generation method. In frame 150 in Fig. 4(a), where the posture is misestimated, the motion model finds a plausible posture, even though the silhouette of the estimated posture is slightly misaligned with the observed silhouette. While this smooth motion generation method is straightforward and effective for the specific application of the virtual fashion show, accurate posture estimation and a universal motion generation method are necessary for general applications.
6.2 Hardware configuration
We place three high-speed cameras on each side of the stage (six cameras in total) in order to cover the entire stage, which measures about 10 m × 3 m. Each high-speed camera is connected to a PC with dual Xeon 3.0 GHz CPUs that captures images and tracks the fashion model in the images as described in section 3. The captured images and the tracking results are transferred over a high-speed Myrinet network to two posture estimation PCs, each with quad Itanium 2 1.6 GHz CPUs, which compute the global position and estimate the posture with smooth motion generation. The estimated posture is sent over Gigabit Ethernet to a PC for clothes simulation and CG rendering, and the generated CG animation is displayed on two projector screens.
7 Conclusions and future work
We have presented a real-time motion capture system using pairs of cameras and have demonstrated that the system works efficiently for a virtual fashion show by exploiting several constraints appropriate to this application, such as a known body shape, tight-fitting clothes, and a limited set of motion types.
A possible future application is virtual try-on for online clothes shopping. However, in order to make this approach work in more general settings, some issues need to be considered: automatic 3D body shape model acquisition, the use of more robust image features, and efficient matching techniques for handling a larger number of postures in the posture dictionary.
References
1. Matsuyama, T., Wu, X., Takai, T., Wada, T.: Real-time dynamic 3D object shape
reconstruction and high-fidelity texture mapping for 3D video. IEEE Trans. on
Circuits and Systems for Video Technology 14 (2004) 357–369
2. Cheung, G., Baker, S., Kanade, T.: Shape-from-silhouette of articulated objects and its use for human body kinematics estimation and motion capture. In: Proc. of CVPR. Volume 1. (2003) 77–84
3. Gavrila, D., Davis, L.: 3d model-based tracking of humans in action: A multi-view
approach. In: Proc. of CVPR. (1996) 73–80
4. Plänkers, R., Fua, P.: Tracking and modeling people in video sequences. Computer Vision and Image Understanding 81 (2001)
5. Agarwal, A., Triggs, B.: 3d human pose from silhouettes by relevance vector regression. In: Proc. of CVPR. Volume 2. (2004) 882–888
6. Delamarre, Q., Faugeras, O.: 3d articulated models and multi-view tracking with
silhouettes. In: Proc. of ICCV. Volume 2. (1999) 716–721
7. Brand, M.: Shadow puppetry. In: Proc. of ICCV. (1999) 1237–1244
8. Senior, A.: Real-time articulated human body tracking using silhouette information. In: Proc. of IEEE Workshop on Visual Surveillance/PETS. (2003) 30–37
9. Yamamoto, M., Ohta, Y., Yamagiwa, T., Yagishita, K., Yamanaka, H., Ohkubo,
N.: Human action tracking guided by key-frames. In: Proc. of FG. (2000) 354–361
10. Deutscher, J., Blake, A., Reid, I.: Articulated body motion capture by annealed
particle filtering. In: Proc. of CVPR. Volume 2. (2000) 1144–1149
11. Sminchisescu, C., Triggs, B.: Estimating articulated human motion with covariance
scaled sampling. IJRR 22 (2003) 371–391
12. Felzenszwalb, P., Huttenlocher, D.: Efficient matching of pictorial structures. In:
Proc. of CVPR. Volume 2. (2000) 66–73
13. Date, N., et al.: Real-time human motion sensing based on vision-based inverse kinematics for interactive applications. In: Proc. of ICPR. Volume 3. (2004) 318–321
14. Okada, R., et al.: High-speed object tracking in ordinary surroundings based on
temporally evaluated optical flow. In: Proc. of IROS. (2003) 242–247