Fig. 21
Test data used in the user study. Left: the pose pictures shown to the user. Middle: the results of our sketching interface. Right: the results of Poser Pro 2010.
Source publication
This paper presents an intuitive sketching interface that allows the user to interactively place a 3D human character in a sitting position on a chair. Within our framework, the user sketches the target pose as a 2D stick figure and attaches the selected joints to the environment (for example, the feet on the ground) with a pin tool. As reconstruct...
Context in source publication
Context 1
... practice, this strategy is very effective at avoiding most collisions. However, it cannot guarantee that the final result is completely collision-free. Although we introduced attach constraints in the objective function, the optimization can only place joints close to the attach positions, rather than attach them accurately to the specified positions. Thus, we further conducted a rigid body dynamics simulation [25] on the reconstructed pose. For each skeleton chain that contains attaching joints, we performed a rigid body dynamics simulation with a fixed root joint. The simulation ran until all attaching joints contacted the specified attach surfaces or until a certain number of time steps was reached. Here, we assumed that all attach points act as support points, so the system cannot handle non-support attachments (e.g., attaching a hand to a wall). It also does not work well if the attachment lies on a very small object, since the attaching joint may fall outside the attaching region during simulation. In these cases, the user can manually drag the attaching joint to the target position.

We developed a prototype system in C++ on a workstation with an Intel Xeon 2.67 GHz CPU and an NVIDIA Quadro FX 580 GPU. We conducted experiments in various virtual environments; see Figures 1, 2, 9, 10, 14 and 16 for some examples of reconstructed 3D poses. As shown in Figure 17, the proposed GPU solver achieves a speedup of 3-5× over the original CPU solver [2]. We also evaluated the reconstruction accuracy with respect to the population size: we ran 50 reconstructions for each population size and computed the average fitness value and its standard deviation. As shown in Figure 18, the larger the population size, the smaller the fitness value and the standard deviation. Based on this analysis, we set the default main population size to 1024, which yields accurate results at interactive speed (1-2 seconds to generate a pose). To investigate the effect of the collision handling strategy on the convergence of the genetic solver, we reconstructed poses from the input in Figure 2 with and without collision handling. As shown in Figure 19, the collision handling strategy works well: its convergence rate is very similar to that obtained without collision handling.

We also compared the reconstructed result with ground truth data. We first synthesized a 2D stick figure in screen space from a posed character of height 170 cm (see Figure 20(a)). We then applied our algorithm to the synthesized stick figure to generate a reconstructed 3D pose (the pink one in Figure 20(b) and (c)). The reconstruction error, computed as the average Euclidean distance between corresponding 3D joints of the ground truth and the reconstructed pose, is 7.8 cm.

We conducted a user study to evaluate the benefits of our sketching interface for general users. Our system was compared against Poser Pro 2010, a professional 3D figure design and animation package. To pose a character with Poser Pro 2010, users can switch the inverse kinematics function on or off. With inverse kinematics, users only need to edit the end joints and the intermediate ones are updated automatically; without it, users have to edit each joint one by one. We designed two experiments (with increasing difficulty) to evaluate how the interfaces help users design a 3D pose in a complex environment.
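To make the error metric above concrete, here is a minimal sketch (not the authors' code) of the average per-joint Euclidean distance, assuming both poses are given as N x 3 NumPy arrays of joint positions in the same units and joint order:

```python
import numpy as np

def mean_joint_error(gt_joints, rec_joints):
    """Average Euclidean distance between corresponding 3D joints.

    gt_joints, rec_joints: (N, 3) arrays of joint positions in the same
    units (e.g. centimetres) and the same joint ordering.
    """
    gt = np.asarray(gt_joints, dtype=float)
    rec = np.asarray(rec_joints, dtype=float)
    if gt.shape != rec.shape or gt.shape[-1] != 3:
        raise ValueError("expected matching (N, 3) joint arrays")
    # Per-joint distances, then the mean over all joints.
    return float(np.linalg.norm(gt - rec, axis=1).mean())
```

Under this reading, the 7.8 cm figure reported above is the mean of the per-joint distances over all skeleton joints.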
The test data set of each experiment includes a photo showing the desired pose and the corresponding 3D virtual environment in both Poser Pro 2010 and our system (see Figure 21).

Participants. Twenty participants were recruited for the user study: twelve males and eight females, aged between 20 and 35. Among them, four are professional artists, four have no experience with 3D software, and the others have passing knowledge of or some experience with 3D design. Since familiarity with the tasks may affect the results, we divided the participants into two equal-sized groups, splitting the professional artists and the beginners evenly between them. The first group (G1) completed the tasks using the proposed sketching interface followed by Poser Pro 2010, while the second group (G2) performed the same tasks in the reverse order. The study had two stages. In the first stage (for G1), we showed the participants our interface and gave them 5 minutes to try its operations; they then completed the two design tasks (the two experiments) with it. In the second stage, we introduced the Poser Pro 2010 interface, again gave them 5 minutes to get familiar with it, and then asked them to complete the same two experiments with Poser Pro 2010. The second group followed the same procedure with the order of the two systems reversed. Throughout the study, the time taken by each participant to complete each task was measured.

Experiment #1. The first experiment is a character sitting on a stool and stretching his legs. It evaluates how the 2D sketching interface helps users quickly design a 3D pose. We chose a simple pose in a simple environment so as to eliminate the interference of other factors such as occlusion and collision. With Poser Pro 2010, users had to frequently change the viewpoint to find an appropriate viewing direction and specify the joint locations. In sharp contrast, our sketching interface lets users simply sketch a stick figure without changing the view direction, pin the feet to the ground, and obtain the 3D pose immediately. As shown in Figure 22, participants saved up to 53.4% of the time by using our sketching interface. We conducted a paired t-test to assess the significance of the difference between the average time costs of the two interfaces; with t = 5.7771 and df = 19 we obtain p < 0.0005, so the result is significant beyond the 0.0005 level.

Experiment #2. The second experiment is slightly more difficult than the first in that the legs are occluded by a tea table placed in front of the character. It simulates a complex environment, which is common in real-world design applications. The pose and the environment shown to the user are therefore more complex than in the previous experiment, and frequent camera switching is required. Due to the occlusions, Poser Pro 2010 users have to carefully align the camera to position the joints, which makes positioning the legs very tedious (see the supplementary video). Moreover, using a 2D input device such as a mouse for this kind of task is itself a challenge, and the need to consider collisions with the environment makes the task even more difficult.
The users of our system, however, can complete this task in the same way as in the first experiment, since they sketch the 2D stick figure in screen space without worrying about occlusion at all. The time statistics in Figure 22 show that our sketching interface is more efficient than the conventional IK-based interface, saving 58.7% of the time. According to the paired t-test (t = 8.9183, df = 19, p < 0.0005), the result is significant beyond the 0.0005 level. Note that the sitting poses and environment configurations in the two experiments were carefully designed for comparing the two interfaces. They may not be sufficient to cover the whole range of sitting poses; however, they are typical and very common in real ...
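For illustration, a paired t-test like the ones reported above can be run with SciPy; the timing arrays below are synthetic placeholders (the study's per-participant measurements are not reproduced here):

```python
import numpy as np
from scipy import stats

# Synthetic placeholder timings for 20 participants (seconds); these are
# NOT the values measured in the user study, only a runnable illustration.
rng = np.random.default_rng(0)
poser_times = rng.uniform(120.0, 300.0, size=20)             # Poser Pro 2010
sketch_times = poser_times * rng.uniform(0.4, 0.6, size=20)  # sketching UI

# Paired t-test: the same participants are measured under both conditions,
# so the degrees of freedom are n - 1 = 19.
t_stat, p_value = stats.ttest_rel(poser_times, sketch_times)
print(f"t = {t_stat:.4f}, df = {len(poser_times) - 1}, p = {p_value:.6f}")
```

A paired test is appropriate here because each participant completes the task with both interfaces, so the per-participant time differences carry the signal.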
Citations
... Pose generation for bodies in 3D scenes: Early methods use either contact annotations [35] or detections [27] on 3D objects, and fit 3D skeletons to these. Other methods use physics simulation to reason about contacts and sitting comfort [24,33,66]. ...
Generating digital humans that move realistically has many applications and is widely studied, but existing methods focus on the major limbs of the body, ignoring the hands and head. Hands have been separately studied but the focus has been on generating realistic static grasps of objects. To synthesize virtual characters that interact with the world, we need to generate full-body motions and realistic hand grasps simultaneously. Both sub-problems are challenging on their own and, together, the state-space of poses is significantly larger, the scales of hand and body motions differ, and the whole-body posture and the hand grasp must agree, satisfy physical constraints, and be plausible. Additionally, the head is involved because the avatar must look at the object to interact with it. For the first time, we address the problem of generating full-body, hand and head motions of an avatar grasping an unknown object. As input, our method, called GOAL, takes a 3D object, its position, and a starting 3D body pose and shape. GOAL outputs a sequence of whole-body poses using two novel networks. First, GNet generates a goal whole-body grasp with a realistic body, head, arm, and hand pose, as well as hand-object contact. Second, MNet generates the motion between the starting and goal pose. This is challenging, as it requires the avatar to walk towards the object with foot-ground contact, orient the head towards it, reach out, and grasp it with a realistic hand pose and hand-object contact. To achieve this the networks exploit a representation that combines SMPL-X body parameters and 3D vertex offsets. We train and evaluate GOAL, both qualitatively and quantitatively, on the GRAB dataset. Results show that GOAL generalizes well to unseen objects, outperforming baselines. GOAL takes a step towards synthesizing realistic full-body object grasping.
... Lee et al. [34] generate novel scenes and motion by deformably stitching "motion patches", comprised of scene patches and the skeletal motion in them. Lin et al. [37] generate 3D skeletons sitting on 3D chairs, by manually drawing 2D skeletons and fitting 3D skeletons that satisfy collision and balance constraints. Kim et al. [30] automate this, by detecting sparse contacts on a 3D object mesh and fitting a 3D skeleton to contacts while avoiding penetrations. ...
Humans live within a 3D space and constantly interact with it to perform tasks. Such interactions involve physical contact between surfaces that is semantically meaningful. Our goal is to learn how humans interact with scenes and leverage this to enable virtual characters to do the same. To that end, we introduce a novel Human-Scene Interaction (HSI) model that encodes proximal relationships, called POSA for "Pose with prOximitieS and contActs". The representation of interaction is body-centric, which enables it to generalize to new scenes. Specifically, POSA augments the SMPL-X parametric human body model such that, for every mesh vertex, it encodes (a) the contact probability with the scene surface and (b) the corresponding semantic scene label. We learn POSA with a VAE conditioned on the SMPL-X vertices, and train on the PROX dataset, which contains SMPL-X meshes of people interacting with 3D scenes, and the corresponding scene semantics from the PROX-E dataset. We demonstrate the value of POSA with two applications. First, we automatically place 3D scans of people in scenes. We use a SMPL-X model fit to the scan as a proxy and then find its most likely placement in 3D. POSA provides an effective representation to search for "affordances" in the scene that match the likely contact relationships for that pose. We perform a perceptual study that shows significant improvement over the state of the art on this task. Second, we show that POSA's learned representation of body-scene interaction supports monocular human pose estimation that is consistent with a 3D scene, improving on the state of the art. Our model and code will be available for research purposes at https://posa.is.tue.mpg.de.
... Computer animated scenes featuring virtual characters are a key element in a number of application fields, from the production of movies and video games to product and service design, education, training, advertising, cultural heritage, etc. (DiLorenzo, 2015; Lin, Igarashi, Mitani, Liao, & He, 2012). Creating 3D animations is a long and expensive process, which requires significant expertise (Chen, Izadi, & Fitzgibbon, 2012). ...
Virtual character animation is receiving an ever-growing attention by researchers, who proposed already many tools with the aim to improve the effectiveness of the production process. In particular, significant efforts are devoted to create animation systems suited also to non-skilled users, in order to let them benefit from a powerful communication instrument that can improve information sharing in many contexts like product design, education, marketing, etc. Apart from methods based on the traditional Windows-Icons-Menus-Pointer (WIMP) paradigms, solutions devised so far leverage approaches based on motion capture/retargeting (the so-called performance-based approaches), on non-conventional interfaces (voice inputs, sketches, tangible props, etc.), or on natural language processing (NLP) over text descriptions (e.g., to automatically trigger actions from a library). Each approach has its drawbacks, though. Performance-based methods are difficult to use for creating non-ordinary movements (flips, handstands, etc.); natural interfaces are often used for rough posing, but results need to be later refined; automatic techniques still produce poorly realistic animations. To deal with the above limitations, we propose a multimodal animation system that combines performance- and NLP-based methods. The system recognizes natural commands (gestures, voice inputs) issued by the performer, extracts scene data from a text description and creates live animations in which pre-recorded character actions can be blended with performer’s motion to increase naturalness.
... Animation of virtual characters is essential for a wide range of applications, from the production of movies and video games to the creation of virtual environments used in education, cultural heritage, product design, and social networking scenarios, to name a few [1], [2]. ...
Software for computer animation is generally characterized by a steep learning curve, due to the entanglement of both sophisticated techniques and interaction methods required to control 3D geometries. This paper proposes a tool designed to support computer animation production processes by leveraging the affordances offered by articulated tangible user interfaces and motion capture retargeting solutions. To this aim, orientations of an instrumented prop are recorded together with animator's motion in the 3D space and used to quickly pose characters in the virtual environment. High-level functionalities of the animation software are made accessible via a speech interface, thus letting the user control the animation pipeline via voice commands while focusing on his or her hands and body motion. The proposed solution exploits both off-the-shelf hardware components (like the Lego Mindstorms EV3 bricks and the Microsoft Kinect, used for building the tangible device and tracking animator's skeleton) and free open-source software (like the Blender animation tool), thus representing an interesting solution also for beginners approaching the world of digital animation for the first time. Experimental results in different usage scenarios show the benefits offered by the designed interaction strategy with respect to a mouse & keyboard-based interface both for expert and non-expert users.
... Multitouch interfaces have been shown to outperform the mouse when comparing posing time [1]. Drawing is another 2D approach for character posing, for example Lin et al. [2] used drawing as an interface that enabled poses to be aligned with objects in a virtual environment. However, drawing and multi-touch methods are still limited to 2D control, which do not provide a natural interface for character posing. ...
... Other studies have also shown that a mouse is not an optimal input device for character posing (e.g., [2], [1]). In Oshita's study [1], the average posing time was 61 sec using a mouse and 19 sec using the multitouch interface. ...
We present and evaluate a novel interface for 3D character posing. In a pre-study we characterized the durations of animation operations with animation professionals using standard animation interfaces. We found that posing is the most time-consuming part of character animation. Posing using a mouse is a slow and tedious task that involves sequences of selecting on-screen control handles, and manipulating the handles to adjust character parameters, e.g., joint rotations and end effector positions. Thus, various 3D user interfaces have been proposed to make animating easier, but they typically provide less accuracy. We developed a novel interface combining a mouse with the Leap Motion to provide 3D input. We compared the mouse with our novel interface in a task that involved matching a virtual character's pose to a reference pose. We found that the Leap, as a 3D gestural input device, was preferred over the mouse. The Leap drastically decreased the number of required operations and the task completion time, especially for novice users. Based on the results, we expect that gestural interfaces have the potential to become an integral part of character posing interfaces in the near future.
... Related methods that use similar ideas for different purposes include the work by Choi et al. [27] which uses drawn stick figures in order to query similar postures from a database of motions, and the work by Lin et al. [28] which computes 3D sitting poses for characters by sketching them on a 2D interface. The work of Milliez et al. [29] uses 2D motion brushes from a painting device to generate stylized hierarchical content and movement. ...
In this paper we present an intuitive tool suitable for 2D artists using touch-enabled pen tablets. An artist-oriented tool should be easy-to-use, real-time, versatile, and locally refinable. Our approach uses an interactive system for 3D character posing from 2D strokes. We employ a closed-form solution for the 2D strokes to 3D skeleton registration problem. We first construct an intermediate 2D stroke representation by extracting local features using meaningful heuristics. Then, we match 2D stroke segments to 3D bones. Finally, 3D bones are carefully realigned with the matched 2D stroke segments while enforcing important constraints such as bone rigidity and depth. Our technique is real-time and has a linear time complexity. It is versatile, as it works with any type of 2D stroke and 3D skeleton input. Finally, thanks to its coarse-to-fine design, it allows users to perform local refinements and thus keep full control over the final results. We demonstrate that our system is suitable for 2D artists using touch-enabled pen tablets by posing 3D characters with heterogeneous topologies (bipeds, quadrupeds, hands) in real-time.
... In the literature, specific input devices and interactive user interfaces (UIs) are research topics which aim at improving the task of authoring in 3D modeling and animation software [8]. Some works propose devices that are either better suited to live animation recording (by capturing the motion and the dynamics of the user [3]) or to static 3D editing [14] (by allowing the user to perform complex translation and orientation tasks in a 3D space). Only a few systems are actually suited to both interaction schemes [10], and the possibility of switching seamlessly between the two during an editing session has not really been exploited. ...
This chapter presents an intuitive user interface based on a self-adaptive architecture. It uses a consumer-range 3D hand capture device that allows its users to interactively edit objects in 3D space. While running, the system monitors the user’s behaviors and performance in order to maintain an up-to-date user model. This model then drives the re-arrangement and reparameterization of a rule-based system that controls the interaction. A user study let us define the initial parameters of this self-adaptive system. This preliminary study was conducted in a 3D infographics and animation school with 15 students. The study was both qualitative and quantitative: the qualitative evaluation consisted of a SUMI evaluation questionnaire while the quantitative evaluation consisted of analysing manually annotated recordings of the subjects together with a fine-grained log of the interaction mechanics. We believe that the self-adaptive aspects of the system are well suited to the challenges of rehabilitation. This system could, from the beginning, adapt to both the user’s impairments and needs, then follow and adapt its interaction logic according to the user’s progress. Such a system would, for instance, enable a clinician or a therapist to design tailored rehabilitation activities accounting for the patient’s exact physical and physiological condition.
... Sketching is also used for facial expressions by Lau [19] and Seol [20]. In computer animation, sketching has also been successfully used in motion synthesis, motion retrieval, and posture design [21], [22], [23], [24], [25], [26], [27], [14]. ...
... Meanwhile, a set of constraints and assumptions is applied to return the most likely 3D postures to the user. Lin et al. [23] presented an intuitive sketching interface that allows the user to place a 3D human character in a sitting position on a chair. They reduced the reconstruction solution space by considering the interaction between the character and the environment and by adding physics constraints. ...
Sketch-based human motion retrieval has been a hot topic in computer animation in recent years. In this paper, we present a novel sketch-based human motion retrieval method based on a selected 2-dimensional (2D) Geometric Posture Descriptor (2GPD). Specifically, we first propose a rich 2D pose feature called the 2D Geometric Posture Descriptor (2GPD), which effectively encodes 2D posture similarity by exploiting the geometric relationships among different human body parts. Since the original 2GPD is high-dimensional and redundant, a semi-supervised feature selection algorithm derived from the Laplacian Score is then adopted to select the most discriminative components of 2GPD as the feature representation, which we call the selected 2GPD. Finally, a posture-by-posture motion retrieval algorithm is used to retrieve a motion sequence by sketching several key postures. Experimental results on the CMU human motion database demonstrate the effectiveness of the proposed approach.
... The space complexity of their method is quadratic in the size of the training data. Wei and Chai [32] and Lin et al. [19] require additional constraints, such as known inter-joint distances or a known ground plane, to resolve ambiguity in pose estimation. Yoo et al. [33] and Choi et al. [8] propose sketching interfaces for 3D pose estimation. ...