Preprint

PhysAnimator: Physics-Guided Generative Cartoon Animation


Abstract

Creating hand-drawn animation sequences is labor-intensive and demands professional expertise. We introduce PhysAnimator, a novel approach for generating physically plausible yet anime-stylized animation from static anime illustrations. Our method seamlessly integrates physics-based simulations with data-driven generative models to produce dynamic and visually compelling animations. To capture the fluidity and exaggeration characteristic of anime, we perform image-space deformable body simulations on extracted mesh geometries. We enhance artistic control by introducing customizable energy strokes and incorporating rigging point support, enabling the creation of tailored animation effects such as wind interactions. Finally, we extract and warp sketches from the simulation sequence, generating a texture-agnostic representation, and employ a sketch-guided video diffusion model to synthesize high-quality animation frames. The resulting animations exhibit temporal consistency and visual plausibility, demonstrating the effectiveness of our method in creating dynamic anime-style animations.
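As a concrete but heavily simplified illustration of the "extract and warp sketches" step described in the abstract, the Python snippet below edge-detects a placeholder image and warps the resulting sketch with a toy displacement field; in the actual method the displacement comes from the image-space deformable-body simulation, and none of this code is from the authors' implementation (it assumes only NumPy and OpenCV).

```python
# Toy stand-in for the sketch extraction + warping stage (not the authors' code).
import numpy as np
import cv2

illustration = (np.random.rand(256, 256, 3) * 255).astype(np.uint8)  # placeholder image
gray = cv2.cvtColor(illustration, cv2.COLOR_BGR2GRAY)
sketch = cv2.Canny(gray, 100, 200)                      # "texture-agnostic" line drawing

# Toy per-pixel displacement; in PhysAnimator this would come from the simulated mesh.
ys, xs = np.mgrid[0:256, 0:256].astype(np.float32)
dx = 3.0 * np.sin(ys / 20.0)                            # gentle horizontal sway (wind-like)
warped_sketch = cv2.remap(sketch, xs + dx, ys, interpolation=cv2.INTER_LINEAR)

cv2.imwrite("warped_sketch.png", warped_sketch)         # one guidance frame
```

A sequence of such warped sketches is the kind of texture-agnostic guidance a sketch-guided video diffusion model would then consume.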


References
Article
Full-text available
Clipart, a pre-made graphic art form, offers a convenient and efficient way of illustrating visual content. Traditional workflows to convert static clipart images into motion sequences are laborious and time-consuming, involving numerous intricate steps like rigging, key animation and in-betweening. Recent advancements in text-to-video generation hold great potential in resolving this problem. Nevertheless, direct application of text-to-video generation models often struggles to retain the visual identity of clipart images or generate cartoon-style motions, resulting in unsatisfactory animation outcomes. In this paper, we introduce AniClipart, a system that transforms static clipart images into high-quality motion sequences guided by text-to-video priors. To generate cartoon-style and smooth motion, we first define Bézier curves over keypoints of the clipart image as a form of motion regularization. We then align the motion trajectories of the keypoints with the provided text prompt by optimizing the Video Score Distillation Sampling (VSDS) loss, which encodes adequate knowledge of natural motion within a pretrained text-to-video diffusion model. With a differentiable As-Rigid-As-Possible shape deformation algorithm, our method can be end-to-end optimized while maintaining deformation rigidity. Experimental results show that the proposed AniClipart consistently outperforms existing image-to-video generation models, in terms of text-video alignment, visual identity preservation, and motion consistency. Furthermore, we showcase the versatility of AniClipart by adapting it to generate a broader array of animation formats, such as layered animation, which allows topological changes.
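For illustration only, the snippet below evaluates a cubic Bézier trajectory for a single keypoint, which is the kind of motion representation AniClipart regularizes; the control points here are made up, whereas the paper obtains them by optimizing the VSDS loss under an ARAP deformation.

```python
# Illustrative cubic Bezier trajectory for one clipart keypoint (values are arbitrary).
import numpy as np

def cubic_bezier(p0, p1, p2, p3, t):
    """Evaluate a cubic Bezier curve at parameters t in [0, 1]."""
    t = t[:, None]
    return ((1 - t) ** 3 * p0 + 3 * (1 - t) ** 2 * t * p1
            + 3 * (1 - t) * t ** 2 * p2 + t ** 3 * p3)

p0, p1, p2, p3 = map(np.array, ([100.0, 200.0], [110.0, 180.0], [130.0, 185.0], [140.0, 210.0]))
trajectory = cubic_bezier(p0, p1, p2, p3, np.linspace(0.0, 1.0, 16))  # 16 animation frames
print(trajectory.shape)  # (16, 2): per-frame keypoint positions in pixels
```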
Conference Paper
Full-text available
Creating sketch animations using traditional tools requires special artistic skills, and is tedious even for trained professionals. To lower the barrier for creating sketch animations, we propose a new system, Live Sketch, which allows novice users to interactively bring static drawings to life by applying deformation-based animation effects that are extracted from video examples. Dynamic deformation is first extracted as a sparse set of moving control points from videos and then transferred to a static drawing. Our system addresses a few major technical challenges, such as motion extraction from video, video-to-sketch alignment, and many-to-one motion driven sketch animation. While each of the sub-problems could be difficult to solve fully automatically, we present reliable solutions by combining new computational algorithms with intuitive user interactions. Our pilot study shows that our system allows both users with or without animation skills to easily add dynamic deformation to static drawings.
Article
Full-text available
Generative Adversarial Networks (GANs) excel at creating realistic images with complex models for which maximum likelihood is infeasible. However, the convergence of GAN training has still not been proved. We propose a two time-scale update rule (TTUR) for training GANs with stochastic gradient descent on arbitrary GAN loss functions. TTUR has an individual learning rate for both the discriminator and the generator. Using the theory of stochastic approximation, we prove that the TTUR converges under mild assumptions to a stationary local Nash equilibrium. The convergence carries over to the popular Adam optimization, for which we prove that it follows the dynamics of a heavy ball with friction and thus prefers flat minima in the objective landscape. For the evaluation of the performance of GANs at image generation, we introduce the "Fréchet Inception Distance" (FID), which captures the similarity of generated images to real ones better than the Inception Score. In experiments, TTUR improves learning for DCGANs and Improved Wasserstein GANs (WGAN-GP), outperforming conventional GAN training on CelebA, CIFAR-10, SVHN, LSUN Bedrooms, and the One Billion Word Benchmark.
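Because FID is defined purely from the Gaussian statistics of Inception features, it fits in a few lines; the snippet below is a generic reference implementation of the published formula, fed with random stand-in features rather than real Inception activations, and is not the authors' code.

```python
# Generic FID implementation (formula from the paper; not the authors' code).
import numpy as np
from scipy.linalg import sqrtm

def fid(feats_real, feats_fake):
    mu_r, mu_f = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_f = np.cov(feats_fake, rowvar=False)
    covmean = sqrtm(cov_r @ cov_f)
    if np.iscomplexobj(covmean):        # numerical noise can leave tiny imaginary parts
        covmean = covmean.real
    return float(np.sum((mu_r - mu_f) ** 2) + np.trace(cov_r + cov_f - 2.0 * covmean))

# Toy usage; real use feeds Inception pool3 activations of real vs. generated images.
rng = np.random.default_rng(0)
print(fid(rng.normal(size=(512, 64)), rng.normal(0.1, 1.0, size=(512, 64))))
```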
Article
Full-text available
Hand-drawn animation is a major art form and communication medium, but can be challenging to produce. We present a system to help people create frame-by-frame animations through manual sketches. We design our interface to be minimalistic: it contains only a canvas and a few controls. When users draw on the canvas, our system silently analyzes all past sketches and predicts what might be drawn in the future across spatial locations and temporal frames. The interface also offers suggestions to beautify existing drawings. Our system can reduce manual workload and improve output quality without compromising natural drawing flow and control: users can accept, ignore, or modify such predictions visualized on the canvas by simple gestures. Our key idea is to extend the local similarity method in [Xing et al. 2014], which handles only low-level spatial repetitions such as hatches within a single frame, to a global similarity that can capture high-level structures across multiple frames such as dynamic objects. We evaluate our system through a preliminary user study and confirm that it can enhance both users' objective performance and subjective satisfaction.
Article
Full-text available
We present a method for controlling the motions of active deformable characters. As an underlying principle, we require that all motions be driven by internal deformations. We achieve this by dynamically adapting rest shapes in order to induce deformations that, together with environment interactions, result in purposeful and physically-plausible motions. Rest shape adaptation is a powerful concept and we show that by restricting shapes to suitable subspaces, it is possible to explicitly control the motion styles of deformable characters. Our formulation is general and can be combined with arbitrary elastic models and locomotion controllers. We demonstrate the efficiency of our method by animating curve, shell, and solid-based characters whose motion repertoires range from simple hopping to complex walking behaviors.
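The underlying principle, motion driven only by changing rest shapes, can be seen in a one-dimensional toy: periodically adapting a spring's rest length makes the attached mass move without any external actuator. The sketch below is only a cartoon of the idea, not the paper's subspace formulation.

```python
# 1D toy of rest-shape adaptation: the rest length is the only "actuator".
import math

k, m, c, dt = 50.0, 1.0, 1.5, 1.0 / 120.0
x, v = 1.0, 0.0                                   # mass position and velocity; anchor at 0
for step in range(600):
    rest = 1.0 + 0.3 * math.sin(step * dt * 4.0)  # adapted rest length over time
    force = -k * (x - rest) - c * v               # spring toward rest length + damping
    v += force / m * dt
    x += v * dt                                   # semi-implicit Euler step
    if step % 120 == 0:
        print(f"t={step * dt:4.1f}s  rest={rest:.2f}  x={x:.2f}")
```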
Article
Full-text available
This paper provides a unified discussion of the Delaunay triangulation. Its geometric properties are reviewed and several applications are discussed. Two algorithms are presented for constructing the triangulation over a planar set ofN points. The first algorithm uses a divide-and-conquer approach. It runs inO(N logN) time, which is asymptotically optimal. The second algorithm is iterative and requiresO(N 2) time in the worst case. However, its average case performance is comparable to that of the first algorithm.
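Readers who simply need a triangulation, rather than the two construction algorithms analyzed in the paper, can use an off-the-shelf implementation; the snippet below is just a SciPy usage example, unrelated to the paper's algorithms.

```python
# Delaunay triangulation of N random planar points via SciPy (usage example only).
import numpy as np
from scipy.spatial import Delaunay

points = np.random.default_rng(1).random((20, 2))   # N = 20 points in the unit square
tri = Delaunay(points)
print(tri.simplices.shape)   # (num_triangles, 3): vertex indices of each triangle
```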
Article
We propose the first video diffusion framework for reference-based lineart video colorization. Unlike previous works that rely solely on image generative models to colorize lineart frame by frame, our approach leverages a large-scale pretrained video diffusion model to generate colorized animation videos. This approach leads to more temporally consistent results and is better equipped to handle large motions. Firstly, we introduce Sketch-guided ControlNet, which provides additional control to finetune an image-to-video diffusion model for controllable video synthesis, enabling the generation of animation videos conditioned on lineart. We then propose Reference Attention to facilitate the transfer of colors from the reference frame to other frames containing fast and expansive motions. Finally, we present a novel scheme for sequential sampling, incorporating the Overlapped Blending Module and Prev-Reference Attention, to extend the video diffusion model beyond its original fixed-length limitation for long video colorization. Both qualitative and quantitative results demonstrate that our method significantly outperforms state-of-the-art techniques in terms of frame and video quality, as well as temporal consistency. Moreover, our method is capable of generating high-quality, temporally consistent long animation videos with large motions, which is not achievable in previous works. Our code and model are available at https://luckyhzt.github.io/lvcd.
Article
The recent wave of AI-generated content (AIGC) has witnessed substantial success in computer vision, with the diffusion model playing a crucial role in this achievement. Due to their impressive generative capabilities, diffusion models are gradually superseding methods based on GANs and auto-regressive Transformers, demonstrating exceptional performance not only in image generation and editing, but also in the realm of video-related research. However, existing surveys mainly focus on diffusion models in the context of image generation, with few up-to-date reviews on their application in the video domain. To address this gap, this paper presents a comprehensive review of video diffusion models in the AIGC era. Specifically, we begin with a concise introduction to the fundamentals and evolution of diffusion models. Subsequently, we present an overview of research on diffusion models in the video domain, categorizing the work into three key areas: video generation, video editing, and other video understanding tasks. We conduct a thorough review of the literature in these three key areas, including further categorization and practical contributions in the field. Finally, we discuss the challenges faced by research in this domain and outline potential future developmental trends. A comprehensive list of video diffusion models studied in this survey is available at https://github.com/ChenHsing/Awesome-Video-Diffusion-Models.
Conference Paper
We present ControlNet, a neural network architecture to add spatial conditioning controls to large, pretrained text-to-image diffusion models. ControlNet locks the production-ready large diffusion models, and reuses their deep and robust encoding layers pretrained with billions of images as a strong backbone to learn a diverse set of conditional controls. The neural architecture is connected with “zero convolutions” (zero-initialized convolution layers) that progressively grow the parameters from zero and ensure that no harmful noise could affect the finetuning. We test various conditioning controls, e.g., edges, depth, segmentation, human pose, etc., with Stable Diffusion, using single or multiple conditions, with or without prompts. We show that the training of ControlNets is robust with small (<50k) and large (>1M) datasets. Extensive results show that ControlNet may facilitate wider applications to control image diffusion models.
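The "zero convolution" trick is small enough to sketch directly: a 1x1 convolution whose weights and bias start at zero, so the control branch contributes nothing at the start of finetuning. The PyTorch snippet below illustrates the idea only and is not the official ControlNet code.

```python
# Zero-initialized 1x1 convolution: the control branch starts as an exact no-op.
import torch
import torch.nn as nn

def zero_conv(channels: int) -> nn.Conv2d:
    conv = nn.Conv2d(channels, channels, kernel_size=1)
    nn.init.zeros_(conv.weight)
    nn.init.zeros_(conv.bias)
    return conv

x = torch.randn(1, 64, 32, 32)
print(zero_conv(64)(x).abs().max())   # tensor(0.) before any gradient step
```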
Article
3D visual computing data are often spatially sparse. To exploit such sparsity, people have developed hierarchical sparse data structures, such as multi-level sparse voxel grids, particles, and 3D hash tables. However, developing and using these high-performance sparse data structures is challenging, due to their intrinsic complexity and overhead. We propose Taichi, a new data-oriented programming language for efficiently authoring, accessing, and maintaining such data structures. The language offers a high-level, data structure-agnostic interface for writing computation code. The user independently specifies the data structure. We provide several elementary components with different sparsity properties that can be arbitrarily composed to create a wide range of multi-level sparse data structures. This decoupling of data structures from computation makes it easy to experiment with different data structures without changing computation code, and allows users to write computation as if they are working with a dense array. Our compiler then uses the semantics of the data structure and index analysis to automatically optimize for locality, remove redundant operations for coherent accesses, maintain sparsity and memory allocations, and generate efficient parallel and vectorized instructions for CPUs and GPUs. Our approach yields competitive performance on common computational kernels such as stencil applications, neighbor lookups, and particle scattering. We demonstrate our language by implementing simulation, rendering, and vision tasks including a material point method simulation, finite element analysis, a multigrid Poisson solver for pressure projection, volumetric path tracing, and 3D convolution on sparse grids. Our computation-data structure decoupling allows us to quickly experiment with different data arrangements, and to develop high-performance data structures tailored for specific computational tasks. With 1/10th as many lines of code, we achieve 4.55× higher performance on average, compared to hand-optimized reference implementations.
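A minimal example in the spirit of the paper is sketched below: a two-level sparse grid (pointer blocks holding dense tiles) written to as if it were a dense array. The field sizes are arbitrary, the snippet assumes a working taichi installation, and it is not taken from the paper.

```python
# Two-level sparse grid in Taichi: pointer blocks are allocated only when written.
import taichi as ti

ti.init(arch=ti.cpu)

x = ti.field(dtype=ti.f32)
block = ti.root.pointer(ti.ij, 8)   # 8x8 coarse blocks, allocated on demand
block.dense(ti.ij, 16).place(x)     # each active block stores a dense 16x16 tile

@ti.kernel
def touch_diagonal():
    for i in range(128):
        x[i, i] = 1.0               # writes activate only the blocks they touch

touch_diagonal()
```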
Conference Paper
We present an interactive tool to animate the visual elements of a static picture, based on simple sketch-based markup. While animated images enhance websites, infographics, logos, e-books, and social media, creating such animations from still pictures is difficult for novices and tedious for experts. Creating automatic tools is challenging due to ambiguities in object segmentation, relative depth ordering, and non-existent temporal information. With a few user drawn scribbles as input, our mixed initiative creative interface extracts repetitive texture elements in an image, and supports animating them. Our system also facilitates the creation of multiple layers to enhance depth cues in the animation. Finally, after analyzing the artwork during segmentation, several animation processes automatically generate kinetic textures that are spatio-temporally coherent with the source image. Our results, as well as feedback from our user evaluation, suggest that our system effectively allows illustrators and animators to add life to still images in a broad range of visual styles.
Conference Paper
When bringing animated characters to life, artists often augment the primary motion of a figure by adding secondary animation -- subtle movement of parts like hair, foliage or cloth that complements and emphasizes the primary motion. Traditionally, artists add secondary motion to animated illustrations only through arduous manual effort, and often eschew it entirely. Emerging "live" performance applications allow both novices and experts to perform the primary motion of a character, but only a virtuoso performer could manage the degrees of freedom needed to specify both primary and secondary motion together. This paper introduces physically-inspired rigs that propagate the primary motion of layered, illustrated characters to produce plausible secondary motion. These composable elements are rigged and controlled via a small number of parameters to produce an expressive range of effects. Our approach supports a variety of the most common secondary effects, which we demonstrate with an assortment of characters of varying complexity.
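The flavor of such rigs can be conveyed by a toy damped spring that makes a secondary point (say, a hair tip) lag behind and overshoot a primary sway; the sketch below is our own simplification, not the paper's rig formulation.

```python
# Toy secondary motion: a damped spring follows (and overshoots) the primary motion.
import math

stiffness, damping, dt = 40.0, 4.0, 1.0 / 60.0
pos, vel = 0.0, 0.0                                      # secondary degree of freedom
for frame in range(121):
    target = math.sin(frame * dt * 2.0 * math.pi)        # primary motion: 1 Hz sway
    accel = stiffness * (target - pos) - damping * vel   # spring + damper toward primary
    vel += accel * dt
    pos += vel * dt                                      # semi-implicit Euler step
    if frame % 30 == 0:
        print(f"frame {frame:3d}: primary {target:+.2f}  secondary {pos:+.2f}")
```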
Conference Paper
Dynamic effects such as waves, splashes, fire, smoke, and explosions are an integral part of stylized animations. However, such dynamics are challenging to produce, as manually sketching key-frames requires significant effort and artistic expertise while physical simulation tools lack sufficient expressiveness and user control. We present an interactive interface for designing these elemental dynamics for animated illustrations. Users draw with coarse-scale energy brushes which serve as control gestures to drive detailed flow particles which represent local velocity fields. These fields can convey both realistic and artistic effects based on user specification. This painting metaphor for creating elemental dynamics simplifies the process, providing artistic control, and preserves the fluidity of sketching. Our system is fast, stable, and intuitive. An initial user evaluation shows that even novice users with no prior animation experience can create intriguing dynamics using our system.
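As a rough, self-contained analogue of the brush-to-flow idea, the snippet below treats a coarse stroke's tangents as a velocity field and advects particles along it; the system's detailed flow particles and interactive controls are not reproduced here.

```python
# Toy "energy brush": particles follow the velocity of the nearest stroke sample.
import numpy as np

rng = np.random.default_rng(3)
stroke = np.stack([np.linspace(0.0, 1.0, 8),
                   0.5 + 0.2 * np.sin(np.linspace(0.0, 3.0, 8))], axis=1)  # coarse gesture
stroke_vel = np.gradient(stroke, axis=0)        # stroke tangents act as the desired flow

particles = rng.random((200, 2))
for _ in range(60):                             # advect for 60 steps
    d = np.linalg.norm(particles[:, None, :] - stroke[None, :, :], axis=2)
    particles += 0.5 * stroke_vel[d.argmin(axis=1)]
print(particles.mean(axis=0))                   # particles drift along the gesture
```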
Article
Physics-based animation is often used to animate scenes containing destruction of near-rigid, man-made materials. For these applications, the most important visual features are plastic deformation and fracture. Methods based on continuum mechanics model these materials as elastoplastic, and must perform expensive elasticity computations even though elastic deformations are imperceptibly small for rigid materials. We introduce an example-based plasticity model based on linear blend skinning that allows artists to author simulation objects using familiar tools. Dynamics are computed using an unmodified rigid body simulator, making our method computationally efficient and easy to integrate into existing pipelines. We introduce a flexible technique for mapping impulses computed by the rigid body solver to local, example-based deformations. For completeness, our method also supports prescoring based fracture. We demonstrate the practicality of our method by animating a variety of destructive scenes.
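The deformation model the examples are authored with, linear blend skinning, is standard; below is a generic textbook implementation (not the paper's code) in which each vertex is a weighted blend of per-bone affine transforms.

```python
# Generic linear blend skinning: vertices blended across bone transforms.
import numpy as np

def lbs(vertices, weights, transforms):
    """vertices: (V, 3); weights: (V, B); transforms: (B, 3, 4) affine bone matrices."""
    homo = np.concatenate([vertices, np.ones((len(vertices), 1))], axis=1)  # (V, 4)
    per_bone = np.einsum('bij,vj->vbi', transforms, homo)                   # (V, B, 3)
    return np.einsum('vb,vbi->vi', weights, per_bone)                       # (V, 3)

verts = np.random.default_rng(2).random((5, 3))
w = np.full((5, 2), 0.5)                              # equal weight on two "bones"
T = np.stack([np.eye(3, 4), np.eye(3, 4)])            # identity bones -> vertices unchanged
print(np.allclose(lbs(verts, w, T), verts))           # True
```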
Conference Paper
We present a sketching tool for crafting animated illustrations that contain the exaggerated dynamics of stylized 2D animations. The system provides a set of motion amplifiers which implement a set of established principles of 2D animation. These amplifiers break down a complex animation effect into independent, understandable chunks. Each amplifier imposes deformations to an underlying grid, which in turn updates the corresponding strokes. Users can combine these amplifiers at will when applying them to an existing animation, promoting rapid experimentation. By leveraging the freeform nature of sketching, our system allows users to rapidly sketch, record motion, explore exaggerated dynamics using the amplifiers, and fine-tune their animations. Practical results confirm that users with no prior experience in animation can produce expressive animated illustrations quickly and easily.
Article
Traditional methods for creating dynamic objects and characters from static drawings involve careful tweaking of animation curves and/or simulation parameters. Sprite sheets offer a more drawing-centric solution, but they do not encode timing information or the logic that determines how objects should transition between poses and cannot generalize outside the given drawings. We present an approach for creating dynamic sprites that leverages sprite sheets while addressing these limitations. In our system, artists create a drawing, deform it to specify a small number of example poses, and indicate which poses can be interpolated. To make the object move, we design a procedural simulation to navigate the pose manifold in response to external or user-controlled forces. Powerful artistic control is achieved by allowing the artist to specify both the pose manifold and how it is navigated, while physics is leveraged to provide timing and generality. We used our method to create sprites with a range of different dynamic properties.
Conference Paper
We provide a smooth extension of arbitrary isotropic hyperelastic energy density functions to inverted configurations. This extension is designed to improve robustness for elasticity simulations with extremely large deformations and is analogous to the extension given to the first Piola-Kirchhoff stress in [ITF04]. We show that our energy-based approach is significantly more robust to large deformations than the first Piola-Kirchhoff fix. Furthermore, we show that the robustness and stability of a hyperelastic model can be predicted from a characteristic contour, which we call its primary contour. The extension to inverted configurations is defined via extrapolation from a convex threshold surface that lies in the uninverted portion of the principal stretches space. The extended hyperelastic energy density yields continuous stress and unambiguous stress derivatives in all inverted configurations, unlike in [TSIF05]. We show that our invertible energy-density-based approach outperforms the popular hyperelastic corotated model, and we also show how to use the primary contour methodology to improve the robustness of this model to large deformations.
Article
Keyframe animation is a common technique to generate animations of deformable characters and other soft bodies. With spline interpolation, however, it can be difficult to achieve secondary motion effects such as plausible dynamics when there are thousands of degrees of freedom to animate. Physical methods can provide more realism with less user effort, but it is challenging to apply them to quickly create specific animations that closely follow prescribed animator goals. We present a fast space-time optimization method to author physically based deformable object simulations that conform to animator-specified keyframes. We demonstrate our method with FEM deformable objects and mass-spring systems. Our method minimizes an objective function that penalizes the sum of keyframe deviations plus the deviation of the trajectory from physics. With existing methods, such minimizations operate in high dimensions, are slow, memory-consuming, and prone to local minima. We demonstrate that significant computational speedups and robustness improvements can be achieved if the optimization problem is properly solved in a low-dimensional space. Selecting a low-dimensional space so that the intent of the animator is accommodated, and that at the same time space-time optimization is convergent and fast, is difficult. We present a method that generates a quality low-dimensional space using the given keyframes. It is then possible to find quality solutions to difficult space-time optimization problems robustly and in a matter of minutes.
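Schematically, and in our own notation rather than the paper's, the objective described above combines a keyframe term with a physics term over a reduced trajectory:

```latex
% Sketch of a space-time objective in reduced coordinates (notation ours, not the paper's):
% z_t are reduced coordinates, U the low-dimensional basis built from the keyframes,
% \bar{u}_k the keyframe targets, and \ddot{u}_t a finite-difference acceleration.
E(z_{1:T}) = \sum_{k \in \mathcal{K}} \left\| U z_{t_k} - \bar{u}_k \right\|^2
           + \alpha \sum_{t} \left\| M \ddot{u}_t + f_{\mathrm{int}}(u_t) - f_{\mathrm{ext},t} \right\|^2 ,
\qquad u_t = U z_t .
```

The first sum penalizes deviation from the animator's keyframes and the second penalizes deviation of the discretized trajectory from the equations of motion; the paper's contribution is constructing the low-dimensional basis from the keyframes so that this minimization is fast, robust, and faithful to the animator's intent.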
Stable video diffusion: Scaling latent video diffusion models to large datasets
  • Andreas Blattmann
  • Tim Dockhorn
  • Sumith Kulal
  • Daniel Mendelevitch
  • Maciej Kilian
  • Dominik Lorenz
  • Yam Levi
  • Zion English
  • Vikram Voleti
  • Adam Letts
Andreas Blattmann, Tim Dockhorn, Sumith Kulal, Daniel Mendelevitch, Maciej Kilian, Dominik Lorenz, Yam Levi, Zion English, Vikram Voleti, Adam Letts, et al. Stable video diffusion: Scaling latent video diffusion models to large datasets. arXiv:2311.15127, 2023.
Control-a-video: Controllable text-to-video generation with diffusion models
  • Weifeng Chen
  • Yatai Ji
  • Jie Wu
  • Hefeng Wu
  • Pan Xie
  • Jiashi Li
  • Xin Xia
  • Xuefeng Xiao
  • Liang Lin
Weifeng Chen, Yatai Ji, Jie Wu, Hefeng Wu, Pan Xie, Jiashi Li, Xin Xia, Xuefeng Xiao, and Liang Lin. Control-a-video: Controllable text-to-video generation with diffusion models. arXiv:2305.13840, 2023.
Seine: Short-to-long video diffusion model for generative transition and prediction
  • Xinyuan Chen
  • Yaohui Wang
  • Lingjun Zhang
  • Shaobin Zhuang
  • Xin Ma
  • Jiashuo Yu
  • Yali Wang
  • Dahua Lin
  • Yu Qiao
  • Ziwei Liu
Xinyuan Chen, Yaohui Wang, Lingjun Zhang, Shaobin Zhuang, Xin Ma, Jiashuo Yu, Yali Wang, Dahua Lin, Yu Qiao, and Ziwei Liu. Seine: Short-to-long video diffusion model for generative transition and prediction. In The Twelfth International Conference on Learning Representations, 2023.
V3d: Video diffusion models are effective 3d generators
  • Zilong Chen
  • Yikai Wang
  • Feng Wang
  • Zhengyi Wang
  • Huaping Liu
Zilong Chen, Yikai Wang, Feng Wang, Zhengyi Wang, and Huaping Liu. V3d: Video diffusion models are effective 3d generators. arXiv:2403.06738, 2024.
Diffusion models beat gans on image synthesis
  • Prafulla Dhariwal
  • Alexander Nichol
Prafulla Dhariwal and Alexander Nichol. Diffusion models beat gans on image synthesis. Advances in neural information processing systems, 34:8780-8794, 2021.
Preserve your own correlation: A noise prior for video diffusion models
  • Songwei Ge
  • Seungjun Nah
  • Guilin Liu
  • Tyler Poon
  • Andrew Tao
  • Bryan Catanzaro
  • David Jacobs
  • Jia-Bin Huang
  • Ming-Yu Liu
  • Yogesh Balaji
Songwei Ge, Seungjun Nah, Guilin Liu, Tyler Poon, Andrew Tao, Bryan Catanzaro, David Jacobs, Jia-Bin Huang, Ming-Yu Liu, and Yogesh Balaji. Preserve your own correlation: A noise prior for video diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 22930-22941, 2023.
Tokenflow: Consistent diffusion features for consistent video editing
  • Michal Geyer
  • Omer Bar-Tal
  • Shai Bagon
  • Tali Dekel
Michal Geyer, Omer Bar-Tal, Shai Bagon, and Tali Dekel. Tokenflow: Consistent diffusion features for consistent video editing. arXiv:2307.10373, 2023.
Elemental Magic, Volume 2: The Technique of Special Effects Animation
  • Joseph Gilland
Joseph Gilland. Elemental Magic, Volume 2: The Technique of Special Effects Animation. Routledge, 2012.