Figure 1 - available via license: Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International
Content may be subject to copyright.
Fig. (a) represents a naive adaptation of the diffusion model for the video prediction task. Here, the sampling process always starts from a Gaussian distribution, and sampling steps are taken in the direction of conditional distribution given by X j+1 |X j . Here, X j+1 denotes frame at time j + 1. In contrast, Fig. (b) introduces our Continuous Video Flow (CVF) approach, which reimagines the problem by treating video not as a discrete sequence of frames but as a continuously evolving process. Instead of starting from a static Gaussian distribution for each sampling step, CVF models the underlying dynamics of the entire video, learning to predict changes smoothly over time. This continuous framework allows the model to better capture temporal coherence and evolution, leading to more accurate and fluid video predictions.
Source publication
Multi-step prediction models, such as diffusion and rectified flow models, have emerged as state-of-the-art solutions for generation tasks. However, these models exhibit higher latency in sampling new frames compared to single-step methods. This latency issue becomes a significant bottleneck when adapting such methods for video prediction tasks, gi...
Context in source publication
Context 1
... temporal discretization, while often overlooked in generative video modeling, presents significant challenges. Current approaches, such as diffusion-based models, generally focus on generating the next frame in a sequence based on a given context frame, neglecting intermediate moments between frames (e.g., predicting frames at T + 0.5 or T + 0.25 given a context frame at T ) as represented by Figure 1. In this work, we aim to address the limitations imposed by such discretization during the generative modeling of videos. ...
Similar publications
Unsupervised domain adaptation (UDA) is a technique for learning from a label-rich source domain and transferring the learned knowledge to an unlabeled target domain. Current researches on feature-based UDA methods usually utilize the pseudo labels to find new feature representations that can minimize the distribution difference between the two dom...