Figure - available via license: Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International
Video prediction results on KTH (64 × 64) for 30- and 40-frame prediction horizons, using models trained to predict k frames at a time. All models condition on 10 past frames and are evaluated on 256 test videos.
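The caption's setup, where a model trained to predict k frames at a time is used to reach a 30- or 40-frame horizon, is typically handled with an autoregressive rollout. The sketch below is only an illustration under that assumption and is not the source publication's implementation. The names `rollout`, `predict_k`, and `repeat_last_frame` are hypothetical, and the predictor is a placeholder that copies the last context frame rather than a trained model.

```python
import math
import numpy as np

def rollout(predict_k, context, horizon, k):
    """Predict `horizon` future frames by calling a k-frame predictor repeatedly.

    predict_k: hypothetical callable mapping (n_ctx, H, W) -> (k, H, W).
    context:   array of past frames, shape (n_ctx, H, W).
    """
    n_ctx = context.shape[0]
    frames = context
    # Each call adds k frames, so ceil(horizon / k) calls cover the horizon.
    for _ in range(math.ceil(horizon / k)):
        window = frames[-n_ctx:]          # most recent n_ctx frames, real or predicted
        new_frames = predict_k(window)    # shape (k, H, W)
        frames = np.concatenate([frames, new_frames], axis=0)
    # Drop the original context and trim any overshoot past the horizon.
    return frames[n_ctx:n_ctx + horizon]

def repeat_last_frame(window, k):
    """Placeholder predictor (an assumption): repeats the last frame k times."""
    return np.repeat(window[-1:], k, axis=0)

# 10 context frames at 64 x 64, matching the caption; k = 5 is an arbitrary choice.
context = np.random.rand(10, 64, 64)
pred_30 = rollout(lambda w: repeat_last_frame(w, 5), context, horizon=30, k=5)
pred_40 = rollout(lambda w: repeat_last_frame(w, 7), context, horizon=40, k=7)
```

When k does not divide the horizon (as with k = 7 and 40 frames), the final call overshoots and the surplus frames are discarded. Smaller k means more model calls per video, which is where sampling latency matters most.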
Source publication
Multi-step prediction models, such as diffusion and rectified flow models, have emerged as state-of-the-art solutions for generation tasks. However, these models exhibit higher latency in sampling new frames compared to single-step methods. This latency issue becomes a significant bottleneck when adapting such methods for video prediction tasks, gi...
Contexts in source publication
Context 1
... choice aligns with prior methodologies and allows a fair comparison. The results of our evaluation are summarized in Table 1. ...
Context 2
... Table 1, it is evident that our model achieves superior performance while requiring fewer training frames. Unlike other approaches that rely on a larger number of frames for both context and future predictions (e.g., 10 context frames plus k future frames), our model operates effectively with just 4 context frames and 1 future frame. ...