Video prediction results on KTH (64 × 64) for 30-frame and 40-frame horizons, using models trained to predict k frames at a time. All models condition on 10 past frames and are evaluated on 256 test videos.

Source publication
Preprint
Multi-step prediction models, such as diffusion and rectified flow models, have emerged as state-of-the-art solutions for generation tasks. However, these models exhibit higher latency in sampling new frames compared to single-step methods. This latency issue becomes a significant bottleneck when adapting such methods for video prediction tasks, gi...

Contexts in source publication

Context 1
... choice aligns with prior methodologies and allows a fair comparison. The results of our evaluation are summarized in Table 1. ...
Context 2
... Table 1, it is evident that our model achieves superior performance while requiring fewer training frames. Unlike other approaches that rely on a larger number of frames for both context and future predictions (e.g., 10 context frames plus k future frames), our model operates effectively with just 4 context frames and 1 future frame. ...
Context 3
... best results under each metric are marked in bold. The results, as presented in Table 1, confirm that our method delivers state-of-the-art performance relative to baseline models. Qualitative results of our CVF model on the KTH dataset are shown in Fig. 4. ...
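
The caption and excerpts above describe long-horizon evaluation with models that emit only a few frames per call, which implies an autoregressive rollout: predictions are fed back in as context until 30 or 40 frames have been produced. A minimal sketch of such a loop in Python follows. The `rollout` helper, the `predict_fn` interface, the stand-in `repeat_last` predictor, and the array shapes are all hypothetical illustrations and are not taken from the source publication.

```python
import numpy as np

def rollout(predict_fn, context, horizon, k):
    """Autoregressively predict `horizon` frames, `k` frames per model call.

    predict_fn: hypothetical callable mapping (n_ctx, H, W) -> (k, H, W)
    context:    array of shape (n_ctx, H, W) holding the observed past frames
    """
    n_ctx = context.shape[0]
    frames = list(context)   # observed frames, later extended with predictions
    predicted = []
    while len(predicted) < horizon:
        # Condition on the most recent n_ctx frames (observed or predicted).
        window = np.stack(frames[-n_ctx:])
        new_frames = predict_fn(window)
        assert new_frames.shape[0] == k, "predictor must return k frames"
        for frame in new_frames:
            frames.append(frame)
            predicted.append(frame)
    # The last call may overshoot the horizon, so truncate.
    return np.stack(predicted[:horizon])

def repeat_last(window, k):
    """Stand-in predictor (assumption): repeat the last context frame k times."""
    return np.repeat(window[-1:], k, axis=0)

# 10 context frames, k = 5 frames per call, 30-frame horizon.
context = np.zeros((10, 64, 64), dtype=np.float32)
pred_30 = rollout(lambda w: repeat_last(w, 5), context, horizon=30, k=5)

# 4 context frames, 1 frame per call, 40-frame horizon.
pred_40 = rollout(lambda w: repeat_last(w, 1), context[-4:], horizon=40, k=1)
```

With a small k, more model calls are needed to reach the same horizon (40 calls for k = 1 versus 6 calls for k = 5 at 30 frames), which is where per-call sampling latency matters most.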
