Figure 5 - available via license: Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International
Qualitative results of our CVF model on the BAIR dataset. Both sequences are conditioned on two context frames, and every 6th predicted future frame is shown.
Source publication
Multi-step prediction models, such as diffusion and rectified flow models, have emerged as state-of-the-art solutions for generation tasks. However, these models exhibit higher latency in sampling new frames compared to single-step methods. This latency issue becomes a significant bottleneck when adapting such methods for video prediction tasks, gi...
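The latency gap described above comes down to counting network evaluations. The short Python sketch below assumes that sampling cost is dominated by forward passes of the network and that every predicted frame uses the same number of sampling steps. The function name `network_evaluations` and the 50-step figure are hypothetical and chosen only for illustration; the 5-step case matches the per-frame step count reported for CVF.

```python
def network_evaluations(num_frames: int, steps_per_frame: int) -> int:
    """Forward passes needed to predict `num_frames` future frames when each
    frame is sampled with `steps_per_frame` network evaluations (assumed cost model)."""
    if num_frames < 0 or steps_per_frame < 1:
        raise ValueError("need num_frames >= 0 and steps_per_frame >= 1")
    return num_frames * steps_per_frame


# Hypothetical rollout of 30 future frames.
single_step = network_evaluations(30, 1)   # single-step predictor: 30 passes
few_step = network_evaluations(30, 5)      # 5 steps per frame: 150 passes
many_step = network_evaluations(30, 50)    # hypothetical 50-step sampler: 1500 passes
print(many_step / single_step)             # 50.0
```

Because video prediction generates frames one after another, the per-frame step count multiplies directly into the total rollout cost, which is why reducing it matters more here than for single-image generation.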
Context in source publication
Context 1
... indicated in Table 2, our method consistently outperforms baseline models. Qualitative results of our CVF model on the BAIR dataset can be seen in Fig. 5. Table 3 highlights the superior efficiency of the proposed CVF model compared to diffusion-based baselines across key metrics. CVF has the fewest parameters and requires only 5 sampling steps per frame. This makes CVF highly efficient and practical for video prediction tasks, where speed and resource efficiency are ...
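To make "5 sampling steps per frame" with two context frames concrete, here is a minimal Python sketch of an autoregressive rollout. It is not the CVF implementation. It assumes a flow-style sampler that integrates dx/dt = v(x, t) from noise at t = 0 to a frame at t = 1 with plain Euler steps. `toy_velocity` is a hypothetical stand-in for a learned network: it steers toward a linear extrapolation of the two most recent context frames, and frames are tiny lists of floats instead of images.

```python
import random
from collections import deque
from typing import Callable, List, Sequence

Frame = List[float]
Velocity = Callable[[Frame, float, Sequence[Frame]], Frame]


def toy_velocity(x: Frame, t: float, context: Sequence[Frame]) -> Frame:
    """Hypothetical stand-in for a learned velocity network.

    Targets a linear extrapolation of the last two context frames and returns
    the straight-line velocity (target - x) / (1 - t), so Euler integration
    from t=0 to t=1 ends on that target.
    """
    prev, last = context[-2], context[-1]
    target = [2.0 * b - a for a, b in zip(prev, last)]
    return [(y - xi) / (1.0 - t) for xi, y in zip(x, target)]


def sample_frame(velocity: Velocity, context: Sequence[Frame], steps: int,
                 rng: random.Random) -> Frame:
    """Euler integration of dx/dt = v(x, t) from Gaussian noise (t=0) to t=1."""
    x = [rng.gauss(0.0, 1.0) for _ in range(len(context[-1]))]
    dt = 1.0 / steps
    for k in range(steps):
        v = velocity(x, k * dt, context)  # one network evaluation per step
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def rollout(velocity: Velocity, context_frames: Sequence[Frame], num_future: int,
            steps: int = 5, num_context: int = 2, seed: int = 0) -> List[Frame]:
    """Predict `num_future` frames, each conditioned on the latest `num_context` frames."""
    rng = random.Random(seed)
    context = deque(context_frames[-num_context:], maxlen=num_context)
    predictions: List[Frame] = []
    for _ in range(num_future):
        frame = sample_frame(velocity, list(context), steps, rng)
        predictions.append(frame)
        context.append(frame)  # the oldest context frame drops out
    return predictions


preds = rollout(toy_velocity, [[0.0, 10.0], [1.0, 9.0]], num_future=12)
shown = preds[5::6]  # every 6th predicted frame, as in the figure layout
print([[round(v, 6) for v in f] for f in shown])  # approximately [[7.0, 3.0], [13.0, -3.0]]
```

With `steps=5`, a 12-frame rollout costs 60 velocity evaluations in this sketch; a sampler needing many more steps per frame would scale that count proportionally.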