Figure 3 - uploaded by Lars Carius
Test set sequence from the ablation study. Each row represents a video sequence with 6 input frames (conditioning) and 6 frames predicted by the network. From top to bottom: ground truth, RGB→RGB, RGB-D→RGB, RGB→RGB-D, RGB-D→RGB-D

Source publication
Preprint
We propose multiple methods for improving state-of-the-art GAN-based video synthesis approaches. We show that GANs using 3D-convolutions for video generation can easily be extended to predicting coherent depth maps alongside RGB frames, but results indicate that RGB accuracy does not improve if depth is available. We further propose critic-correcti...