Shunyu Yao’s scientific contributions

What is this page?


This page lists works of an author who doesn't have a ResearchGate profile or hasn't added the works to their profile yet. It is automatically generated from public (personal) data to further our legitimate goal of comprehensive and accurate scientific recordkeeping. If you are this author and want this page removed, please let us know.

Publications (2)


3D-Aware Scene Manipulation via Inverse Graphics
  • Preprint
  • File available

August 2018 · 151 Reads

Shunyu Yao · [...] · Joshua B. Tenenbaum

Figure 5: Example user editing results on Virtual KITTI. (a) We move a car closer to the camera, keeping the same texture. (b) We can synthesize the same car with different 3D poses. The same texture code is used for different poses. (c) We modify the appearance of the input red car using new texture codes. Note that its geometry and pose stay the same. We can also change the environment by editing the background texture codes. (d) We can inpaint occluded regions and remove objects.

We aim to obtain an interpretable, expressive, and disentangled scene representation that contains comprehensive structural and textural information for each object. Previous representations learned by neural networks are often uninterpretable, limited to a single object, or lacking 3D knowledge. In this work, we address these issues by integrating 3D modeling into a deep generative model. We adopt a differentiable shape renderer to decode geometric object attributes into a shape, and a neural generator to decode learned latent codes into texture. The encoder is therefore forced to perform an inverse graphics task, transforming a scene image into a structured representation with 3D object attributes and learned texture latent codes. The representation supports reconstruction and a variety of 3D-aware scene manipulation applications. The disentanglement of structure and texture in our representation allows us to rotate and move objects freely while maintaining consistent texture, and to change an object's appearance without affecting its structure. We systematically evaluate our representation and demonstrate that our editing scheme is superior to its 2D counterparts.
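
The abstract describes a two-part decoder: a differentiable shape renderer that turns predicted 3D attributes into an object shape, and a neural generator that decodes a learned texture code, with the encoder forced to perform inverse graphics. The sketch below illustrates that structure/texture split in plain PyTorch; the module names (SceneEncoder, TextureGenerator, render_silhouette), all layer sizes, and the soft box-mask stand-in for the renderer are illustrative assumptions, not the authors' 3D-SDN implementation.

```python
# Minimal sketch of the structure/texture split described in the abstract.
# All names and sizes are illustrative; the differentiable shape renderer is
# replaced by a trivial soft box-mask rasterizer for brevity.
import torch
import torch.nn as nn


class SceneEncoder(nn.Module):
    """Maps an object-centric image crop to 3D attributes and a texture code."""

    def __init__(self, attr_dim: int = 6, tex_dim: int = 64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.attr_head = nn.Linear(64, attr_dim)   # pose, translation, scale
        self.tex_head = nn.Linear(64, tex_dim)     # latent texture code

    def forward(self, crop):
        feat = self.backbone(crop)
        return self.attr_head(feat), self.tex_head(feat)


class TextureGenerator(nn.Module):
    """Decodes a texture code (plus an object mask) back into an RGB patch."""

    def __init__(self, tex_dim: int = 64):
        super().__init__()
        self.fc = nn.Linear(tex_dim, 128 * 4 * 4)
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, tex_code, mask):
        x = self.fc(tex_code).view(-1, 128, 4, 4)
        rgb = self.deconv(x)                       # (B, 3, 32, 32)
        return rgb * mask                          # paint texture inside the shape


def render_silhouette(attrs, size: int = 32):
    """Stand-in for the differentiable shape renderer: a soft box mask whose
    center and extent come from the predicted attributes."""
    cx, cy, w, h = [torch.sigmoid(attrs[:, i]) for i in range(4)]
    ys = torch.linspace(0, 1, size).view(1, size, 1)
    xs = torch.linspace(0, 1, size).view(1, 1, size)
    mask_x = torch.sigmoid(20 * (0.5 * w.view(-1, 1, 1) - (xs - cx.view(-1, 1, 1)).abs()))
    mask_y = torch.sigmoid(20 * (0.5 * h.view(-1, 1, 1) - (ys - cy.view(-1, 1, 1)).abs()))
    return (mask_x * mask_y).unsqueeze(1)          # (B, 1, size, size)


if __name__ == "__main__":
    crops = torch.rand(2, 3, 64, 64)
    encoder, generator = SceneEncoder(), TextureGenerator()
    attrs, tex = encoder(crops)                    # inverse graphics step
    recon = generator(tex, render_silhouette(attrs))
    print(recon.shape)                             # torch.Size([2, 3, 32, 32])
```

In the paper the per-object shape and texture would be composed back into the full scene image; this sketch stops at a single object patch to stay short.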

3D-aware scene manipulation via inverse graphics

January 2018 · 143 Reads · 202 Citations

We aim to obtain an interpretable, expressive, and disentangled scene representation that contains comprehensive structural and textural information for each object. Previous scene representations learned by neural networks are often uninterpretable, limited to a single object, or lacking 3D knowledge. In this work, we propose 3D scene de-rendering networks (3D-SDN) to address these issues by integrating disentangled representations for semantics, geometry, and appearance into a deep generative model. Our scene encoder performs inverse graphics, translating a scene into a structured object-wise representation. Our decoder has two components: a differentiable shape renderer and a neural texture generator. The disentanglement of semantics, geometry, and appearance supports 3D-aware scene manipulation, e.g., rotating and moving objects freely while keeping their shape and texture consistent, and changing an object's appearance without affecting its shape. Experiments demonstrate that our editing scheme based on 3D-SDN is superior to its 2D counterpart. ©2018. Poster presentation at the 32nd Annual Conference on Neural Information Processing Systems (NIPS 2018), December 3-5, 2018, Montréal, Québec.
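
The disentanglement described here is what makes each edit in Figure 5 a local change to one field of the object-wise representation before the scene is re-decoded. The following sketch maps each panel of the figure to such an edit; the ObjectRecord layout and helper names are hypothetical, chosen purely for exposition, and are not the 3D-SDN API.

```python
# Hedged illustration of how the Figure 5 edits correspond to changes in a
# structured, object-wise scene representation. Field names and helpers are
# hypothetical, not the authors' code.
from dataclasses import dataclass, replace
from typing import List

import torch


@dataclass
class ObjectRecord:
    """One object's entry in the de-rendered scene representation."""
    category: str               # semantics
    translation: torch.Tensor   # geometry: 3D position (x, y, z)
    rotation_yaw: float         # geometry: heading angle in radians
    scale: float                # geometry: object size
    texture_code: torch.Tensor  # appearance: latent texture vector


def move_closer(obj: ObjectRecord, delta_z: float) -> ObjectRecord:
    """Figure 5(a): translate the car toward the camera, texture untouched."""
    t = obj.translation.clone()
    t[2] = t[2] - delta_z
    return replace(obj, translation=t)


def rotate(obj: ObjectRecord, yaw: float) -> ObjectRecord:
    """Figure 5(b): new 3D pose, same texture code."""
    return replace(obj, rotation_yaw=obj.rotation_yaw + yaw)


def repaint(obj: ObjectRecord, new_code: torch.Tensor) -> ObjectRecord:
    """Figure 5(c): swap the texture code; geometry and pose stay fixed."""
    return replace(obj, texture_code=new_code)


def remove_objects(scene: List[ObjectRecord], category: str) -> List[ObjectRecord]:
    """Figure 5(d): drop objects and let the decoder inpaint the background."""
    return [o for o in scene if o.category != category]


if __name__ == "__main__":
    car = ObjectRecord("car", torch.tensor([2.0, 0.0, 15.0]), 0.0, 1.0,
                       torch.randn(64))
    edited = repaint(rotate(move_closer(car, 5.0), 0.5), torch.randn(64))
    print(edited.translation, edited.rotation_yaw)
```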

Citations (1)


... DCGAN demonstrated the ability to generate high-quality images and contributed to the robustness of GAN training. To address the inherent training difficulties and instability of GANs, different objectives were proposed; Gulrajani et al. (2017); Mroueh et al. (2017a;b); Li et al. (2017), leading to more stable training and ultimately producing higher quality outputs. Self-Attention GANs (SAGAN) enhanced GANs' ability to capture global dependencies within images by integrating self-attention mechanisms Vaswani et al. (2023). ...

Reference: GANs Conditioning Methods: A Survey
3D-aware scene manipulation via inverse graphics