Single-view 3D Scene Reconstruction with High-fidelity Shape and Texture

3DV 2024


* indicates equal contribution    + Work done during an internship at BIGAI
1National Key Laboratory of General Artificial Intelligence, BIGAI    2Tsinghua University     3Peking University



Demo (coming soon)

Abstract

Reconstructing detailed 3D scenes from single-view images remains a challenging task due to limitations in existing approaches, which primarily focus on geometric shape recovery, overlooking object appearances and fine shape details. To address these challenges, we propose a novel framework for simultaneous high-fidelity recovery of object shapes and textures from single-view images. Our approach utilizes SSR, Single-view neural implicit Shape and Radiance field representations, leveraging explicit 3D shape supervision and volume rendering of color, depth, and surface normal images. To overcome shape-appearance ambiguity under partial observations, we introduce a two-stage learning curriculum that incorporates both 3D and 2D supervisions. A distinctive feature of our framework is its ability to generate fine-grained textured meshes while seamlessly integrating rendering capabilities into the single-view 3D reconstruction model. This integration enables not only improved textured 3D object reconstruction by 27.7% and 11.6% on the 3D-FRONT and Pix3D datasets, respectively, but also supports the rendering of images from novel viewpoints. Beyond individual objects, our approach facilitates composing object-level representations into flexible scene representations, thereby enabling applications such as holistic scene understanding and 3D scene editing.

Method

Our approach utilizes neural implicit shape and radiance field representations, leveraging explicit 3D shape supervision and volume rendering of color, depth, and surface normal images. To overcome shape-appearance ambiguity under partial observations, we introduce a two-stage learning curriculum that incorporates both 3D and 2D supervision. Please refer to our paper for more details or check out our code for the implementation.
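As a concrete illustration, the sketch below shows how color, depth, and normal maps can all fall out of a single volume-rendering pass over an SDF-based field. It is a minimal sketch, not our released implementation: `sdf_net` and `color_net` are hypothetical stand-ins for the shape and radiance networks, and the logistic SDF-to-density mapping is only one simple choice.

```python
# Minimal volume-rendering sketch producing color, depth, and normals per ray.
# `sdf_net` and `color_net` are illustrative stand-ins, not the SSR code's API.
import torch

def render_rays(sdf_net, color_net, rays_o, rays_d, near=0.1, far=6.0, n_samples=64):
    """rays_o, rays_d: (R, 3). Returns per-ray color (R, 3), depth (R,), normal (R, 3)."""
    t = torch.linspace(near, far, n_samples, device=rays_o.device)       # (S,)
    pts = (rays_o[:, None] + t[None, :, None] * rays_d[:, None]).reshape(-1, 3)
    pts.requires_grad_(True)

    sdf = sdf_net(pts)                                                   # (R*S, 1)
    # Analytic normals: gradient of the SDF with respect to the query points.
    normals = torch.autograd.grad(sdf.sum(), pts, create_graph=True)[0]
    normals = torch.nn.functional.normalize(normals, dim=-1)
    rgb = color_net(pts, normals)                                        # (R*S, 3)

    sdf, rgb, normals = (x.reshape(len(rays_o), n_samples, -1)
                         for x in (sdf, rgb, normals))

    # SDF -> opacity via a simple logistic density (the paper's exact mapping may differ).
    density = torch.sigmoid(-sdf.squeeze(-1) * 10.0) * 10.0              # (R, S)
    delta = (far - near) / n_samples
    alpha = 1.0 - torch.exp(-density * delta)
    trans = torch.cumprod(torch.cat([torch.ones_like(alpha[:, :1]),
                                     1.0 - alpha + 1e-10], dim=-1), dim=-1)[:, :-1]
    weights = alpha * trans                                              # (R, S)

    color = (weights[..., None] * rgb).sum(dim=1)                        # (R, 3)
    depth = (weights * t[None]).sum(dim=1)                               # (R,)
    normal = (weights[..., None] * normals).sum(dim=1)                   # (R, 3)
    return color, depth, normal
```

Because the normals come from the analytic SDF gradient, one rendering pass can be supervised with color, depth, and normal images simultaneously, which is what makes the joint 2D supervision in the curriculum possible.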

Single-view Object Reconstruction

We compare our reconstructions with state-of-the-art single-view 3D reconstruction methods. Our model produces textured 3D objects with smoother surfaces and finer details than previous approaches.

Comparison with Prior-guided Models

We compare our model with generative models that demonstrate potential zero-shot generalizability by leveraging 2D or 3D geometric priors learned from large-scale datasets. Please refer to our paper for a more in-depth discussion.



Generalizable Scene Reconstruction

Scene reconstruction results on SUN RGB-D. Note that our model is trained only on 3D-FRONT. The results show that our method reconstructs detailed object shapes and intricate textures from real images, demonstrating cross-domain generalization.



Novel-view Synthesis from a Single Image

Here we showcase our model's rendering capability: from a single-view input image, it renders color, depth, and normal images via volume rendering, even when the viewing angle changes significantly.
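For intuition, rendering a novel view amounts to casting rays from a new camera pose and running the same volume renderer. The helper below is a hedged sketch: `c2w`, `focal`, and the pinhole conventions are assumptions, not our code's actual interface.

```python
# Sketch: build rays for a novel camera pose (pinhole model, z forward, y down).
import torch

def novel_view_rays(c2w, H, W, focal):
    """c2w: (4, 4) camera-to-world pose. Returns ray origins/directions, each (H*W, 3)."""
    j, i = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                          torch.arange(W, dtype=torch.float32), indexing="ij")
    # Pixel coordinates -> camera-space viewing directions.
    dirs = torch.stack([(i - W * 0.5) / focal,
                        (j - H * 0.5) / focal,
                        torch.ones_like(i)], dim=-1)              # (H, W, 3)
    rays_d = (dirs[..., None, :] * c2w[:3, :3]).sum(-1)           # rotate into world frame
    rays_d = torch.nn.functional.normalize(rays_d, dim=-1).reshape(-1, 3)
    rays_o = c2w[:3, 3].expand_as(rays_d)                         # camera center, repeated
    return rays_o, rays_d
```

The resulting rays can then be fed to a renderer like `render_rays` above to obtain novel-view color, depth, and normal maps.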


Scene Editing

We demonstrate our model's potential in representing scenes and enabling 3D scene editing applications.

Object Translation and Rotation
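Conceptually, translating or rotating an object only requires mapping world-space query points back into the object's canonical frame before evaluating its implicit field. A minimal sketch follows; the `TransformedObject` wrapper and its interface are illustrative, not the paper's API.

```python
# Sketch: rigidly transforming an object by re-expressing query points in its
# canonical frame before evaluating the canonical SDF network.
import torch

class TransformedObject(torch.nn.Module):
    def __init__(self, sdf_net, R, t):
        super().__init__()
        self.sdf_net = sdf_net               # canonical-frame SDF network
        self.register_buffer("R", R)         # (3, 3) world-from-object rotation
        self.register_buffer("t", t)         # (3,)  world-space translation

    def forward(self, pts_world):
        # Inverse rigid transform: world -> object canonical frame.
        pts_obj = (pts_world - self.t) @ self.R   # (p - t) @ R == R^T (p - t)
        return self.sdf_net(pts_obj)
```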

Object Composition
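One simple way to compose per-object fields into a scene-level field is a pointwise SDF union, sketched below. This is an illustrative composition rule under our object-centric representation, not necessarily the exact operation used in the paper.

```python
# Sketch: compose per-object SDFs into one scene SDF via a pointwise union (min).
import torch

def scene_sdf(object_fields, pts):
    """object_fields: list of callables mapping (N, 3) points to (N, 1) SDF values."""
    sdfs = torch.stack([f(pts) for f in object_fields], dim=0)  # (K, N, 1)
    return sdfs.min(dim=0).values                               # nearest surface wins
```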



BibTeX

@inproceedings{chen2023ssr,
  title     = {Single-view 3D Scene Reconstruction with High-fidelity Shape and Texture},
  author    = {Chen, Yixin and Ni, Junfeng and Jiang, Nan and Zhang, Yaowei and Zhu, Yixin and Huang, Siyuan},
  booktitle = {International Conference on 3D Vision (3DV)},
  year      = {2024}
}