Reconstructing detailed 3D scenes from single-view images remains a challenging task due to limitations in existing approaches, which primarily focus on geometric shape recovery and overlook object appearances and fine shape details. To address these challenges, we propose a novel framework for the simultaneous high-fidelity recovery of object shapes and textures from single-view images. Our approach, SSR, uses Single-view neural implicit Shape and Radiance field representations, leveraging explicit 3D shape supervision and volume rendering of color, depth, and surface normal images. To overcome shape-appearance ambiguity under partial observations, we introduce a two-stage learning curriculum that incorporates both 3D and 2D supervision. A distinctive feature of our framework is its ability to generate fine-grained textured meshes while seamlessly integrating rendering capabilities into the single-view 3D reconstruction model. This integration not only improves textured 3D object reconstruction by 27.7% and 11.6% on the 3D-FRONT and Pix3D datasets, respectively, but also supports rendering images from novel viewpoints. Beyond individual objects, our approach facilitates composing object-level representations into flexible scene representations, enabling applications such as holistic scene understanding and 3D scene editing.
Our approach utilizes neural implicit shape and radiance field representations, leveraging explicit 3D shape supervision and volume rendering of color, depth, and surface normal images. To overcome shape-appearance ambiguity under partial observations, we introduce a two-stage learning curriculum that incorporates both 3D and 2D supervision. Please refer to our paper for more details or check out our code for the implementation.
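To make the two-stage curriculum concrete, here is a minimal PyTorch-style sketch, assuming a hypothetical conditional implicit field that predicts SDF and color from an image feature. The module and loss names are illustrative only and do not reflect the released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ImplicitField(nn.Module):
    """Hypothetical conditional field: maps a 3D point plus an image feature to SDF and RGB."""

    def __init__(self, feat_dim=256, hidden=256):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(3 + feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.sdf_head = nn.Linear(hidden, 1)
        self.rgb_head = nn.Sequential(nn.Linear(hidden, 3), nn.Sigmoid())

    def forward(self, points, feat):
        # points: (N, 3); feat: (feat_dim,) image feature shared by all query points
        h = self.backbone(torch.cat([points, feat.expand(points.shape[0], -1)], dim=-1))
        return self.sdf_head(h).squeeze(-1), self.rgb_head(h)


def stage1_loss(field, points, gt_sdf, feat):
    """Stage 1: explicit 3D shape supervision on points sampled around the ground-truth shape."""
    pred_sdf, _ = field(points, feat)
    return F.l1_loss(pred_sdf, gt_sdf)


def stage2_loss(rendered, gt):
    """Stage 2: 2D supervision on volume-rendered color, depth, and normal images."""
    return (F.l1_loss(rendered["rgb"], gt["rgb"])
            + F.l1_loss(rendered["depth"], gt["depth"])
            + F.l1_loss(rendered["normal"], gt["normal"]))
```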
We compare our reconstructions with state-of-the-art methods for single-view 3D reconstruction. Our model produces textured 3D objects with smoother surfaces and finer details than previous methods.
We compare our model with generative models that demonstrate potential zero-shot generalizability by leveraging 2D or 3D geometric priors learned from large-scale datasets. Please refer to our paper for a more in-depth discussion.
Scene reconstruction results on SUN RGB-D. Note that our model is trained only on 3D-FRONT. The results demonstrate that our method can reconstruct detailed object shapes and intricate textures from real images, showing cross-domain generalization ability.
Here we show our model's rendering capabilities: from the single-view input image, it renders color, depth, and normal images through volume rendering, even when the viewing angle changes significantly.
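To illustrate how the samples along one camera ray can be composited into color, depth, and normal outputs, here is a small sketch using standard NeRF-style alpha compositing; `composite_ray` is a hypothetical helper, not part of the released code.

```python
import torch
import torch.nn.functional as F


def composite_ray(densities, colors, normals, depths):
    """densities: (S,), colors: (S, 3), normals: (S, 3), depths: (S,) for one ray."""
    deltas = torch.diff(depths, append=depths[-1:] + 1e10)   # spacing between samples
    alpha = 1.0 - torch.exp(-densities * deltas)              # per-sample opacity
    trans = torch.cumprod(
        torch.cat([torch.ones(1), 1.0 - alpha + 1e-10])[:-1], dim=0
    )                                                         # accumulated transmittance
    weights = alpha * trans                                   # rendering weights
    rgb = (weights[:, None] * colors).sum(dim=0)              # pixel color
    depth = (weights * depths).sum(dim=0)                     # expected depth
    normal = F.normalize((weights[:, None] * normals).sum(dim=0), dim=0)  # surface normal
    return rgb, depth, normal
```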
We demonstrate our model's potential in representing scenes and enabling 3D scene editing applications.
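As a rough sketch of how object-level representations can be composed into a scene and edited, the snippet below queries each object's implicit field in its own canonical frame; moving or rotating an object then amounts to changing its world-to-object transform. The `query_scene` helper and its interface are assumptions for illustration, not the authors' API.

```python
import torch


def query_scene(objects, world_points):
    """objects: list of (field, world_to_object 4x4 pose); returns per-point scene SDF."""
    sdfs = []
    for field, pose in objects:
        homo = torch.cat([world_points, torch.ones(world_points.shape[0], 1)], dim=-1)
        local = (homo @ pose.T)[:, :3]   # transform query points into the object frame
        sdfs.append(field(local))        # object's SDF at these points (assumed interface)
    return torch.stack(sdfs, dim=0).min(dim=0).values   # union of all objects
```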
Holistic 3D Scene Parsing and Reconstruction from a Single RGB Image
Total3DUnderstanding: Joint Layout, Object Pose and Mesh Reconstruction for Indoor Scenes from a Single Image
Holistic 3D Scene Understanding from a Single Image with Implicit Representation
Towards High-Fidelity Single-view Holistic Reconstruction of Indoor Scenes
Shap-E: Generating Conditional 3D Implicit Functions
Zero-1-to-3: Zero-shot One Image to 3D Object
@inproceedings{chen2023ssr,
title={Single-view 3D Scene Reconstruction with High-fidelity Shape and Texture},
author={Chen, Yixin and Ni, Junfeng and Jiang, Nan and Zhang, Yaowei and Zhu, Yixin and Huang, Siyuan},
booktitle={International Conference on 3D Vision (3DV)},
year={2024}
}