G4Splat: Geometry-Guided Gaussian Splatting with Generative Prior

1Tsinghua University   2State Key Laboratory of General Artificial Intelligence, BIGAI   3Peking University
*Work done as an intern at BIGAI · Corresponding author
arXiv 2025

TL;DR: G4Splat integrates accurate geometry guidance with a generative prior to enhance 3D scene reconstruction, substantially improving both geometric fidelity and appearance quality in observed and unobserved regions.


Abstract

Despite recent advances in leveraging generative priors from pre-trained diffusion models for 3D scene reconstruction, existing methods still face two critical limitations. First, due to the lack of reliable geometric supervision, they struggle to produce high-quality reconstructions even in observed regions, let alone in unobserved areas. Second, they lack effective mechanisms to mitigate multi-view inconsistencies in the generated images, leading to severe shape-appearance ambiguities and degraded scene geometry. In this paper, we identify accurate geometry as the fundamental prerequisite for effectively exploiting generative models to enhance 3D scene reconstruction. We first propose to leverage the prevalence of planar structures to derive accurate metric-scale depth maps, providing reliable supervision in both observed and unobserved regions. Furthermore, we incorporate this geometry guidance throughout the generative pipeline to improve visibility mask estimation, guide novel view selection, and enhance multi-view consistency when inpainting with video diffusion models, resulting in accurate and consistent scene completion. Extensive experiments on Replica, ScanNet++, and DeepBlending show that our method consistently outperforms existing baselines in both geometry and appearance reconstruction, particularly in unobserved regions. Moreover, our method naturally supports single-view inputs and unposed videos, demonstrating strong generalizability across indoor and outdoor scenarios and practical real-world applicability.
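The key geometric ingredient above is plane-induced, metric-scale depth. The snippet below is a minimal sketch (not the released implementation) of how such depth can be obtained by intersecting pixel rays with a fitted 3D plane; the function and argument names (plane_aware_depth, K, plane_normal, plane_offset, plane_mask) are illustrative assumptions.

import numpy as np

def plane_aware_depth(K, plane_normal, plane_offset, plane_mask):
    """Fill pixels inside `plane_mask` with ray-plane intersection depth.

    K            : (3, 3) camera intrinsics.
    plane_normal : (3,) unit normal n of the plane n^T x = d (camera frame).
    plane_offset : scalar d of the plane equation (metric scale).
    plane_mask   : (H, W) boolean mask of pixels assigned to the plane.
    Returns an (H, W) depth map that is valid only where `plane_mask` is True.
    """
    H, W = plane_mask.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    # Back-project pixels to viewing rays r = K^{-1} [u, v, 1]^T (so r_z == 1).
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T
    rays = (np.linalg.inv(K) @ pix).T.reshape(H, W, 3)
    # Ray-plane intersection: n^T (z * r) = d  =>  z = d / (n^T r).
    denom = rays @ plane_normal
    depth = np.where(np.abs(denom) > 1e-6, plane_offset / denom, 0.0)
    return np.where(plane_mask, depth, 0.0)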


Method

In each training loop, we first extract global 3D planes from all training views and compute plane-aware depth maps. Subsequently, we construct a visibility grid from these depth maps, select plane-aware novel views, inpaint their invisible regions, and incorporate the completed views back into the training set.
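As a reading aid, here is a minimal structural sketch of that per-loop pipeline. Every stage function (extract_planes, compute_plane_depths, build_visibility_grid, select_novel_views, inpaint_views, optimize_gaussians) is a hypothetical placeholder passed in as a callable, not the interface of the released code.

def reconstruction_loop(train_views, gaussians, num_loops,
                        extract_planes, compute_plane_depths,
                        build_visibility_grid, select_novel_views,
                        inpaint_views, optimize_gaussians):
    """Structural sketch of one training loop; all callables are placeholders."""
    for _ in range(num_loops):
        # Global 3D planes fitted to all current training views, then
        # plane-aware (metric-scale) depth maps used as geometric supervision.
        planes = extract_planes(train_views)
        depths = compute_plane_depths(train_views, planes)
        # A visibility grid built from the depth maps guides plane-aware
        # novel-view selection and masks the regions that need inpainting.
        grid = build_visibility_grid(train_views, depths)
        novel_views = select_novel_views(grid, planes)
        completed = inpaint_views(novel_views, grid)
        # Completed views are folded back into the training set before the
        # Gaussian scene representation is optimized again.
        train_views = list(train_views) + list(completed)
        gaussians = optimize_gaussians(gaussians, train_views, depths)
    return gaussians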


Results

Comparison with Baselines (5-View Input)

[Interactive viewer: Scenes 1–7, RGB / Mesh modes]

Baseline (left) vs. Ours (right). Our method delivers more accurate geometry with fewer Gaussian floaters across both observed and unobserved regions. Try exploring different methods, scenes, and modes (RGB/Mesh) to see the difference!

Any-View Reconstruction

[Interactive viewer: Scenes 1–3 with 1 input view, Scenes 4–6 with 5 input views, Scenes 7–8 with 10 input views]

Our method shows strong generalization across diverse scenarios, including indoor and outdoor environments, unposed scenes, and even single-view inputs. Try exploring different scenes!

Dense-View Comparison with Baselines (383-View Input)



Our method significantly outperforms the baselines even with dense-view inputs, especially in regions with strong specularities and reflections.


Citation

@article{ni2025g4splat,
    title={G4Splat: Geometry-Guided Gaussian Splatting with Generative Prior},
    author={Ni, Junfeng and Chen, Yixin and Yang, Zhifei and Liu, Yu and Lu, Ruijie and Zhu, Song-Chun and Huang, Siyuan},
    journal={arXiv preprint arXiv:2510.12099},
    year={2025}
}

The website template was borrowed from Michaël Gharbi and Ref-NeRF.