2410.22128.md

PF3plat: Pose-Free Feed-Forward 3D Gaussian Splatting

We consider the problem of novel view synthesis from unposed images in a single feed-forward pass. Our framework capitalizes on the speed, scalability, and high-quality 3D reconstruction and view synthesis capabilities of 3DGS, and extends it to offer a practical solution that relaxes common assumptions such as dense image views, accurate camera poses, and substantial image overlaps. We achieve this by identifying and addressing a challenge specific to pixel-aligned 3DGS: misaligned 3D Gaussians across different views induce noisy or sparse gradients that destabilize training and hinder convergence, especially when the above assumptions are not met. To mitigate this, we employ pre-trained monocular depth estimation and visual correspondence models to achieve a coarse alignment of the 3D Gaussians. We then introduce lightweight, learnable modules that refine the depth and pose estimates obtained from the coarse alignment, improving the quality of 3D reconstruction and novel view synthesis. Furthermore, the refined estimates are used to compute geometry confidence scores, which assess the reliability of the 3D Gaussian centers and condition the prediction of the Gaussian parameters accordingly. Extensive evaluations on large-scale real-world datasets demonstrate that PF3plat sets a new state-of-the-art across all benchmarks, supported by comprehensive ablation studies validating our design choices.
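The misalignment issue described above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a pinhole camera, camera-to-world poses, and hand-picked toy values, and every name in it (`unproject`, `to_world`, `alignment_error`, `K`, `pose_a`, `pose_b`) is hypothetical. In a pixel-aligned setup, each pixel's Gaussian center is its depth back-projected into 3D, so matched pixels from two views should land on the same world point. A depth (or pose) error breaks that agreement.

```python
import math

def unproject(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) at the given depth into camera coordinates."""
    return ((u - cx) / fx * depth, (v - cy) / fy * depth, depth)

def to_world(p, R, t):
    """Apply a camera-to-world rigid transform (R: 3x3 nested lists, t: 3-vector)."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))

def alignment_error(matches, depth_a, depth_b, K, pose_a, pose_b):
    """Mean 3D distance between the Gaussian centers of matched pixels in two views."""
    fx, fy, cx, cy = K
    total = 0.0
    for (ua, va), (ub, vb) in matches:
        pa = to_world(unproject(ua, va, depth_a[(ua, va)], fx, fy, cx, cy), *pose_a)
        pb = to_world(unproject(ub, vb, depth_b[(ub, vb)], fx, fy, cx, cy), *pose_b)
        total += math.dist(pa, pb)
    return total / len(matches)

# Toy setup (assumed values): one world point at (0, 0, 2) seen by two cameras.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
K = (100.0, 100.0, 50.0, 50.0)          # fx, fy, cx, cy
pose_a = (identity, (0.0, 0.0, 0.0))    # camera A at the origin
pose_b = (identity, (1.0, 0.0, 0.0))    # camera B shifted 1 unit along x
matches = [((50, 50), (0, 50))]         # the same point as seen in A and in B

# With consistent depths the two centers coincide.
aligned = alignment_error(matches, {(50, 50): 2.0}, {(0, 50): 2.0}, K, pose_a, pose_b)

# A 10% depth error in view B pulls its center away from the one in view A.
misaligned = alignment_error(matches, {(50, 50): 2.0}, {(0, 50): 2.2}, K, pose_a, pose_b)

print(aligned, misaligned)
```

In this toy case the error is zero for consistent depths and becomes nonzero once view B's depth is perturbed. This kind of cross-view disagreement is what a coarse alignment from depth and correspondence estimates, followed by refinement, would aim to reduce.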
