This paper presents DENSER, an efficient and effective approach that leverages 3D Gaussian splatting (3DGS) to reconstruct dynamic urban environments. Several methods for photorealistic scene representation, both implicit ones based on neural radiance fields (NeRF) and explicit ones based on 3DGS, have shown promising results on relatively complex dynamic scenes. However, modeling the dynamic appearance of foreground objects remains challenging, which limits the ability of these methods to capture subtleties and details of the scene, especially for distant dynamic objects. To this end, we propose DENSER, a framework that significantly enhances the representation of dynamic objects and accurately models their appearance in driving scenes. Instead of using spherical harmonics (SH) directly to model the appearance of dynamic objects, we introduce a method that dynamically estimates the SH bases using wavelets, yielding a better representation of dynamic object appearance in both space and time. Beyond object appearance, DENSER improves object shape representation by densifying each object's point cloud across multiple scene frames, which leads to faster convergence of model training. Extensive evaluations on the KITTI dataset show that the proposed approach outperforms state-of-the-art methods by a wide margin.
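The idea of densifying an object's point cloud across frames can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the pose convention (`x_world = R @ x_obj + t`), and the axis-aligned box filter with `half_extents` are all assumptions made here for illustration. The sketch maps each frame's points into the object's local frame using that frame's pose, keeps the points that fall inside the object's box, and merges them into one denser cloud.

```python
import numpy as np


def to_object_frame(points_world, rotation, translation):
    """Map (N, 3) world points into the object's local frame.

    Assumed pose convention: x_world = rotation @ x_obj + translation,
    so x_obj = rotation.T @ (x_world - translation). With row vectors
    this is (points_world - translation) @ rotation.
    """
    return (points_world - translation) @ rotation


def densify_object(frames, half_extents):
    """Merge one object's points from several frames into a single cloud.

    frames: iterable of (points_world, rotation, translation) tuples,
            one per frame in which the object is observed (hypothetical layout).
    half_extents: (3,) half sizes of the object's box in its local frame.
    """
    merged = []
    for points_world, rotation, translation in frames:
        local = to_object_frame(points_world, rotation, translation)
        # Keep only points that lie inside the object's bounding box.
        inside = np.all(np.abs(local) <= half_extents, axis=1)
        merged.append(local[inside])
    if not merged:
        return np.empty((0, 3))
    return np.concatenate(merged, axis=0)
```

Because every frame's points are expressed in the same object-local frame, a moving vehicle observed sparsely in each frame accumulates a denser shape over time, which is one plausible way to obtain a better initialization for the object's Gaussians.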