Gaussian Splatting has enabled real-time 3D human avatars with unprecedented visual quality. While previous methods require a desktop GPU for real-time inference of a single avatar, we aim to squeeze multiple Gaussian avatars onto a portable virtual reality headset with real-time drivable inference. We begin by training a prior method, Animatable Gaussians, on a high-quality dataset captured with 512 cameras. The Gaussians are animated by driving a base set of Gaussians with linear blend skinning (LBS) and then adjusting them with a neural network decoder that corrects their appearance. When deploying the model on a Meta Quest 3 VR headset, we find two major computational bottlenecks: the decoder and the rendering. To accelerate the decoder, we train the Gaussians in UV space instead of pixel space, and we distill the decoder into a single neural network layer. Further, we discover that neighborhoods of Gaussians can share a single corrective from the decoder, which provides an additional speedup. To accelerate the rendering, we develop a custom Vulkan pipeline that runs on the mobile GPU. Putting it all together, we run three Gaussian avatars concurrently at 72 FPS on a VR headset.
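To make the animation pipeline concrete, the following is a minimal sketch, not the paper's implementation: a base set of Gaussian centers is posed with LBS, and a distilled single-layer decoder predicts one corrective shared by each neighborhood of Gaussians. All names and sizes here (`N_GAUSSIANS`, `N_JOINTS`, `GROUP_SIZE`, `POSE_DIM`, `CORR_DIM`) are illustrative assumptions, not values from the paper.

```python
import torch

N_GAUSSIANS = 8192        # base Gaussians stored in UV space (assumed size)
N_JOINTS = 24             # SMPL-style skeleton (assumption)
GROUP_SIZE = 16           # Gaussians per neighborhood sharing one corrective (assumption)
POSE_DIM = N_JOINTS * 12  # flattened per-joint 3x4 rigid transforms (assumption)
CORR_DIM = 3              # e.g. a positional offset per neighborhood (assumption)

# Learned parameters: rest-pose Gaussian centers and per-Gaussian skinning weights.
mu_rest = torch.randn(N_GAUSSIANS, 3)
skin_w = torch.softmax(torch.randn(N_GAUSSIANS, N_JOINTS), dim=-1)

# The distilled decoder: a single linear layer from pose features to
# one corrective per neighborhood of Gaussians.
decoder = torch.nn.Linear(POSE_DIM, (N_GAUSSIANS // GROUP_SIZE) * CORR_DIM)

def animate(joint_transforms: torch.Tensor) -> torch.Tensor:
    """joint_transforms: (N_JOINTS, 3, 4) rigid transforms for the current pose."""
    # 1) LBS: blend the joint transforms per Gaussian, then apply the
    #    blended transform to each rest-pose center.
    blended = torch.einsum('gj,jrc->grc', skin_w, joint_transforms)   # (G, 3, 4)
    mu_posed = (torch.einsum('grc,gc->gr', blended[:, :, :3], mu_rest)
                + blended[:, :, 3])                                   # (G, 3)

    # 2) Single-layer decoder: one corrective per neighborhood,
    #    broadcast to every Gaussian in that neighborhood.
    pose_feat = joint_transforms.reshape(-1)                          # (POSE_DIM,)
    corr = decoder(pose_feat).view(-1, CORR_DIM)                      # (G/GROUP, 3)
    corr_per_gaussian = corr.repeat_interleave(GROUP_SIZE, dim=0)     # (G, 3)

    return mu_posed + corr_per_gaussian  # corrected Gaussian centers

# Usage example: animate with an identity pose.
identity = torch.eye(3, 4).expand(N_JOINTS, 3, 4).contiguous()
centers = animate(identity)  # (N_GAUSSIANS, 3)
```

Sharing one corrective across a neighborhood shrinks the decoder's output layer by a factor of `GROUP_SIZE`, which is where the additional inference speedup in this sketch comes from.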