We present Turbo3D, an ultra-fast text-to-3D system that generates high-quality Gaussian splatting assets in under one second. Turbo3D combines a fast 4-step, 4-view diffusion generator with an efficient feed-forward Gaussian reconstructor, and both operate in latent space. The 4-step, 4-view generator is a student model distilled with a novel Dual-Teacher approach: the student learns view consistency from a multi-view teacher and photo-realism from a single-view teacher. Moving the Gaussian reconstructor's inputs from pixel space to latent space removes the extra image decoding time and halves the transformer sequence length, further improving efficiency. Our method produces better 3D generation results than previous baselines while running in a fraction of their runtime.
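The abstract does not spell out the Dual-Teacher objective, so the following is only a rough illustration of the idea of one student receiving two complementary training signals. It assumes a simple regression-style distillation in which the student's per-view outputs are pulled toward (a) a multi-view teacher that sees all views jointly and (b) a single-view teacher applied to each view independently. The function `dual_teacher_loss`, the weights `w_mv` and `w_sv`, and the toy teachers are all hypothetical stand-ins; the paper's actual loss (for example, a score-distillation formulation) may differ.

```python
import numpy as np


def dual_teacher_loss(student_out, mv_teacher, sv_teacher, noisy_views, w_mv=1.0, w_sv=1.0):
    """Hypothetical dual-teacher distillation loss (regression stand-in, not the paper's).

    student_out: (V, D) student predictions for V views.
    mv_teacher:  callable mapping all V noisy views (V, D) -> (V, D) jointly;
                 stands in for the multi-view teacher (view consistency).
    sv_teacher:  callable mapping ONE noisy view (D,) -> (D,);
                 stands in for the single-view teacher (photo-realism).
    """
    student_out = np.asarray(student_out, dtype=float)
    # The multi-view teacher sees every view at once, so its targets couple the views.
    mv_target = mv_teacher(noisy_views)
    # The single-view teacher is applied to each view independently.
    sv_target = np.stack([sv_teacher(v) for v in noisy_views])
    # Mean squared error against each teacher's (fixed) targets.
    l_mv = np.mean((student_out - mv_target) ** 2)
    l_sv = np.mean((student_out - sv_target) ** 2)
    return w_mv * l_mv + w_sv * l_sv, l_mv, l_sv


def toy_mv_teacher(views):
    # Toy "consistent" teacher: every view is mapped to the mean over all views.
    return np.repeat(views.mean(axis=0, keepdims=True), len(views), axis=0)


def toy_sv_teacher(view):
    # Toy per-view teacher: a fixed transform of a single view.
    return 2.0 * view


noisy = np.array([[1.0, 2.0], [3.0, 4.0]])
student = np.array([[2.0, 3.0], [2.0, 3.0]])
total, l_mv, l_sv = dual_teacher_loss(student, toy_mv_teacher, toy_sv_teacher, noisy)
print(total, l_mv, l_sv)  # 10.5 0.0 10.5
```

In this toy case the student already matches the consistent multi-view target exactly, so only the single-view term contributes. The two weights control how strongly each teacher's signal shapes the student.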