(* denotes equal contribution)
This repository contains the implementation for the paper VeGaS: Video Gaussian Splatting.
Abstract: Implicit Neural Representations (INRs) employ neural networks to approximate discrete data as continuous functions. In the context of video data, such models can be utilized to transform the coordinates of pixel locations along with frame occurrence times (or indices) into RGB color values. Although INRs facilitate effective compression, they are unsuitable for editing purposes. One potential solution is to use a 3D Gaussian Splatting (3DGS) based model, such as the Video Gaussian Representation (VGR), which is capable of encoding video as a multitude of 3D Gaussians and is applicable for numerous video processing operations, including editing. Nevertheless, in this case, the capacity for modification is constrained to a limited set of basic transformations. To address this issue, we introduce the Video Gaussian Splatting (VeGaS) model, which enables realistic modifications of video data. To construct VeGaS, we propose a novel family of Folded-Gaussian distributions designed to capture nonlinear dynamics in a video stream and model consecutive frames by 2D Gaussians obtained as respective conditional distributions. Our experiments demonstrate that VeGaS outperforms state-of-the-art solutions in frame reconstruction tasks and allows realistic modifications of video data.
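The abstract describes each frame as a 2D Gaussian obtained by conditioning a higher-dimensional distribution on time. For reference, the sketch below conditions an ordinary 3D Gaussian over (x, y, t) on a fixed time t using the standard Gaussian conditioning formulas. It is not the paper's Folded-Gaussian construction, and the function name `condition_on_time` is hypothetical.

```python
import numpy as np


def condition_on_time(mean: np.ndarray, cov: np.ndarray, t: float):
    """Condition a 3D Gaussian over (x, y, t) on a fixed time t.

    mean: shape (3,), ordered as (x, y, t)
    cov:  shape (3, 3), symmetric positive definite
    Returns the mean (2,) and covariance (2, 2) of the 2D slice at time t.
    """
    mu_s, mu_t = mean[:2], mean[2]
    cov_ss = cov[:2, :2]  # spatial block
    cov_st = cov[:2, 2]   # spatial-temporal cross-covariance, shape (2,)
    var_t = cov[2, 2]     # temporal variance (scalar)

    # Standard Gaussian conditioning: the conditional mean moves linearly in t,
    # while the conditional covariance does not depend on t at all.
    cond_mean = mu_s + cov_st * (t - mu_t) / var_t
    cond_cov = cov_ss - np.outer(cov_st, cov_st) / var_t
    return cond_mean, cond_cov
```

With a plain Gaussian the slice can only drift along a straight line at constant shape, which presumably is the limitation that a construction designed for nonlinear dynamics is meant to lift.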
Follow the steps below to set up the project environment:
- CUDA-ready GPU with Compute Capability 7.0+
- CUDA toolkit 12 for PyTorch extensions (we used 12.4)
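Once PyTorch is installed, a quick way to confirm the GPU requirement above is to ask PyTorch for the device's compute capability. This is a minimal sketch; the helper names `capability_ok` and `check_local_gpu` are hypothetical and not part of the repository.

```python
MIN_CAPABILITY = (7, 0)  # from the requirement "Compute Capability 7.0+"


def capability_ok(capability: tuple, minimum: tuple = MIN_CAPABILITY) -> bool:
    """Return True if a (major, minor) compute capability meets the minimum."""
    return tuple(capability) >= tuple(minimum)


def check_local_gpu() -> bool:
    """Query the first visible CUDA device through PyTorch (requires torch)."""
    import torch  # imported lazily so capability_ok works without torch

    if not torch.cuda.is_available():
        print("No CUDA device visible to PyTorch.")
        return False
    major, minor = torch.cuda.get_device_capability(0)
    print(f"{torch.cuda.get_device_name(0)}: compute capability {major}.{minor}, "
          f"CUDA runtime {torch.version.cuda}")
    return capability_ok((major, minor))
```

Call `check_local_gpu()` from the activated environment; it returns False if no CUDA device is visible or the capability is below 7.0.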
Clone the repository with its submodules to ensure all dependencies are included.
git clone https://github.com/gmum/VeGaS.git --recursive
cd VeGaS
Create and activate a Python virtual environment using Python 3.8.
python3.8 -m venv env
source env/bin/activate
Install the PyTorch framework and torchvision for deep learning tasks.
pip3 install torch torchvision
Install the necessary submodules for Gaussian rasterization and k-nearest neighbors.
pip3 install submodules/diff-gaussian-rasterization
pip3 install submodules/simple-knn
Install all other dependencies listed in the requirements.txt file.
pip3 install -r requirements.txt
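After installation, you can check that the packages are visible to the interpreter. The module names below are assumptions based on the upstream gaussian-splatting submodules (`diff_gaussian_rasterization`, `simple_knn`); adjust them if the VeGaS versions expose different names. `find_spec` only locates a module, so a broken CUDA build may still fail on actual import.

```python
import importlib.util

# Assumed import names for the installed packages and submodules.
EXPECTED_MODULES = ["torch", "torchvision", "diff_gaussian_rasterization", "simple_knn"]


def check_modules(names):
    """Map each module name to True if it can be located, without importing it."""
    status = {}
    for name in names:
        try:
            status[name] = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec raises for dotted names whose parent package is missing
            status[name] = False
    return status


if __name__ == "__main__":
    for name, found in check_modules(EXPECTED_MODULES).items():
        print(f"{name:30s} {'found' if found else 'MISSING'}")
```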
python3 train.py -s <dataset_dir> -m <output_dir>
Before training, your video needs to be converted to individual frames (0000.png, 0001.png, ...). The data directory needs to have a structure like this:
<data>
|---<original>
| |---0000.png
| |---0001.png
| |---...
|---<mirror>
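The README does not say how to produce the frames. One common option is ffmpeg; the sketch below (assuming ffmpeg is installed and on PATH) builds an ffmpeg command that writes zero-padded PNGs starting at 0000.png, and checks that a frame folder has no gaps. The helpers `ffmpeg_extract_cmd`, `extract_frames`, and `check_frames` are hypothetical, and the purpose of the `<mirror>` folder is not described here, so it is not handled.

```python
import re
import subprocess
from pathlib import Path
from typing import List

FRAME_PATTERN = re.compile(r"^(\d{4})\.png$")  # 0000.png, 0001.png, ...


def ffmpeg_extract_cmd(video: str, out_dir: str) -> List[str]:
    """Build an ffmpeg command that writes frames as 0000.png, 0001.png, ..."""
    return ["ffmpeg", "-i", video, "-start_number", "0",
            str(Path(out_dir) / "%04d.png")]


def extract_frames(video: str, out_dir: str) -> None:
    """Run ffmpeg to dump all frames of `video` into `out_dir`."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(ffmpeg_extract_cmd(video, out_dir), check=True)


def check_frames(frame_dir: str) -> int:
    """Verify frames are numbered 0000.png, 0001.png, ... with no gaps.

    Returns the number of frames; raises ValueError if none are found
    or the numbering is not contiguous from 0.
    """
    indices = []
    for path in Path(frame_dir).iterdir():
        match = FRAME_PATTERN.match(path.name)
        if match:
            indices.append(int(match.group(1)))
    indices.sort()
    if not indices:
        raise ValueError(f"no NNNN.png frames found in {frame_dir}")
    if indices != list(range(len(indices))):
        raise ValueError(f"frames in {frame_dir} are not numbered 0..{len(indices) - 1}")
    return len(indices)
```

For example, `extract_frames("input.mp4", "data/original")` followed by `check_frames("data/original")` (paths are placeholders) should report the number of frames ready for training.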
Optional arguments:
--random_background
Randomizes the background during training. Use it if you want to train VeGaS on a video with a transparent background.
--poly_degree <int>
Changes the polynomial degree of the folded Gaussians.
--batch_size <int>
Batch size.
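When comparing several settings it can help to assemble the training command programmatically. The sketch below uses only the flags documented above; the helper name `train_command` is hypothetical, and any flag left as None is omitted so that train.py falls back to its own defaults.

```python
from typing import List, Optional


def train_command(dataset_dir: str, output_dir: str,
                  random_background: bool = False,
                  poly_degree: Optional[int] = None,
                  batch_size: Optional[int] = None) -> List[str]:
    """Assemble a train.py invocation from the documented flags."""
    cmd = ["python3", "train.py", "-s", dataset_dir, "-m", output_dir]
    if random_background:
        cmd.append("--random_background")
    if poly_degree is not None:
        cmd += ["--poly_degree", str(poly_degree)]
    if batch_size is not None:
        cmd += ["--batch_size", str(batch_size)]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root.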
python3 render.py --model_path <model_dir> --interp <interp>
--model_path
Path to the model directory.
--interp
Multiplier for the frame rate during interpolation. Use 1 for the original frame rate (default), 2 to double the frame rate, etc.
The rendered video is saved to the <model_dir>/render directory.
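The Gaussian editing hooks receive a frame timestamp between 0 and 1. As an illustration of how an interpolation multiplier could map to such timestamps, the sketch below assumes the original frames sit at evenly spaced times from 0 to 1 and that `interp - 1` extra frames are placed between each consecutive pair. The exact spacing and frame count used by render.py may differ, and `interp_timestamps` is a hypothetical name.

```python
from typing import List


def interp_timestamps(num_frames: int, interp: int = 1) -> List[float]:
    """Evenly spaced normalized timestamps for rendering.

    Assumes frame i of the original video sits at time i / (num_frames - 1)
    and that interp - 1 extra frames are inserted between consecutive frames.
    """
    if num_frames < 2 or interp < 1:
        raise ValueError("need at least 2 frames and interp >= 1")
    steps = (num_frames - 1) * interp
    return [k / steps for k in range(steps + 1)]


print(interp_timestamps(3, 2))  # [0.0, 0.25, 0.5, 0.75, 1.0]
```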
You can modify your render by manipulating the Gaussians. To do so, update the modify_func function in render.py:
import torch

def modify_func(means3D: torch.Tensor,    # num_gauss x 3, means3D[:, 1] = 0
                scales: torch.Tensor,     # num_gauss x 3, scales[:, 1] = eps
                rotations: torch.Tensor,  # num_gauss x 4, 3D quaternions of 2D rotations
                time: float):
    return means3D, scales, rotations
where
means3D - positions of the Gaussians
scales - scales of the Gaussians
rotations - rotation quaternions of the Gaussians
time - timestamp of the frame (between 0 and 1)
The shapes of means3D and scales must stay the same.
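As an example of a body for modify_func, the sketch below sways all Gaussians left and right over time and slowly enlarges them, keeping both shapes unchanged. The effect and its constants are hypothetical. Because means3D[:, 1] is always 0, the sketch assumes columns 0 and 2 hold the image-plane coordinates. The torch annotations are written as strings and the body uses only arithmetic and slicing, so the same code runs on torch tensors or NumPy arrays.

```python
import math

# Hypothetical effect parameters, not taken from the repository.
SWAY_AMPLITUDE = 0.1  # maximum horizontal shift, in scene units
GROWTH = 0.5          # relative scale increase reached at time = 1


def modify_func(means3D: "torch.Tensor",    # num_gauss x 3, means3D[:, 1] = 0
                scales: "torch.Tensor",     # num_gauss x 3, scales[:, 1] = eps
                rotations: "torch.Tensor",  # num_gauss x 4 quaternions
                time: float):
    """Sway all Gaussians horizontally and enlarge them as time goes 0 -> 1."""
    shift = SWAY_AMPLITUDE * math.sin(2.0 * math.pi * time)
    new_means = means3D * 1.0                    # out-of-place copy of the positions
    new_means[:, 0] = new_means[:, 0] + shift    # move along x only; column 1 stays 0
    new_scales = scales * (1.0 + GROWTH * time)  # still num_gauss x 3
    return new_means, new_scales, rotations      # rotations unchanged
```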
python3 save_psedomesh.py --model_path <model_dir>
--model_path
Path to the model directory.
This script saves a GaMeS mesh file (*.obj) and a point cloud file (*.ply) for each frame of the video in the <model_dir>/pseudomesh directory. These files can be edited in Blender or modified directly during rendering.
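To inspect or post-process the exported meshes in Python without Blender, a minimal reader for the vertex (`v`) and face (`f`) records of the Wavefront OBJ format is enough. This is a sketch that ignores all other record types; `parse_obj` is a hypothetical helper, and since the per-frame file names are not documented, you will need to pick the files from <model_dir>/pseudomesh yourself.

```python
from typing import List, Tuple

Vertex = Tuple[float, float, float]
Face = Tuple[int, ...]


def parse_obj(text: str) -> Tuple[List[Vertex], List[Face]]:
    """Parse 'v' and 'f' records from Wavefront OBJ text.

    Face indices are converted from OBJ's 1-based numbering to 0-based;
    references such as '3/1/2' keep only the vertex index, and negative
    (relative) indices are resolved against the vertices read so far.
    """
    vertices: List[Vertex] = []
    faces: List[Face] = []
    for line in text.splitlines():
        parts = line.split()
        if not parts or parts[0].startswith("#"):
            continue
        if parts[0] == "v":
            x, y, z = (float(p) for p in parts[1:4])
            vertices.append((x, y, z))
        elif parts[0] == "f":
            idx = []
            for ref in parts[1:]:
                i = int(ref.split("/")[0])
                idx.append(i - 1 if i > 0 else len(vertices) + i)
            faces.append(tuple(idx))
    return vertices, faces
```

Read a file with `parse_obj(open(path).read())`; for a pseudomesh you would expect one triangle (three vertex indices) per Gaussian.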
python render_video.py --model_path <model_dir>
--model_path
Path to the model directory.
The rendered video is saved to the <model_dir>/render directory.
--bg_model <str>
Renders another (background) model "behind" your main model. Useful if your main (foreground) model is transparent.
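How render_video.py actually combines the two models is not described here. For intuition about why a background model helps with a transparent foreground, the sketch below shows the standard alpha "over" operator, which places a foreground with per-pixel opacity on top of an opaque background. `composite_over` is a hypothetical name.

```python
import numpy as np


def composite_over(fg_rgb: np.ndarray, fg_alpha: np.ndarray, bg_rgb: np.ndarray) -> np.ndarray:
    """Standard 'over' compositing of a foreground onto an opaque background.

    fg_rgb, bg_rgb: H x W x 3 arrays with values in [0, 1]
    fg_alpha:       H x W array in [0, 1] (foreground opacity)
    """
    a = fg_alpha[..., None]  # broadcast alpha over the colour channels
    return a * fg_rgb + (1.0 - a) * bg_rgb
```

Where the foreground is fully transparent (alpha 0) the background shows through unchanged; where it is opaque (alpha 1) the background is hidden.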
You can also edit your render by manipulating a GaMeS mesh. To do so, edit the modify_mesh function in render_video.py:
import torch

def modify_mesh(triangles: torch.Tensor,  # num_gaussians x 3 x 3, triangles[:, :, 1] = 0
                time: float):
    return triangles
where
triangles - the pseudomesh; for each Gaussian, a triangle is defined by 3 points (each with 3 coordinates)
time - timestamp of the frame (between 0 and 1)
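As an example of a body for modify_mesh, the sketch below shrinks every triangle toward its own centroid as time advances, which keeps the output shape and leaves the zero y coordinate at zero. The effect and the 0.5 factor are hypothetical. The annotation is a string and the body uses only slicing and arithmetic, so it runs on torch tensors or NumPy arrays alike.

```python
def modify_mesh(triangles: "torch.Tensor",  # num_gaussians x 3 x 3, triangles[:, :, 1] = 0
                time: float):
    """Shrink each triangle toward its centroid: full size at time 0, half size at time 1."""
    factor = 1.0 - 0.5 * time
    # Centroid of each triangle, shape num_gaussians x 1 x 3; slicing with 0:1
    # keeps the vertex axis so the subtraction below broadcasts per vertex.
    centroid = (triangles[:, 0:1] + triangles[:, 1:2] + triangles[:, 2:3]) / 3.0
    return centroid + factor * (triangles - centroid)
```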
Our code was developed based on MiraGe and gaussian-splatting.
@Article{2024vegas,
author={Weronika Smolak-Dyżewska and Dawid Malarz and Kornel Howil and Jan Kaczmarczyk and Marcin Mazur and Przemysław Spurek},
title={VeGaS: Video Gaussian Splatting},
year={2024},
eprint={2411.11024},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2411.11024},
}