If you find our code or paper helpful, please consider starring our repository and citing:
@misc{bringer2024mdmp,
title={MDMP: Multi-modal Diffusion for supervised Motion Predictions with uncertainty},
author={Leo Bringer and Joey Wilson and Kira Barton and Maani Ghaffari},
year={2024},
eprint={2410.03860},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2410.03860},
}
sudo apt update
sudo apt install ffmpeg
conda env create -f environment.yml
conda activate mdmp
python -m spacy download en_core_web_sm
pip install git+https://github.com/openai/CLIP.git
We test our code on Python 3.7.13 and PyTorch 1.7.1
pip install -r requirements.txt
This pip-based installation was tested on Python 3.10.
pip install --upgrade --no-cache-dir gdown
bash prepare/download_smpl_files.sh
bash prepare/download_glove.sh
bash prepare/download_t2m_evaluators.sh
bash prepare/download_recognition_models.sh
HumanML3D - Follow the instructions in HumanML3D, then copy the resulting dataset to our repository:
cp -r ../HumanML3D/HumanML3D ./dataset/HumanML3D
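Before training or evaluating, it can help to confirm that the copy landed where the scripts expect it. The sketch below assumes the standard layout of the HumanML3D release (folders such as new_joint_vecs and texts, plus Mean.npy, Std.npy and the split files). The function name missing_humanml3d_entries is hypothetical and is not part of this repository.

```python
from pathlib import Path

# Entries assumed to exist in a standard HumanML3D release (based on the
# HumanML3D repository's layout; adjust if your copy differs).
EXPECTED_ENTRIES = [
    "new_joint_vecs",
    "new_joints",
    "texts",
    "Mean.npy",
    "Std.npy",
    "train.txt",
    "val.txt",
    "test.txt",
]


def missing_humanml3d_entries(root="./dataset/HumanML3D"):
    """Return the expected entries that do not exist under `root`."""
    root_path = Path(root)
    return [name for name in EXPECTED_ENTRIES if not (root_path / name).exists()]
```

Calling missing_humanml3d_entries() from the repository root should return an empty list once the dataset is in place.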
bash prepare/download_models.sh
mdmp_pretrained - a zip file; unzip it and place its contents in ./save/
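If you prefer to unpack the checkpoint from Python rather than by hand, here is a minimal sketch using the standard-library zipfile module. The archive path is whatever the download step produced, and extract_checkpoint is a hypothetical helper, not part of this repository.

```python
import zipfile
from pathlib import Path


def extract_checkpoint(zip_path, save_dir="./save"):
    """Extract a checkpoint archive into `save_dir` and list the .pt files found."""
    save_path = Path(save_dir)
    save_path.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(save_path)
    # Paths relative to save_dir, e.g. "mdmp_pretrained/model000500000.pt".
    return sorted(p.relative_to(save_path).as_posix() for p in save_path.rglob("*.pt"))
```

For example, extract_checkpoint("mdmp_pretrained.zip") (the archive name here is a guess) should list mdmp_pretrained/model000500000.pt if the archive matches the checkpoint path used in the commands below.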
Demo with Skeletons
python -m sample.generate_w_gt --model_path ./save/mdmp_pretrained/model000500000.pt --num_samples 3 --num_repetitions 3
You may modify the arguments based on your preferences:
--device: the GPU id.
--num_samples: to generate more samples conditioned on different inputs.
--num_repetitions: to generate more samples conditioned on the same inputs.
--model_path: to change the path if you have trained your own model and want to test it.
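Each generated motion gets its own stick-figure video, so --num_samples 3 --num_repetitions 3 should yield 9 videos. The hypothetical helper below lists the expected file names, assuming the zero-based, two-digit sample{:02d}_rep{:02d}.mp4 pattern used by MDM, from which this code partially borrows; check it against your actual output.

```python
def expected_videos(num_samples, num_repetitions):
    """List the sample##_rep##.mp4 names one run is expected to produce.

    Assumes zero-based, two-digit indices (MDM-style naming); hypothetical helper.
    """
    return [
        f"sample{s:02d}_rep{r:02d}.mp4"
        for s in range(num_samples)
        for r in range(num_repetitions)
    ]
```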
Running this command produces:
results.npy - a file with the text prompts and xyz positions of the generated animations.
sample##_rep##.mp4 - a stick figure animation for each generated motion, including the ground-truth motion for comparison.
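To inspect results.npy outside the provided scripts, the sketch below assumes the file holds a single pickled Python dict, as in MDM (where keys include 'motion' and 'text'). The exact keys and array shapes written by MDMP are not documented here, so describe() simply reports whatever it finds. Both function names are hypothetical.

```python
import numpy as np


def load_results(path="results.npy"):
    """Load results.npy, assuming it stores one pickled dict (MDM-style)."""
    data = np.load(path, allow_pickle=True)
    # np.save on a dict produces a 0-d object array; .item() unwraps the dict.
    return data.item() if data.ndim == 0 else data


def describe(results):
    """Map each key to an array's shape, a list's length, or the raw value."""
    summary = {}
    for key, value in results.items():
        if isinstance(value, np.ndarray):
            summary[key] = value.shape
        elif isinstance(value, (list, tuple)):
            summary[key] = len(value)
        else:
            summary[key] = value
    return summary
```

For example, describe(load_results("path/to/results.npy")) prints the shape of the stored motions next to the number of text prompts.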
It should look something like this:
Demo with Skeletons & Presence zones
python -m sample.generate_w_zones --model_path ./save/mdmp_pretrained/model000500000.pt --num_samples 3 --num_repetitions 3
You may modify the arguments based on your preferences:
--device: the GPU id.
--num_samples: to generate more samples conditioned on different inputs.
--num_repetitions: to generate more samples conditioned on the same inputs.
--model_path: to change the path if you have trained your own model and want to test it.
Running this command produces:
results.npy - a file with the text prompts and xyz positions of the generated animations.
sample##_rep##.mp4 - a stick figure animation for each generated motion, including zones of presence around 'end-effector' joints to assess uncertainty.
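The zones of presence express how widely the predicted end-effector positions spread across repetitions of the same input. As an illustration only, the sketch below computes one simple proxy: for each selected joint and frame, a sphere centred on the mean position across repetitions, with the root-mean-square distance from that centre as radius. This is not necessarily how sample.generate_w_zones builds its zones, and the joint indices assume the 22-joint SMPL ordering used by HumanML3D.

```python
import numpy as np

# Hypothetical end-effector indices in the 22-joint SMPL skeleton used by
# HumanML3D (head, left/right foot, left/right wrist); verify against your data.
END_EFFECTORS = {"head": 15, "l_foot": 10, "r_foot": 11, "l_wrist": 20, "r_wrist": 21}


def presence_zones(samples, joints=END_EFFECTORS):
    """Per-frame (center, radius) spheres for selected joints across repetitions.

    samples: array of shape (R, T, J, 3), i.e. R repetitions of a T-frame motion
    with J joints in xyz. Returns {name: (center (T, 3), radius (T,))}.
    """
    samples = np.asarray(samples, dtype=float)
    zones = {}
    for name, j in joints.items():
        pts = samples[:, :, j, :]                     # (R, T, 3)
        center = pts.mean(axis=0)                     # mean over repetitions
        sq_dist = ((pts - center) ** 2).sum(axis=-1)  # (R, T) squared distances
        radius = np.sqrt(sq_dist.mean(axis=0))        # RMS spread per frame
        zones[name] = (center, radius)
    return zones
```

Identical repetitions give zero-radius zones; larger radii flag frames where the repetitions disagree.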
It should look something like this:
Demo with SMPL Meshes (with Blender)
python -m sample.generate_for_meshes --model_path ./save/mdmp_pretrained/model000500000.pt --num_samples 3 --num_repetitions 3
You may also define:
--device: the GPU id.
--num_samples: to generate more samples conditioned on different inputs.
--num_repetitions: to generate more samples conditioned on the same inputs.
--model_path: to change the path if you have trained your own model and want to test it.
Running this command produces:
results.npy - a file with the text prompts and xyz positions of the generated animations.
sample##_rep##.mp4 - a stick figure animation for each generated motion, without ground truth, floor, or zones of presence.
It should look something like this:
From now on, if you want to render the SMPL mesh, choose the .mp4 file you would like to render, copy its relative path, and use the following script to create the SMPL parameters for that file:
python -m visualize.render_mesh --input_path /path/to/mp4/stick/figure/file
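To convert every stick-figure video in an output folder rather than one at a time, you can build one render_mesh command per file. This sketch assumes the sample##_rep##.mp4 naming described above, skips files such as *_smpl_params.mp4, and uses a hypothetical helper name.

```python
import re
import sys
from pathlib import Path

# Matches stick-figure videos like sample00_rep01.mp4, but not *_smpl_params.mp4.
STICK_FIGURE = re.compile(r"sample\d+_rep\d+\.mp4")


def render_mesh_commands(sample_dir):
    """Build one visualize.render_mesh command per stick-figure .mp4 in sample_dir."""
    videos = sorted(p for p in Path(sample_dir).iterdir() if STICK_FIGURE.fullmatch(p.name))
    return [
        [sys.executable, "-m", "visualize.render_mesh", "--input_path", str(video)]
        for video in videos
    ]
```

Each command can then be run from the repository root, for example with subprocess.run(cmd, check=True).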
This script outputs:
sample##_rep##_smpl_params.npy - SMPL parameters (thetas, root translations, vertices and faces).
sample##_rep##_obj - one mesh per frame in .obj format.
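A quick sanity check on the per-frame meshes is to count the frames and read the vertex and face records of one .obj file. The sketch assumes the sample##_rep##_obj folder contains only the per-frame .obj files (their exact names are not specified here). It relies only on the Wavefront OBJ convention that lines starting with "v " are vertices and lines starting with "f " are faces. A full SMPL mesh has 6890 vertices and 13776 triangular faces, so those are the counts you would expect per frame. Both function names are hypothetical.

```python
from pathlib import Path


def obj_vertex_face_counts(obj_path):
    """Count 'v' (vertex) and 'f' (face) records in a Wavefront .obj file."""
    n_vertices = n_faces = 0
    with open(obj_path) as fh:
        for line in fh:
            if line.startswith("v "):
                n_vertices += 1
            elif line.startswith("f "):
                n_faces += 1
    return n_vertices, n_faces


def summarize_obj_folder(folder):
    """Return (number of .obj frames, (vertices, faces) of the first frame)."""
    frames = sorted(Path(folder).glob("*.obj"))
    if not frames:
        return 0, (0, 0)
    return len(frames), obj_vertex_face_counts(frames[0])
```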
Refer to TEMOS-Rendering motions for the Blender setup, then install the following dependencies:
YOUR_BLENDER_PYTHON_PATH/python -m pip install -r prepare/blender_requirements.txt
Run the following command to render the SMPL meshes using Blender:
YOUR_BLENDER_PATH/blender --background --python render.py -- --cfg=./configs/render.yaml --npy=YOUR_NPY_FOLDER --mode=video
You may also define:
--mode=video: render an mp4 video.
--mode=sequence: render the whole motion in a png image.
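If you script the rendering step, the Blender command above can be assembled per mode. The sketch only reproduces the flags shown in that command (--cfg, --npy, --mode); blender_render_command is a hypothetical helper, and YOUR_BLENDER_PATH is passed in as blender_path.

```python
from pathlib import Path


def blender_render_command(blender_path, npy_folder, mode="video",
                           cfg="./configs/render.yaml"):
    """Assemble the Blender render command for 'video' or 'sequence' mode."""
    if mode not in ("video", "sequence"):
        raise ValueError("mode must be 'video' or 'sequence'")
    return [
        str(Path(blender_path) / "blender"),
        "--background",
        "--python", "render.py",
        "--",  # arguments after this separator are passed to render.py
        f"--cfg={cfg}",
        f"--npy={npy_folder}",
        f"--mode={mode}",
    ]
```

The returned list can be passed to subprocess.run(cmd, check=True) from the repository root.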
Depending on the mode you chose, this script outputs:
sample##_rep##_smpl_params.mp4 or sample##_rep##_smpl_params.png - a video of the SMPL meshes rendered with Blender (along with a folder containing the .obj files for each frame of the video) or an image summary of the sequence, which should look like this:
python -m train.train_mdmp --save_dir save/my_own_mdmp --dataset humanml
- Use --diffusion_steps 50 to train a faster model with fewer diffusion steps.
- Use --device to define the GPU id.
- Add --train_platform_type {ClearmlPlatform, TensorboardPlatform} to track results with either ClearML or TensorBoard.
- Add --use_gcn true to try the GCN version.
- Change --emb_motion_len to a value lower than 50 if you want your model to be conditioned on shorter motion sequences.
- Add --num_steps to specify the number of training steps and train for more or less time.
- Use --batch_size to set a smaller batch size if you run out of GPU memory.
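When trying several of these options, a small command builder can keep runs reproducible. The sketch below passes keyword arguments through as --flag value pairs and lowercases booleans to match the --use_gcn true style above. train_command is hypothetical, and the training script's own argument parser remains the reference for which flags exist.

```python
import sys


def train_command(save_dir="save/my_own_mdmp", dataset="humanml", **options):
    """Build a train.train_mdmp command; keyword options become --flag value pairs."""
    cmd = [sys.executable, "-m", "train.train_mdmp",
           "--save_dir", save_dir, "--dataset", dataset]
    for flag, value in options.items():
        # Python booleans become lowercase strings, e.g. --use_gcn true.
        if isinstance(value, bool):
            value = str(value).lower()
        cmd += [f"--{flag}", str(value)]
    return cmd
```

For example, train_command(diffusion_steps=50, batch_size=32) reproduces the base command with those two options appended.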
Accuracy Study (MPJPE)
- Takes about 30 minutes on a single GPU, with 3 repetitions per input, to go over the entire test set (excluding motion sequences shorter than 3 s).
- The output of this script is printed in the terminal and corresponds to the MPJPE at various time steps of the predicted motion (from 0.5 to 5.5 s).
- The pre-trained model results should match the ones reported in the temporal chart of the paper (or sometimes be lower).
python -m eval.eval_mpjpe --model_path ./save/mdmp_pretrained/model000500000.pt --num_repetitions 3
You may also define:
--device: the GPU id.
--num_samples: to generate more samples conditioned on different inputs.
--num_repetitions: to generate more samples conditioned on the same inputs.
--model_path: to change the path if you have trained your own model and want to test it.
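MPJPE (mean per-joint position error) is the Euclidean distance between predicted and ground-truth joint positions, averaged over joints. The sketch below computes it per frame and reads it off at given horizons. It rests on several assumptions: motions are stored as (N, T, J, 3) arrays, HumanML3D runs at 20 fps, a horizon of h seconds maps to frame index h * fps - 1 counted from the first predicted frame, and all N motions are averaged uniformly. eval_mpjpe may choose horizons, aggregate repetitions, or align frames differently.

```python
import numpy as np


def mpjpe_per_frame(pred, gt):
    """Mean per-joint position error for each frame.

    pred, gt: arrays of shape (N, T, J, 3) with N motions, T frames, J joints (xyz).
    Returns shape (T,): Euclidean joint error averaged over joints and motions.
    """
    err = np.linalg.norm(np.asarray(pred) - np.asarray(gt), axis=-1)  # (N, T, J)
    return err.mean(axis=(0, 2))


def mpjpe_at_horizons(pred, gt, horizons_s=(0.5, 1.5, 2.5, 3.5, 4.5, 5.5), fps=20):
    """MPJPE at each horizon (seconds after the first predicted frame).

    Assumes horizon h corresponds to frame index round(h * fps) - 1.
    """
    per_frame = mpjpe_per_frame(pred, gt)
    return {h: float(per_frame[int(round(h * fps)) - 1]) for h in horizons_s}
```

The error is expressed in the same unit as the input positions.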
Uncertainty Study (Sparsification Error)
- The output of this script is saved in the folder and corresponds to the Sparsification Plot.
- The pre-trained model results should approximately match the ones reported in the paper (or sometimes be lower).
python -m eval.eval_spars --model_path ./save/mdmp_pretrained/model000500000.pt --num_samples 10 --num_repetitions 5
You may also define:
--device: the GPU id.
--num_samples: to generate more samples conditioned on different inputs.
--num_repetitions: to generate more samples conditioned on the same inputs (usually results in a curve that aligns even more closely with the Oracle).
--model_path: to change the path if you have trained your own model and want to test it.
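A sparsification plot ranks predictions by their uncertainty, progressively removes the most uncertain fraction, and tracks the mean error of what remains. The Oracle does the same using the true error, and the gap between the two curves is the sparsification error. The sketch below follows that standard definition with a uniform grid of removal fractions and no normalisation. The number of steps, any normalisation, and the 'Mode Divergence' uncertainty index used by eval_spars may differ, and the function names are hypothetical.

```python
import numpy as np


def sparsification_curve(errors, ranking, fractions):
    """Mean error remaining after removing the highest-ranked fraction of samples."""
    errors = np.asarray(errors, dtype=float)
    # Indices sorted from highest to lowest ranking score (most uncertain first).
    order = np.argsort(-np.asarray(ranking, dtype=float), kind="stable")
    n = len(errors)
    return np.array([errors[order[int(np.floor(f * n)):]].mean() for f in fractions])


def ause(errors, uncertainties, num_steps=20):
    """Area under the sparsification error (uncertainty curve minus Oracle curve)."""
    # Fractions in [0, 1) so at least one sample always remains.
    fractions = np.linspace(0.0, 1.0, num_steps, endpoint=False)
    unc_curve = sparsification_curve(errors, uncertainties, fractions)
    oracle_curve = sparsification_curve(errors, errors, fractions)  # rank by true error
    spars_error = unc_curve - oracle_curve
    # Uniform grid over [0, 1): the mean is a left Riemann sum of the area.
    return float(spars_error.mean()), fractions, unc_curve, oracle_curve
```

A perfect uncertainty estimate reproduces the Oracle and gives an area of zero; larger values indicate a worse ranking.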
The plot in the paper (here we only assess 'Mode Divergence', which is the best index):
We thank guided-diffusion, MDM, MoMask, TEMOS, ACTOR, HumanML3D, text-to-motion, and joints2smpl, from which our code partially borrows.
This code is distributed under an MIT LICENSE.
Note that our code depends on other libraries, including CLIP, SMPL, SMPL-X, PyTorch3D, and uses datasets which each have their own respective licenses that must also be followed.