Visit our project page for more details.
Get the String Performance Dataset (SPD).
If you want to delve into the code or reproduce the final MoCap results from the raw data, please follow the steps below.
Download the raw data from the dataset. For each piece of data, you will get the RGB videos from various shooting angles (in `.avi` format), the performance audio (in `.wav` format), and an info summary of the corresponding piece (in `.json` format). The `summary.json` includes the metadata of the performance itself, the camera parameters, and the frame range of the MoCap results corresponding to the original video.
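A quick way to inspect the metadata is to load `summary.json` directly (a minimal sketch; the exact key names depend on the piece you downloaded, so print the keys rather than assuming them):

```python
import json

# Load the per-piece metadata that ships with the raw data.
with open("summary.json", "r", encoding="utf-8") as f:
    summary = json.load(f)

# Inspect the schema first; it contains the performance metadata,
# camera parameters, and the MoCap frame range described above.
print(summary.keys())
```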
Clone this repo, and install the dependencies.
You can modify the arguments in the scripts to meet your requirements.
This process is implemented in `frame_extract_pipeline.py`, which is called by `script_frame_extract.py`.
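For reference, frame extraction from an `.avi` video can be sketched with OpenCV as below (an illustration only, not the exact logic of `frame_extract_pipeline.py`; the file names and frame range are placeholders):

```python
import os
import cv2

def extract_frames(video_path, out_dir, start=0, end=None):
    """Save frames [start, end) of a video as numbered .png images."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    cap.set(cv2.CAP_PROP_POS_FRAMES, start)
    idx = start
    while True:
        ok, frame = cap.read()
        if not ok or (end is not None and idx >= end):
            break
        cv2.imwrite(os.path.join(out_dir, f"{idx:06d}.png"), frame)
        idx += 1
    cap.release()

# Placeholder paths; the real arguments live in script_frame_extract.py.
extract_frames("cello01_cam0.avi", "frames/cam0", start=0, end=100)
```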
This process is implemented in `infer_pipeline.py`, which is called by `script_infer_humanpose.py`.
The pose estimator checkpoint `model.pth` should be downloaded in advance for `infer.py`. The preferred pose estimator model is this.
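If you just want a quick single-frame check outside our scripts, MMPose's high-level inferencer can be used like this (a sketch assuming mmpose 1.x with mmcv~=2.0.0; the config and checkpoint paths are placeholders for the downloaded model):

```python
from mmpose.apis import MMPoseInferencer

# Placeholder config/weights; point these at the downloaded model.pth
# and the config file it was trained with.
inferencer = MMPoseInferencer(
    pose2d="path/to/pose_config.py",
    pose2d_weights="path/to/model.pth",
)

# 2D human pose estimation on one extracted frame.
result = next(inferencer("frames/cam0/000000.png"))
print(result["predictions"])
```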
This process is implemented in `TrackKeyPoints_pipeline.py`, which is called by `script_track_key_points.py`.
Track the keypoints of instruments using TAPIR.
We use YOLOv8 for bow detection; you can download our pretrained model.
To further refine the positions of the bow in the images, we use DeepLSD with `deeplsd_md.tar` as the checkpoint.
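A minimal sketch of running the bow detector with the Ultralytics YOLOv8 API (the weights file name below is a placeholder for our pretrained model):

```python
from ultralytics import YOLO

# Placeholder file name; substitute the pretrained bow-detection weights you downloaded.
model = YOLO("bow_yolov8.pt")

# Detect the bow in one frame; boxes come back as (x1, y1, x2, y2) in pixels.
results = model("frames/cam0/000000.png")
for box in results[0].boxes.xyxy:
    print(box.tolist())
```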
With the 2D keypoints obtained from the steps above, we apply triangulation to obtain the initial 3D pose.
This process is implemented in `triangulation_pipeline.py`, which is called by `script_triangulation.py`.
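For intuition, two-view triangulation of a single keypoint looks like the sketch below (the projection matrices and pixel coordinates are made-up placeholders; `triangulation_pipeline.py` builds the real ones from the camera parameters in `summary.json` and handles all views and keypoints):

```python
import numpy as np
import cv2

# Placeholder projection matrices P = K [R | t] for two cameras.
K = np.array([[1500.0, 0.0, 960.0],
              [0.0, 1500.0, 540.0],
              [0.0, 0.0, 1.0]])
P0 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])                    # camera 0 at the origin
P1 = K @ np.hstack([np.eye(3), np.array([[-0.5], [0.0], [0.0]])])    # camera 1 shifted sideways

# The same keypoint observed in both views, each of shape (2, 1).
pt0 = np.array([[980.0], [560.0]])
pt1 = np.array([[940.0], [560.0]])

# Linear triangulation returns homogeneous coordinates (4, 1).
X_h = cv2.triangulatePoints(P0, P1, pt0, pt1)
X = (X_h[:3] / X_h[3]).ravel()
print(X)  # initial 3D position of the keypoint
```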
This process is implemented in `contact_points_pipeline.py`, which is called by `script_cp_detection.py`.
Put the raw audio file (in `.wav` format) in the `audio/wavs` directory.
Use CREPE for pitch detection to obtain the pitch curve. CREPE is a reliable pitch tracker with more than 99.9% accuracy at a 25-cent threshold.
Based on the pitch curve and the Pitch-Finger model, we infer the real-world note-playing positions and how they change during the performance.
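A minimal pitch-curve sketch using the TensorFlow-based `crepe` package (the `.wav` file name is a placeholder; `torchcrepe` offers an equivalent interface if you prefer PyTorch):

```python
import crepe
from scipy.io import wavfile

# Placeholder file; put your recording under audio/wavs/ as described above.
sr, audio = wavfile.read("audio/wavs/cello01.wav")

# Viterbi decoding smooths the frame-wise pitch estimates into a curve.
time, frequency, confidence, activation = crepe.predict(audio, sr, viterbi=True)
print(frequency[:10])  # pitch curve in Hz (10 ms hop by default)
```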
The HPE model is designed to obtain the 6D rotation representation from 2D imagery. We apply hand estimation on the 2D imagery from the various shooting angles before integrating these results. With the integrated rotation, we graft the hand pose onto the whole-body skeleton.
The HPE model will not be open-sourced in the near future, as it is currently in commercial use. As an alternative, you may use the EasyMocap toolbox to obtain the MANO parameters from monocular videos and convert the pose parameters to the 6D representation.
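If you take the EasyMocap route, converting MANO's axis-angle pose parameters to a 6D rotation representation can be sketched as follows (an illustration only; here the 6D vector keeps two rows of each rotation matrix, and you should check which row/column convention the downstream integration step expects):

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

def axis_angle_to_6d(pose_aa):
    """pose_aa: (J, 3) axis-angle per joint -> (J, 6) continuous rotation representation."""
    mats = R.from_rotvec(pose_aa).as_matrix()  # (J, 3, 3)
    # Keep two of the three basis vectors of each rotation matrix;
    # the dropped one can be recovered via a cross product downstream.
    return mats[:, :2, :].reshape(len(pose_aa), 6)

# Example: 16 hand joints with zero rotation (identity matrices).
pose_6d = axis_angle_to_6d(np.zeros((16, 3)))
print(pose_6d.shape)  # (16, 6)
```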
This process is implemented in `integrate_handpose_pipeline.py`, which is called by `script_integrate_ik.py`.
This process is implemented in `inverse_kinematic_pipeline.py`, which is called by `script_integrate_ik.py`.
We use python=3.8 here; any python~=3.8 should be fine.
git clone https://github.com/Yitongishere/string_performance.git
cd string_performance
conda create -n string_performance python=3.8
conda activate string_performance
pip install -r requirements.txt
GPU (CUDA 11.1):
pip install torch==1.9.1+cu111 torchvision==0.10.1+cu111 torchaudio==0.9.1 -f https://download.pytorch.org/whl/torch_stable.html
CPU only:
pip install torch==1.9.1+cpu torchvision==0.10.1+cpu torchaudio==0.9.1 -f https://download.pytorch.org/whl/torch_stable.html
You could also follow the instructions on the PyTorch official site.
You need to follow the instructions in the official MMCV Installation Guide depending on your operating system, CUDA version, PyTorch version, and MMCV version (mmcv~=2.0.0 is preferred).
Our example (Windows or Linux, torch==1.9.1+cu111, mmcv==2.0.0):
pip install mmcv==2.0.0 -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.9/index.html
You may need to install either TensorFlow or PyTorch as well.
TensorFlow: please refer to the CREPE Documentation
PyTorch: please refer to the TORCHCREPE Documentation
Follow the instructions below or the official guide provided by Google.
Note: For inference with the PyTorch models of TAPIR (TAP-Net/BootsTAPIR), you are required to install pytorch>=2.1.0; otherwise keys will be missing from the checkpoints' state_dict when they are loaded and you will get wrong outputs. If you run into this issue, please refer to Pytorch <2.1.0 can't load the checkpoints correctly.
pip install -r requirements_inference.txt
cd cello_kp_2d\tapnet
If you want to use the GPU/TPU version of Jax:
[Linux System, Recommended]
Install Jax referring to the jax manual.
[Windows System]
You may need to use the Wheel.
We use jaxlib-0.3.22+cuda11.cudnn82-cp38-cp38-win_amd64.whl with the configuration of Windows 11, CUDA 11.1 + cuDNN 8.2 (NVIDIA RTX 3060), Python 3.8.
mkdir checkpoints
wget -P checkpoints https://storage.googleapis.com/dm-tapnet/causal_tapir_checkpoint.npy
Here is an example of the key points result.