Code for robust visual pose estimation pipeline (end-to-end) for Spot with minimal input requirements.

mizeller/Spot-Pose-Estimation

Robust Visual Pose Estimation for In-the-Wild Videos of Spot

Semester Thesis, ETH Zurich, Autumn Semester 2023

Setting up, training and testing custom pose estimation pipelines is non-trivial and can be tedious and time-consuming. This repository aims to simplify the process.

The main contributions can be summarized as follows:

  • A Docker container ready to run an extended version of OnePose++

  • OnePose++ extended with:

    • DeepSingleCameraCalibration for running inference on in-the-wild videos
    • CoTracker2 for pose optimization, which improves pose tracking performance by also leveraging temporal cues (see footnote 1)
  • A low-entry demo to help understand the whole pipeline and readily debug/test the code

  • Custom data for Spot and instructions on how to create synthetic data for your own use case

Installation

Hardware

A CUDA-enabled GPU is required. The code was tested on the following GPU:

  • NVIDIA GeForce RTX 2080

with the following OS & driver versions:

DISTRIB_DESCRIPTION="Ubuntu 20.04.6 LTS"
NVIDIA-SMI (Driver Versions) 470.223.02   
CUDA Version: 11.4
Docker Version: 24.0.7, build afdd53b
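
To compare your own machine against the versions listed above, a minimal sketch like the following can help. `report_env` is a hypothetical helper that is not part of this repository; it assumes a Linux host and relies only on `/etc/lsb-release`, `nvidia-smi` and `docker` when they are present. The CUDA version itself is shown in the header of a plain `nvidia-smi` call.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): print OS, GPU driver and
# Docker versions so they can be compared with the tested configuration.
report_env() {
    # OS description (Ubuntu and derivatives ship /etc/lsb-release)
    if [ -r /etc/lsb-release ]; then
        grep DISTRIB_DESCRIPTION /etc/lsb-release || echo "DISTRIB_DESCRIPTION not set"
    else
        echo "/etc/lsb-release not found"
    fi

    # GPU model and driver version
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi --query-gpu=name,driver_version --format=csv,noheader \
            || echo "nvidia-smi failed (driver not loaded?)"
    else
        echo "nvidia-smi not found"
    fi

    # Docker client version
    if command -v docker >/dev/null 2>&1; then
        docker --version || echo "docker failed"
    else
        echo "docker not found"
    fi
}

report_env
```

Missing tools are reported instead of aborting the script, so the output shows which prerequisite still needs to be installed.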

Code

Set up the code by cloning the repository, initializing the submodules and downloading the necessary models and demo data:

git clone git@github.com:mizeller/OnePose_ST.git
cd OnePose_ST
git submodule update --init --recursive
mkdir -p data weight 

The pre-trained models for OnePose++, LoFTR and CoTracker2 as well as the demo data can be found here. Place the model files in weight/ and the demo data in data/.

At this point, the project directory should look like this:

.
├── assets
...
├── data
│   └── spot_demo
├── submodules
│   ├── CoTracker
│   ├── DeepLM
│   └── LoFTR
└── weight
    ├── LoFTR_wsize9.ckpt 
    ├── OnePosePlus_model.ckpt
    └── cotracker2.pth
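
Before moving on, it can be worth confirming that the layout matches the tree above. The following is a minimal sketch: `check_layout` is a hypothetical helper that is not part of this repository, and the list of paths is copied from the tree (entries hidden behind `...` are not checked).

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): check that the demo data,
# submodules and model weights shown in the directory tree are present.
check_layout() {
    root="${1:-.}"
    missing=0
    for path in \
        data/spot_demo \
        submodules/CoTracker \
        submodules/DeepLM \
        submodules/LoFTR \
        weight/LoFTR_wsize9.ckpt \
        weight/OnePosePlus_model.ckpt \
        weight/cotracker2.pth
    do
        if [ ! -e "$root/$path" ]; then
            echo "missing: $path"
            missing=1
        fi
    done
    # Exit status 0 only if every expected path exists
    return "$missing"
}

if check_layout .; then
    echo "layout OK"
fi
```

Run it from the repository root. Each missing file or directory is printed on its own line.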

Docker

To set up the Docker container, either build it locally:

docker build -t="mizeller/spot_pose_estimation:00" .

or pull a pre-built container from DockerHub:

docker pull mizeller/spot_pose_estimation:00

Next, run the container. Again, there are two options.

If you use VS Code's Dev Containers feature, press CTRL+SHIFT+P and select Rebuild and Reopen in Container. This reopens the project inside the Docker container.

Alternatively, you can run the Docker container directly from the terminal. The following command mounts ${REPO_ROOT} into the container at /workspaces/OnePose_ST. Note that the shared memory size is set to 32 GB; adjust it to your hardware if necessary.

REPO_ROOT=$(pwd)
docker run --gpus all --shm-size=32g -w /workspaces/OnePose_ST -v ${REPO_ROOT}:/workspaces/OnePose_ST -it mizeller/spot_pose_estimation:00
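
If 32 GB of shared memory is more than your machine has, one way to pick a smaller value is sketched below. `suggest_shm_gb` is a hypothetical helper, and the rule it applies (half of the total RAM, capped at 32 GB, at least 1 GB) is an assumption made for illustration, not a recommendation from this repository. The script only prints the resulting command; it does not start the container.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository). Assumed rule of thumb:
# use half of the total RAM for --shm-size, capped at 32 GB, minimum 1 GB.
suggest_shm_gb() {
    mem_kb="$1"                  # total RAM in kB, as reported in /proc/meminfo
    gb=$((mem_kb / 2097152))     # kB -> GB (divide by 1048576), then halve
    if [ "$gb" -gt 32 ]; then
        gb=32
    fi
    if [ "$gb" -lt 1 ]; then
        gb=1
    fi
    echo "$gb"
}

# Print (do not run) the docker command with the suggested value.
if [ -r /proc/meminfo ]; then
    mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    shm=$(suggest_shm_gb "$mem_kb")
    echo "docker run --gpus all --shm-size=${shm}g -w /workspaces/OnePose_ST -v $(pwd):/workspaces/OnePose_ST -it mizeller/spot_pose_estimation:00"
fi
```

For example, a machine with 16 GB of RAM would get `--shm-size=8g` under this assumed rule.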

Demo: Training & Inference

To test the setup (training and inference), run the demo script from a terminal inside the Docker container: sh demo.sh. The script runs the following steps:

  1. Parse the demo data
  2. Train the OnePose++ model for Spot
  3. Run inference on the demo data captured using my phone

The results will be saved in the temp/ directory.

FYI: There are also custom debug entry points for each step of the pipeline. Have a look at the .vscode/launch.json.

Training Data

TODO: add comments about synthetic data pipeline & clean up the other repo as well

Acknowledgement & License

This repository is essentially a fork of the original OnePose++ repository. For more details, have a look at the original source here. Thanks to the original authors for their great work!

This repository uses several submodules; please refer to the respective repositories for their licenses.

Credits

This project was developed as part of the Semester Thesis for my (Michel Zeller) MSc. Mechanical Engineering at ETH Zurich. The project was supervised by Dr. Hermann Blum (ETH, Computer Vision and Geometry Group) and Francesco Milano (ETH, Autonomous Systems Lab).

Footnotes

  1. Note: As of this writing, CoTracker2 is still a work in progress. The online tracker can only run on every 4th frame, which is not sufficient for optimizing the pose estimation. That's why we currently use CoTracker as a post-processing step to optimize the poses for a given sequence. The 'yet' in this reply by the authors suggests that this feature will be added to CoTracker in the future. A possible initial implementation is on this feature branch. It has not been updated in a while...
