1st Place Solution for MeViS Track in CVPR 2024 PVUW Workshop: Motion Expression guided Video Segmentation

Mingqi Gao1,4,+, Jingnan Luo2,+, Jinyu Yang1,*, Jungong Han3,4, Feng Zheng1,2,*

1 Tapall.ai   2 Southern University of Science and Technology   3 University of Sheffield   4 University of Warwick

+ Equal Contributions, * Corresponding Authors

📃 Technical Report 🔖 Awesome Work List in Video Object Segmentation

Demo

📍 Installation

We tested the code in the following environment; other versions may also be compatible: Python 3.9, PyTorch 1.10.1, CUDA 11.3.

pip install -r requirements.txt
pip install 'git+https://github.com/facebookresearch/fvcore' 
pip install -U 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'
cd models/ops
python setup.py build install
cd ../..
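
After installing, it can help to confirm that the interpreter and PyTorch build match the versions listed above. The following is a minimal sketch, not part of the repository: the helper name `report_environment` is hypothetical, and it only reads standard attributes (`sys.version_info`, `torch.__version__`, `torch.version.cuda`).

```python
import importlib
import sys


def report_environment():
    """Collect the versions mentioned in this README; missing pieces map to None."""
    info = {
        "python": "{}.{}".format(*sys.version_info[:2]),
        "torch": None,
        "cuda": None,
    }
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        # PyTorch is not installed; report only the Python version.
        return info
    info["torch"] = torch.__version__
    info["cuda"] = torch.version.cuda  # None for CPU-only builds
    return info


if __name__ == "__main__":
    for name, version in report_environment().items():
        print(f"{name}: {version}")
```

If the printed versions differ from the tested environment, the code may still run, but the custom ops built in `models/ops` may need to be rebuilt against the installed PyTorch and CUDA versions.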

📍 Training

  1. Download MUTR's checkpoint from HERE (Swin-L, jointly trained on the Ref-COCO series and Ref-YouTube-VOS).
  2. Run the following command to fine-tune MUTR on MeViS (set --nproc_per_node to the number of GPUs used for training):
python -m torch.distributed.launch \
    --nproc_per_node 1 \
    --master_port 10010 \
    --use_env train.py \
    --with_box_refine \
    --binary \
    --dataset_file mevis \
    --epochs 2 \
    --lr_drop 1 \
    --resume [MUTR checkpoint] \
    --output_dir [output path] \
    --mevis_path [MeViS path] \
    --backbone swin_l_p4w7

Please note that the number of GPUs used for training affects the final scores (as discussed HERE).
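
The linked discussion is not reproduced here. As a general property of data-parallel training (an assumption about the cause, not a statement from the authors), the number of GPUs changes the effective batch size and therefore the number of optimizer steps per epoch. The sketch below illustrates that arithmetic; the function name and the sample count are hypothetical.

```python
import math


def steps_per_epoch(num_samples, per_gpu_batch, num_gpus):
    """Return (effective batch size, optimizer steps per epoch) for data-parallel training."""
    if min(num_samples, per_gpu_batch, num_gpus) <= 0:
        raise ValueError("all arguments must be positive")
    # Each GPU processes its own mini-batch; gradients are averaged across GPUs.
    effective_batch = per_gpu_batch * num_gpus
    return effective_batch, math.ceil(num_samples / effective_batch)


if __name__ == "__main__":
    # Hypothetical dataset size, used only to show the trend.
    for gpus in (1, 2, 4, 8):
        batch, steps = steps_per_epoch(1000, 1, gpus)
        print(f"{gpus} GPU(s): effective batch {batch}, {steps} steps per epoch")
```

With a fixed learning rate and only 2 epochs, fewer optimizer steps at a larger effective batch size can plausibly shift the result, so it is worth recording the GPU count alongside any reported score.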

📍 Inference

Our checkpoint is available on Google Drive.

python inference_mevis.py \
    --with_box_refine \
    --binary \
    --output_dir [output path] \
    --resume [checkpoint path] \
    --ngpu 1 \
    --batch_size 1 \
    --backbone swin_l_p4w7 \
    --mevis_path [MeViS path] \
    --split valid \
    --sub_video_len 30 
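
The `--sub_video_len 30` flag suggests that long videos are processed in shorter clips. The sketch below assumes the flag sets the maximum number of frames per clip; this is a guess at the intent, and the actual behaviour is defined in `inference_mevis.py`. The function name `split_into_sub_videos` is hypothetical.

```python
def split_into_sub_videos(num_frames, sub_video_len=30):
    """Split frame indices 0..num_frames-1 into consecutive clips of at most sub_video_len frames."""
    if sub_video_len <= 0:
        raise ValueError("sub_video_len must be positive")
    return [
        list(range(start, min(start + sub_video_len, num_frames)))
        for start in range(0, num_frames, sub_video_len)
    ]


if __name__ == "__main__":
    clips = split_into_sub_videos(70, sub_video_len=30)
    print([len(clip) for clip in clips])
```

Under this assumption, a smaller value lowers peak GPU memory per forward pass, while a larger value gives the model more temporal context per clip.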

📖 Citation

If you find our solution useful for your research, please consider citing with this BibTeX:

@misc{gao20241st,
      title={1st Place Solution for MeViS Track in CVPR 2024 PVUW Workshop: Motion Expression guided Video Segmentation}, 
      author={Mingqi Gao and Jingnan Luo and Jinyu Yang and Jungong Han and Feng Zheng},
      year={2024},
      eprint={2406.07043},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

🙌 Acknowledgement

The solution is based on MUTR and MeViS. Thanks to the authors for their efforts.