1st Place Solution for MeViS Track in CVPR 2024 PVUW Workshop: Motion Expression guided Video Segmentation

Mingqi Gao¹˒⁴˒⁺, Jingnan Luo²˒⁺, Jinyu Yang¹˒*, Jungong Han³˒⁴, Feng Zheng¹˒²˒*

¹ Tapall.ai ² Southern University of Science and Technology ³ University of Sheffield ⁴ University of Warwick

⁺ Equal contribution, * Corresponding authors

📃 Technical Report 🔖 Awesome Work List in Video Object Segmentation

📍 Installation

We tested the code in the following environment (other versions may also work): Python 3.9, PyTorch 1.10.1, CUDA 11.3.

pip install -r requirements.txt
pip install 'git+https://github.com/facebookresearch/fvcore'
pip install -U 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'
cd models/ops
python setup.py build install   # compile the custom CUDA operators
cd ../..
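Since only one environment is listed as tested, a quick check can flag where a local setup differs from it. The following is a minimal sketch that is not part of the repository; the helper names `installed_versions` and `mismatches` are hypothetical, and it assumes the PyTorch build reports its CUDA version through `torch.version.cuda` (which is `None` for CPU-only builds).

```python
import platform

# Versions listed in this README as tested; other versions may also work.
TESTED = {"python": "3.9", "torch": "1.10.1", "cuda": "11.3"}


def installed_versions() -> dict:
    """Collect locally installed versions (hypothetical helper; None when unavailable)."""
    versions = {"python": platform.python_version(), "torch": None, "cuda": None}
    try:
        import torch  # imported lazily so the check still runs without PyTorch
    except ImportError:
        return versions
    versions["torch"] = str(torch.__version__)
    versions["cuda"] = torch.version.cuda  # None for CPU-only builds
    return versions


def mismatches(installed: dict, tested: dict = TESTED) -> list:
    """Return notes for components that differ from the tested setup (hypothetical helper).

    A tested version such as "3.9" matches "3.9.18" by comparing dotted components,
    and PyTorch local suffixes such as "+cu113" are ignored.
    """
    notes = []
    for name, want in tested.items():
        have = installed.get(name)
        if have is None:
            notes.append(f"{name}: not found (tested with {want})")
            continue
        want_parts = want.split(".")
        have_parts = have.split("+")[0].split(".")
        if have_parts[: len(want_parts)] != want_parts:
            notes.append(f"{name}: {have} (tested with {want})")
    return notes


if __name__ == "__main__":
    notes = mismatches(installed_versions())
    print("\n".join(notes) if notes else "Environment matches the tested versions.")
```

A reported mismatch is not necessarily an error; it only marks where results may deviate from the tested setup.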

📍 Training

Download the MUTR checkpoint from HERE (Swin-L backbone, jointly trained on the Ref-COCO series and Ref-YouTube-VOS).
Run the following command to fine-tune MUTR on MeViS:

# --nproc_per_node sets the number of GPUs used for training
python -m torch.distributed.launch \
    --nproc_per_node 1 \
    --master_port 10010 \
    --use_env train.py \
    --with_box_refine \
    --binary \
    --dataset_file mevis \
    --epochs 2 \
    --lr_drop 1 \
    --resume [MUTR checkpoint] \
    --output_dir [output path] \
    --mevis_path [MeViS path] \
    --backbone swin_l_p4w7

Note that training with a different number of GPUs can lead to different scores (as discussed HERE).
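Because scores depend on the GPU count, it can help to generate the fine-tuning command from a script and record it with each run. The sketch below is not part of the repository: `build_train_cmd` is a hypothetical helper, the paths in the `__main__` block are placeholders, and every flag is copied unchanged from the command above.

```python
import shlex
import sys


def build_train_cmd(num_gpus: int, mutr_ckpt: str, output_dir: str, mevis_path: str,
                    master_port: int = 10010) -> list:
    """Rebuild the README's fine-tuning command as an argv list (hypothetical helper)."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be >= 1")
    return [
        sys.executable, "-m", "torch.distributed.launch",
        "--nproc_per_node", str(num_gpus),  # number of GPUs used for training
        "--master_port", str(master_port),
        "--use_env", "train.py",
        "--with_box_refine",
        "--binary",
        "--dataset_file", "mevis",
        "--epochs", "2",
        "--lr_drop", "1",
        "--resume", mutr_ckpt,
        "--output_dir", output_dir,
        "--mevis_path", mevis_path,
        "--backbone", "swin_l_p4w7",
    ]


if __name__ == "__main__":
    # Placeholder paths; replace them with your own. This only prints the command
    # (a dry run); pass the list to subprocess.run(cmd, check=True) to launch training.
    cmd = build_train_cmd(1, "/path/to/mutr_checkpoint.pth", "output/mevis_ft", "/path/to/mevis")
    print(shlex.join(cmd))
```

Keeping the GPU count as an explicit argument, and writing each run to its own `output_dir`, makes it easier to tell which configuration produced which score.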

📍 Inference

Our checkpoint is available on Google Drive.

python inference_mevis.py \
    --with_box_refine \
    --binary \
    --output_dir [output path] \
    --resume [checkpoint path] \
    --ngpu 1 \
    --batch_size 1 \
    --backbone swin_l_p4w7 \
    --mevis_path [MeViS path] \
    --split valid \
    --sub_video_len 30

📖 Citation

If you find our solution useful for your research, please consider citing with this BibTeX:

@misc{gao20241st,
      title={1st Place Solution for MeViS Track in CVPR 2024 PVUW Workshop: Motion Expression guided Video Segmentation}, 
      author={Mingqi Gao and Jingnan Luo and Jinyu Yang and Jungong Han and Feng Zheng},
      year={2024},
      eprint={2406.07043},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

🙌 Acknowledgement

This solution builds on MUTR and MeViS. We thank their authors for their efforts.
