Support Qwen2-VL Fine-Tuning on Video Datasets #5365

hiyouga · 2024-09-04T18:13:58Z

What does this PR do?

This PR adds video training and inference for the Qwen2-VL model. It also adds support for sequence packing on multimodal datasets. Some ideas were borrowed from @BUAADreamer in #4136.

We observed a bug in the latest version of transformers; this feature should be usable once the following PR is merged:

huggingface/transformers#33307
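
For readers unfamiliar with sequence packing, here is a minimal, generic sketch of the idea: several short samples are grouped into one training sequence so that less of the context window is wasted on padding. This is not the code from this PR. The function name `pack_sequences` and the parameter name `cutoff_len` are hypothetical and used only for illustration, and the sketch packs by token length alone. A multimodal implementation would also need to keep each sample's images or videos attached to it.

```python
from typing import List


def pack_sequences(lengths: List[int], cutoff_len: int) -> List[List[int]]:
    """Group sample indices so each group's total length is at most cutoff_len.

    Uses greedy first-fit-decreasing: visit samples from longest to shortest
    and place each one in the first group that still has room.
    """
    # Sample indices ordered from longest to shortest (ties keep input order).
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    groups: List[List[int]] = []   # packed groups of sample indices
    remaining: List[int] = []      # free space left in each group

    for idx in order:
        length = lengths[idx]
        if length > cutoff_len:
            continue  # a sample longer than the limit cannot be packed; skip it
        for g, free in enumerate(remaining):
            if length <= free:
                groups[g].append(idx)
                remaining[g] -= length
                break
        else:
            # No existing group has room, so start a new one.
            groups.append([idx])
            remaining.append(cutoff_len - length)
    return groups


if __name__ == "__main__":
    # Five samples with these token lengths, packed into sequences of at most 10.
    print(pack_sequences([5, 3, 4, 2, 9], cutoff_len=10))
```

Each inner list holds the indices of the samples that would be concatenated into one packed sequence.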

Before submitting

Did you read the contributor guideline?
Did you write any new necessary tests?

video datasets

8cafc7b

hiyouga temporarily deployed to tests September 4, 2024 18:14 — with GitHub Actions Inactive

tiny fix

c122b9f

hiyouga temporarily deployed to tests September 4, 2024 18:17 — with GitHub Actions Inactive

hiyouga added the "solved" label (This problem has been already solved) Sep 4, 2024

hiyouga merged commit 46b1765 into main Sep 4, 2024
1 check passed

hiyouga deleted the video_finetuning branch September 4, 2024 18:25
