Support Qwen2-VL Fine-Tuning on Video Datasets #5365

Merged: hiyouga merged 2 commits into main from video_finetuning on Sep 4, 2024

hiyouga (Owner) commented Sep 4, 2024

What does this PR do?

This PR adds video training and inference support for the Qwen2-VL model. It also adds sequence packing for multimodal datasets. Some ideas were borrowed from @BUAADreamer in #4136
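
For readers unfamiliar with sequence packing, here is a minimal sketch of the general idea, not the implementation in this PR. It assumes each sample has already been tokenized and that its length counts every token it contributes, including any image or video placeholder tokens. The function name `pack_sequences` and the `capacity` parameter are hypothetical, and the greedy first-fit-decreasing strategy is only one common choice.

```python
from typing import List


def pack_sequences(lengths: List[int], capacity: int) -> List[List[int]]:
    """Group sample indices into packs whose total length fits in `capacity`.

    Hypothetical helper for illustration only: a greedy first-fit-decreasing
    bin packing over sequence lengths.
    """
    # Visit the longest samples first so large items claim space early.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    packs: List[List[int]] = []   # sample indices in each pack
    remaining: List[int] = []     # free token budget left in each pack
    for i in order:
        n = lengths[i]
        if n > capacity:
            # Too long for any pack; a real pipeline would truncate or drop it.
            continue
        for p, free in enumerate(remaining):
            if n <= free:
                packs[p].append(i)
                remaining[p] -= n
                break
        else:
            # No existing pack has room, so open a new one.
            packs.append([i])
            remaining.append(capacity - n)
    return packs


if __name__ == "__main__":
    sample_lengths = [5, 3, 4, 2, 9]
    print(pack_sequences(sample_lengths, capacity=8))
```

With multimodal data, the samples in a pack still need their own image or video inputs and attention boundaries kept apart; the sketch above covers only the length-based grouping step.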

We found a bug in the latest version of transformers. This feature should be usable once the following PR is merged:

huggingface/transformers#33307

Before submitting

hiyouga added the "solved" label (This problem has been already solved) on Sep 4, 2024
hiyouga merged commit 46b1765 into main on Sep 4, 2024
1 check passed
hiyouga deleted the video_finetuning branch on September 4, 2024 at 18:25