Add Multimodal LLM Finetuning #3450
Conversation
LGTM! Thanks for your contributions!
Hi @BUAADreamer, thanks for your work. Can you explain how this HF version differs from the original (I mean https://github.com/haotian-liu/LLaVA) during training, and does the HF version train the mm_projector layers?
This HF version is nearly the same as the original; it was ported by official Hugging Face researchers together with Haotian Liu.
Besides, if you want to pre-train like LLaVA, you can refer to #3835 to fine-tune only the mm_proj (a minimal sketch of freezing everything but the projector follows this exchange).
Thanks, those really help me a lot.
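For readers following along, here is a minimal sketch (not part of this PR) of training only the projector with the HF classes this PR builds on. The checkpoint id and the `multi_modal_projector` parameter name are assumptions based on the HF LLaVA port; verify them against your model.

```python
# Minimal sketch: train only the multimodal projector of an HF LLaVA checkpoint
# by freezing every other parameter. The checkpoint id and the parameter-name
# substring "multi_modal_projector" are assumptions based on the HF LLaVA port.
import torch
from transformers import AutoModelForVision2Seq

model = AutoModelForVision2Seq.from_pretrained(
    "llava-hf/llava-1.5-7b-hf",  # assumed checkpoint id
    torch_dtype=torch.bfloat16,
)

for name, param in model.named_parameters():
    # Keep gradients only for the projector that maps vision features
    # into the language model's embedding space.
    param.requires_grad = "multi_modal_projector" in name

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable:,} / {total:,} parameters")
```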
What does this PR do?
Adds fine-tuning for multimodal LLMs, especially LLaVA, by leveraging AutoModelForVision2Seq and AutoProcessor from transformers (see the loading sketch below).
This PR is a work in progress and will need further improvement, e.g. support for other MLLMs.
For more usage examples, you can refer to MLLM-Finetuning-Demo.
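As a rough illustration of the loading path this PR relies on (not code from the PR itself), a checkpoint and its processor can be pulled through the auto classes. The checkpoint id, prompt template, and image URL below are assumptions for demonstration.

```python
# Sketch of loading a LLaVA checkpoint via the transformers auto classes used
# by this PR. The checkpoint id, prompt format, and image URL are illustrative.
import requests
import torch
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor

model_id = "llava-hf/llava-1.5-7b-hf"  # assumed checkpoint id
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# LLaVA-1.5 style prompt; adjust the template to your model.
prompt = "USER: <image>\nDescribe this picture. ASSISTANT:"
url = "http://images.cocodataset.org/val2017/000000039769.jpg"  # example image
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(text=prompt, images=image, return_tensors="pt").to(
    model.device, torch.float16
)
output_ids = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```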
Supported Models
Make your own Instruct Dataset
Just organize your content like data/mllm_demo.json, as in the sketch below.
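A rough sketch of what an entry in that style might look like; the field names `messages` and `images` mirror the demo file, but the exact schema, content, and image path here are illustrative, so compare with the shipped data/mllm_demo.json.

```python
# Illustrative sketch of a dataset in the style of data/mllm_demo.json.
# Field names ("messages", "images") mirror the demo file; the sample content
# and image path are assumptions, so double-check against the shipped demo data.
import json

dataset = [
    {
        "messages": [
            {"role": "user", "content": "<image>What is in the picture?"},
            {"role": "assistant", "content": "A dog playing in the park."},
        ],
        "images": ["mllm_demo_data/example.jpg"],  # assumed path relative to the data directory
    }
]

with open("data/my_mllm_dataset.json", "w", encoding="utf-8") as f:
    json.dump(dataset, f, ensure_ascii=False, indent=2)
```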
Finetuning
See the example config at examples/lora_single_gpu/llava1_5_lora_sft.yaml.
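One way to launch it, sketched in Python for consistency with the other snippets; this assumes the `llamafactory-cli` launcher is installed and that you run from the repository root, otherwise invoke the project's training script with the same YAML file.

```python
# Sketch: launch LoRA SFT on the example config. Assumes the llamafactory-cli
# launcher is available in the environment; otherwise use the project's
# training script with the same YAML file.
import subprocess

subprocess.run(
    ["llamafactory-cli", "train", "examples/lora_single_gpu/llava1_5_lora_sft.yaml"],
    check=True,
)
```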
Before submitting